tibrewalpratik17 commented on code in PR #13285:
URL: https://github.com/apache/pinot/pull/13285#discussion_r1631704089
##########
pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/RealtimeSegmentDataManager.java:
##########
@@ -703,9 +703,20 @@ public void run() {
// persisted.
// Take upsert snapshot before starting consuming events
if (_partitionUpsertMetadataManager != null) {
- _partitionUpsertMetadataManager.takeSnapshot();
- // If upsertTTL is enabled, we will remove expired primary keys from upsertMetadata after taking snapshot.
- _partitionUpsertMetadataManager.removeExpiredPrimaryKeys();
+ if (_tableConfig.getUpsertMetadataTTL() > 0) {
+ // If upsertMetadataTTL is enabled, we will remove expired primary keys from upsertMetadata
+ // AFTER taking a snapshot. Taking the snapshot first is crucial to ensure we capture the final
+ // state of a particular key before it exits the TTL window.
Review Comment:
> I don't see this as a very strict requirement
I mention this because snapshotting isn't always reliable. For instance,
we've hit "Path not found" exceptions while snapshotting some segments in
the past. However, the _metadataTTL cleanup would still proceed regardless
of whether the snapshot succeeded. So IIUC the ordering is more of an
optimization for better data correctness than a strict requirement.
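
To make the point concrete, here is a rough sketch of the ordering being discussed. It is not the code in this PR: `UpsertManagerSketch`, `SnapshotThenTtlSketch`, and `prepareForConsumption` are hypothetical names, and it assumes a snapshot failure surfaces as an `IOException` that is tolerated rather than treated as fatal.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the partition upsert metadata manager; not Pinot's real interface.
interface UpsertManagerSketch {
  void takeSnapshot() throws IOException;

  void removeExpiredPrimaryKeys();
}

class SnapshotThenTtlSketch {
  // Snapshot first, then remove expired keys when TTL is enabled.
  // Returns true if the snapshot succeeded; removal runs either way.
  static boolean prepareForConsumption(UpsertManagerSketch manager, double metadataTtl) {
    boolean snapshotTaken = true;
    try {
      manager.takeSnapshot();
    } catch (IOException e) {
      // Assumption: a failed snapshot (e.g. a missing path) is tolerated, not fatal.
      snapshotTaken = false;
    }
    if (metadataTtl > 0) {
      // Runs regardless of the snapshot outcome, so the ordering is best effort.
      manager.removeExpiredPrimaryKeys();
    }
    return snapshotTaken;
  }

  public static void main(String[] args) {
    List<String> calls = new ArrayList<>();
    UpsertManagerSketch failing = new UpsertManagerSketch() {
      @Override
      public void takeSnapshot() throws IOException {
        calls.add("snapshot");
        throw new IOException("Path not found (simulated)");
      }

      @Override
      public void removeExpiredPrimaryKeys() {
        calls.add("removeExpired");
      }
    };
    boolean ok = prepareForConsumption(failing, 1.0);
    System.out.println("snapshotTaken=" + ok + ", calls=" + calls);
  }
}
```

In this sketch the expired keys are dropped even when the snapshot did not land, which is why the snapshot-first ordering reads as a best-effort optimization rather than a guarantee.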
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]