tibrewalpratik17 commented on code in PR #13285:
URL: https://github.com/apache/pinot/pull/13285#discussion_r1631700908
##########
pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/RealtimeSegmentDataManager.java:
##########
@@ -703,9 +703,20 @@ public void run() {
// persisted.
// Take upsert snapshot before starting consuming events
if (_partitionUpsertMetadataManager != null) {
- _partitionUpsertMetadataManager.takeSnapshot();
- // If upsertTTL is enabled, we will remove expired primary keys from upsertMetadata after taking snapshot.
- _partitionUpsertMetadataManager.removeExpiredPrimaryKeys();
+ if (_tableConfig.getUpsertMetadataTTL() > 0) {
+ // If upsertMetadataTTL is enabled, we will remove expired primary keys from upsertMetadata
+ // AFTER taking a snapshot. Taking the snapshot first is crucial to ensure we capture the final
+ // state of a particular key before it exits the TTL window.
Review Comment:
> curious why it is critical to take snapshot first? The states of those
particular keys will be gone in the next snapshot, and would that cause issue?
I don't see this as a strict requirement, but consider what happens if we
reverse the order: the _metadataTTL logic first removes keys older than the
TTL window, and a server restart then occurs before the snapshot is taken. On
restart, the addSegment flow will not find a snapshot for an out-of-TTL
segment and will treat all of its records as valid for querying. However, some
of those records may have been superseded by updates and should not be served
from this out-of-TTL segment. Taking the snapshot first prevents this. This is
my understanding from the code, but @deemoliu can confirm.
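
To make the scenario concrete, here is a toy model of the two orderings. It is only a sketch of my reading of the flow, not Pinot code: the class `SnapshotOrderSketch`, its fields, the `addOutOfTtlSegment` helper, and the watermark value are all hypothetical, and a "restart" is simulated by clearing the in-memory maps.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model only: none of these names are Pinot APIs.
public class SnapshotOrderSketch {
  // Simulated disk: persisted validDocIds per segment. Survives a restart.
  private final Map<String, Set<Integer>> persistedSnapshots = new HashMap<>();
  // In-memory validDocIds per segment. Lost on restart.
  private final Map<String, Set<Integer>> validDocIds = new HashMap<>();
  // In-memory primary key -> latest comparison value. Lost on restart.
  private final Map<String, Long> keyToLatestTs = new HashMap<>();

  // Persist the current validDocIds of every segment.
  void takeSnapshot() {
    validDocIds.forEach((seg, docs) -> persistedSnapshots.put(seg, new TreeSet<>(docs)));
  }

  // Drop keys whose latest comparison value is below the TTL watermark.
  void removeExpiredPrimaryKeys(long watermark) {
    keyToLatestTs.values().removeIf(ts -> ts < watermark);
  }

  // Assumed restart behaviour for an out-of-TTL segment: use the snapshot
  // if one exists, otherwise treat every doc in the segment as valid.
  Set<Integer> addOutOfTtlSegment(String segment, int numDocs) {
    Set<Integer> snapshot = persistedSnapshots.get(segment);
    if (snapshot != null) {
      return new TreeSet<>(snapshot);
    }
    Set<Integer> all = new TreeSet<>();
    for (int docId = 0; docId < numDocs; docId++) {
      all.add(docId);
    }
    return all;
  }

  // Returns the docs of segment "old" that are queryable after a restart.
  static Set<Integer> oldSegmentDocsAfterRestart(boolean snapshotFirst) {
    SnapshotOrderSketch s = new SnapshotOrderSketch();
    // Segment "old": doc 0 = key A @ ts 1, doc 1 = key B @ ts 2.
    // Segment "new": doc 0 = key A @ ts 3, which supersedes old doc 0.
    s.validDocIds.put("old", new TreeSet<>(Set.of(1)));
    s.validDocIds.put("new", new TreeSet<>(Set.of(0)));
    s.keyToLatestTs.put("A", 3L);
    s.keyToLatestTs.put("B", 2L);
    long watermark = 90L; // hypothetical: both keys are already out of TTL

    if (snapshotFirst) {
      s.takeSnapshot();
      s.removeExpiredPrimaryKeys(watermark);
    } else {
      s.removeExpiredPrimaryKeys(watermark);
      // The server goes down here, before takeSnapshot() runs.
    }

    // Restart: in-memory state is gone, only persistedSnapshots remain.
    s.validDocIds.clear();
    s.keyToLatestTs.clear();
    return s.addOutOfTtlSegment("old", 2);
  }

  public static void main(String[] args) {
    System.out.println("snapshot first: " + oldSegmentDocsAfterRestart(true));
    System.out.println("remove first:   " + oldSegmentDocsAfterRestart(false));
  }
}
```

With the snapshot taken first, only doc 1 of the old segment is served after the restart. With the order reversed and a restart in between, both docs 0 and 1 are served, so the stale version of key A becomes queryable again alongside its update.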
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]