klsince commented on code in PR #13285:
URL: https://github.com/apache/pinot/pull/13285#discussion_r1634137440
##########
pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/RealtimeSegmentDataManager.java:
##########
@@ -703,9 +703,20 @@ public void run() {
// persisted.
// Take upsert snapshot before starting consuming events
if (_partitionUpsertMetadataManager != null) {
- _partitionUpsertMetadataManager.takeSnapshot();
- // If upsertTTL is enabled, we will remove expired primary keys from upsertMetadata after taking snapshot.
- _partitionUpsertMetadataManager.removeExpiredPrimaryKeys();
+ if (_tableConfig.getUpsertMetadataTTL() > 0) {
+ // If upsertMetadataTTL is enabled, we will remove expired primary keys from upsertMetadata
+ // AFTER taking a snapshot. Taking the snapshot first is crucial to ensure we capture the final
+ // state of a particular key before it exits the TTL window.
Review Comment:
I see, that makes sense. Based on the metadata TTL code, it seems that taking
the snapshot after removing the out-of-TTL metadata wouldn't affect data/query
correctness. It would only incur some extra overhead if the server restarted
before a new snapshot was taken: the server would have to add the metadata
(already out of TTL) back to the map, only for it to be removed again.
IIUC, maybe update the comment a bit to say that this ordering is mainly to
avoid such overhead upon an unexpected server restart, and that it does not
affect correctness.
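
To make the restart-overhead point concrete, here is a toy sketch of the two orderings. It is not Pinot code: `SnapshotOrderSketch`, `restart`, and `Restart` are hypothetical names, and the reload rule (an out-of-TTL key can be skipped only if the persisted snapshot already covers it) is an assumption made for illustration, not a description of the actual loading logic.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Toy model only: none of these names are Pinot APIs.
public class SnapshotOrderSketch {
  /** Result of a simulated restart: the rebuilt map and how many inserts it took. */
  public static final class Restart {
    public final Map<String, Long> metadata;
    public final int inserts;

    Restart(Map<String, Long> metadata, int inserts) {
      this.metadata = metadata;
      this.inserts = inserts;
    }
  }

  /**
   * Rebuilds the in-memory metadata map from segment records after a restart.
   * Assumption: a key that is out of TTL and already covered by the persisted
   * snapshot can be skipped; every other key must be added back to the map.
   */
  public static Restart restart(Map<String, Long> segmentRecords,
      Set<String> snapshotKeys, long ttlCutoff) {
    Map<String, Long> metadata = new HashMap<>();
    int inserts = 0;
    for (Map.Entry<String, Long> e : segmentRecords.entrySet()) {
      boolean expired = e.getValue() < ttlCutoff;
      if (expired && snapshotKeys.contains(e.getKey())) {
        continue; // state already captured by the snapshot, no need to track the key
      }
      metadata.put(e.getKey(), e.getValue());
      inserts++;
    }
    // Expired keys that had to be re-added are removed again right away.
    metadata.values().removeIf(time -> time < ttlCutoff);
    return new Restart(metadata, inserts);
  }

  public static void main(String[] args) {
    Map<String, Long> records = new LinkedHashMap<>();
    records.put("k1", 10L);  // out of TTL
    records.put("k2", 95L);
    records.put("k3", 100L);
    long ttlCutoff = 80L;    // e.g. watermark 100 minus TTL 20

    // Snapshot first, then remove, then restart: the snapshot covers every key.
    Restart snapshotFirst = restart(records, records.keySet(), ttlCutoff);
    // Remove first, then restart before the snapshot: no key is covered.
    Restart removeFirst = restart(records, Collections.<String>emptySet(), ttlCutoff);

    // Same final metadata either way, so correctness is unaffected.
    System.out.println(snapshotFirst.metadata.equals(removeFirst.metadata)); // true
    // Only the amount of work on restart differs.
    System.out.println(snapshotFirst.inserts + " vs " + removeFirst.inserts); // 2 vs 3
  }
}
```

Under these assumptions both orderings end with the same map; the remove-first ordering just pays one extra insert-then-remove per expired key after an unexpected restart.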
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]