jadami10 commented on PR #17254:
URL: https://github.com/apache/pinot/pull/17254#issuecomment-3577480262

   > Will this issue happen to Dedup as well?
   
   My understanding of dedup is that 
[here](https://github.com/apache/pinot/blob/master/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/dedup/ConcurrentMapPartitionDedupMetadataManager.java#L125-L127),
 we already treat metadata that is out of TTL as non-existent. So it doesn't matter 
when the cleanup happens; an expired primary key is ignored either way. 
In the existing OSS code, this works correctly because everything is backed by a 
concurrent hashmap, so all of the operations are thread safe. Since dedup is 
per partition, it's also assumed that all events for a single partition are 
ordered, so side effects like restarting a server will lead to the same events 
being ingested in the same order. Therefore, we should always apply 
deduplication consistently, regardless of whether the cleanup is done when the new 
consuming segment is created or on some other schedule.
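
   To make the argument concrete, here is a minimal sketch of the idea. It is 
not the actual Pinot code: `TtlDedupSketch`, its fields, and its method names are 
hypothetical, and it assumes a single consuming thread per partition. The point is 
only that the lookup itself checks the TTL, so the result is the same whether or 
not `removeExpired()` has run.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical illustration only; names do not come from the Pinot code base.
final class TtlDedupSketch {
  // Primary key -> event time of the record that currently "owns" the key.
  private final ConcurrentHashMap<String, Double> keyToTime = new ConcurrentHashMap<>();
  private final double ttl;
  // Assumption: updated by one consuming thread per partition.
  private volatile double largestSeenTime = Double.NEGATIVE_INFINITY;

  TtlDedupSketch(double ttl) {
    this.ttl = ttl;
  }

  // Returns true if the key is already present AND still within TTL (a duplicate).
  // An entry older than the watermark is treated as if it did not exist.
  boolean checkPresentOrUpdate(String key, double time) {
    largestSeenTime = Math.max(largestSeenTime, time);
    double watermark = largestSeenTime - ttl;
    AtomicBoolean present = new AtomicBoolean(false);
    keyToTime.compute(key, (k, existing) -> {
      if (existing != null && existing >= watermark) {
        present.set(true);
        return existing; // keep the in-TTL entry
      }
      return time; // missing or expired: the new record takes over the key
    });
    return present.get();
  }

  // Cleanup only frees memory; it does not change what checkPresentOrUpdate returns.
  void removeExpired() {
    double watermark = largestSeenTime - ttl;
    keyToTime.values().removeIf(t -> t < watermark);
  }

  int size() {
    return keyToTime.size();
  }
}
```

   With this shape, replaying the same ordered events gives the same dedup 
decisions no matter when, or whether, the cleanup runs in between.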
   
   I haven't compared this to how upsert works since they seemed like separate 
implementations. Are they sharing some code/behavior I'm missing?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

