Jackie-Jiang opened a new issue, #11045:
URL: https://github.com/apache/pinot/issues/11045
Here are some of the main differences between dedup and upsert:

- Dedup is applied when ingesting data from the stream (it applies to the consuming segment only), and there is no need to track valid docs. Duplicate records are simply dropped.
- A dedup window (a TTL on the metadata) is a must-have to reduce the metadata size.
- There is no need to track the record location in the dedup metadata, but we do want to track the timestamp for the dedup window.

One potential solution for the dedup window is to keep 2 rotating maps, each storing the metadata for one dedup window. Once the old map is completely out of the dedup window, clear it and reuse it as the new map.
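A minimal sketch of the rotating-map idea, assuming string primary keys and the record's own timestamp (in milliseconds) as the dedup time. The class `RotatingDedupWindow` and its methods are hypothetical illustrations, not Pinot's actual implementation. Each map stores only primary key → timestamp (no segment or docId location). When a record arrives at least one window past the start of the current map, every entry in the older map is more than one window old, so that map is cleared and reused as the new current map.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical dedup window built from two rotating maps (primary key -> record timestamp).
 * Only the timestamp is tracked; no segment or docId location is stored.
 */
final class RotatingDedupWindow {
  private final long windowMs;
  private Map<String, Long> current = new HashMap<>();  // receives all new keys
  private Map<String, Long> previous = new HashMap<>(); // keys added before the last rotation
  private long currentStart;                            // data time at which 'current' started
  private boolean initialized;

  RotatingDedupWindow(long windowMs) {
    if (windowMs <= 0) {
      throw new IllegalArgumentException("windowMs must be positive");
    }
    this.windowMs = windowMs;
  }

  /** Returns true if the record should be ingested, false if it is a duplicate to drop. */
  boolean checkAndAdd(String primaryKey, long timestampMs) {
    if (!initialized) {
      currentStart = alignDown(timestampMs);
      initialized = true;
    }
    rotateIfNeeded(timestampMs);
    if (seenWithinWindow(current.get(primaryKey), timestampMs)
        || seenWithinWindow(previous.get(primaryKey), timestampMs)) {
      return false; // duplicate inside the dedup window: drop it
    }
    current.put(primaryKey, timestampMs);
    return true;
  }

  /** Number of keys currently held across both maps. */
  int trackedKeys() {
    return current.size() + previous.size();
  }

  private boolean seenWithinWindow(Long seenAtMs, long timestampMs) {
    return seenAtMs != null && timestampMs - seenAtMs <= windowMs;
  }

  private void rotateIfNeeded(long timestampMs) {
    if (timestampMs < currentStart + windowMs) {
      return; // still inside the current map's range
    }
    if (timestampMs < currentStart + 2 * windowMs) {
      // Every entry in 'previous' is now more than windowMs older than timestampMs:
      // clear it and reuse it as the new 'current'.
      Map<String, Long> recycled = previous;
      recycled.clear();
      previous = current;
      current = recycled;
      currentStart += windowMs;
    } else {
      // Jumped more than one window ahead: both maps are out of the window.
      previous.clear();
      current.clear();
      currentStart = alignDown(timestampMs);
    }
  }

  private long alignDown(long timestampMs) {
    return Math.floorDiv(timestampMs, windowMs) * windowMs;
  }
}
```

For example, with a 1000 ms window, `checkAndAdd("a", 0)` accepts the record, `checkAndAdd("a", 500)` drops it as a duplicate, and after a rotation `checkAndAdd("a", 1500)` accepts it again because the earlier copy is more than one window old. Every key stays in one of the two maps for at least one full window, so duplicates closer than `windowMs` are always caught, while expiry is a single `clear()` of the reused map instead of a per-entry TTL scan.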
