tibrewalpratik17 commented on issue #11736:
URL: https://github.com/apache/pinot/issues/11736#issuecomment-1749431556

   > There are many race conditions if you take a synchronized approach to 
cleanup primary keys. Instead of introducing a new TTL, you can add a boolean 
field to apply metadataTTL (upsertTTL) to cleanup only deleted keys.
   
   Yes. Instead of adding a boolean field through a config such as 
`removeDeletedKeysOnly`, I was suggesting a new config, `deletedKeysTTL`. 
The plan is to remove keys the same way upsertTTL does: iterate through the 
map, remove the keys that fall outside the TTL window, and reuse the 
upsertTTL workflow itself.
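
   To make the idea concrete, here is a rough sketch of that cleanup pass. It 
is only an illustration under assumptions: `DeletedKeysTtlSketch`, 
`RecordLocation` and `removeExpiredDeletedKeys` are hypothetical names, not 
Pinot's actual classes, and the comparison value is assumed to be a numeric 
time-like column.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; these are NOT Pinot's real classes or method names.
public class DeletedKeysTtlSketch {

    // Assumed stand-in for the per-primary-key upsert metadata.
    static final class RecordLocation {
        final double comparisonValue; // assumed numeric comparison column, e.g. event time
        final boolean deleted;        // true if the latest record for the key is a delete

        RecordLocation(double comparisonValue, boolean deleted) {
            this.comparisonValue = comparisonValue;
            this.deleted = deleted;
        }
    }

    // Removes deleted keys whose comparison value is older than
    // (largestSeenComparisonValue - deletedKeysTTL). Returns the number removed.
    static int removeExpiredDeletedKeys(
            ConcurrentHashMap<String, RecordLocation> primaryKeyMap,
            double largestSeenComparisonValue,
            double deletedKeysTTL) {
        double threshold = largestSeenComparisonValue - deletedKeysTTL;
        int[] removed = {0};
        // ConcurrentHashMap iteration is weakly consistent, so removing
        // entries while iterating is safe.
        primaryKeyMap.forEach((key, location) -> {
            if (location.deleted && location.comparisonValue < threshold) {
                // remove(key, value) is a no-op if the key was updated
                // concurrently, so a newer record is never dropped.
                if (primaryKeyMap.remove(key, location)) {
                    removed[0]++;
                }
            }
        });
        return removed[0];
    }
}
```

   The conditional `remove(key, value)` is one way to avoid racing with 
ingestion: if a new record for the key arrives during the pass, the mapping 
no longer matches and the key is kept.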
   
   > For both upsertKeyTTL and the new proposed deletedKeyTTL, when adding a 
new server, we still need to load all metadata again because the memory saving 
comes from skipping loading the invalid docs based on the valid doc ids 
snapshot. In order to use them in production, we need to solve this problem 
first, or no server can be replaced because very likely if we load all the keys 
(rebootstrap the metadata), the server will run out of memory.
   
   @Jackie-Jiang for metadataTTL, afaik there is a watermark that keeps keys 
outside the upsertTTL window from being reloaded into memory, so only keys 
inside the window are loaded. But @deemoliu can confirm whether I have 
misunderstood anything.
   
   As far as `deletedKeysTTL` is concerned, I am planning to mark the 
validDocID for these keys as invalid in the segments anyway. I know there can 
be a scenario where the server restarts between marking them as invalid and 
taking the snapshot, but then only a few keys will be reloaded, and they can 
be purged in the next deletion cycle.
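
   The restart scenario can be simulated with a small sketch. This is an 
assumption-laden toy model, not Pinot code: `ValidDocIdsSnapshotSketch` and 
its methods are hypothetical, and `java.util.BitSet` stands in for whatever 
bitmap the segment actually uses for valid doc IDs.

```java
import java.util.BitSet;

// Hypothetical toy model of valid doc IDs plus a persisted snapshot.
public class ValidDocIdsSnapshotSketch {
    private BitSet validDocIds = new BitSet(); // in-memory state
    private BitSet snapshot = new BitSet();    // last persisted state

    void markValid(int docId) { validDocIds.set(docId); }

    // Marks a doc as invalid in memory only; not durable until a snapshot.
    void invalidate(int docId) { validDocIds.clear(docId); }

    // Persists the current in-memory state.
    void takeSnapshot() { snapshot = (BitSet) validDocIds.clone(); }

    // Simulates a server restart: in-memory state is rebuilt from the snapshot.
    void restart() { validDocIds = (BitSet) snapshot.clone(); }

    boolean isValid(int docId) { return validDocIds.get(docId); }

    public static void main(String[] args) {
        ValidDocIdsSnapshotSketch segment = new ValidDocIdsSnapshotSketch();
        segment.markValid(0);
        segment.markValid(1);
        segment.takeSnapshot();

        segment.invalidate(1);  // deleted key purged in memory
        segment.restart();      // restart before the snapshot is taken
        System.out.println(segment.isValid(1)); // key is loaded again

        segment.invalidate(1);  // next deletion cycle purges it again
        segment.takeSnapshot();
        segment.restart();
        System.out.println(segment.isValid(1)); // purge is now durable
    }
}
```

   In this model the cost of the race is bounded: only the keys invalidated 
since the last snapshot come back, and the next cycle removes them.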


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

