tibrewalpratik17 commented on issue #11736: URL: https://github.com/apache/pinot/issues/11736#issuecomment-1749431556
> There are many race conditions if you take a synchronized approach to cleanup primary keys. Instead of introducing a new TTL, you can add a boolean field to apply metadataTTL (upsertTTL) to cleanup only deleted keys.

Yes. Instead of adding a boolean field via a config such as `removeDeletedKeysOnly`, I was suggesting a new config, `deletedKeysTTL`. The plan was to remove the keys the same way upsertTTL does: iterate through the map, remove the keys that fall outside the TTL window, and leverage the upsertTTL workflow itself.

> For both upsertKeyTTL and the new proposed deletedKeyTTL, when adding a new server, we still need to load all metadata again because the memory saving comes from skipping loading the invalid docs based on the valid doc ids snapshot. In order to use them in production, we need to solve this problem first, or no server can be replaced because very likely if we load all the keys (rebootstrap the metadata), the server will run out of memory.

@Jackie-Jiang for metadataTTL, afaik there is a watermark that keeps all keys from being reloaded into memory: the ones outside the upsertTTL window are not loaded. But @deemoliu can confirm if I have misunderstood anything. As far as `deletedKeysTTL` is concerned, I am planning to mark the validDocIds for these keys as invalid in the segments anyway. I know there can be a scenario where the server restarts between marking them as invalid and taking the snapshot, but then only a few keys will be reloaded, and they can be purged in the next deletion cycle.
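To make the proposed cleanup concrete, here is a minimal Java sketch of "iterate through the map and remove deleted keys that fall outside the TTL window". Everything in it is an assumption for illustration: `DeletedKeysTtlSketch`, `KeyMetadata`, and `removeExpiredDeletedKeys` are hypothetical names, not Pinot classes, and the sketch does not model segments, validDocIds, or snapshots.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch; none of these names are actual Pinot classes.
class DeletedKeysTtlSketch {

    // Hypothetical per-key metadata: the latest comparison value seen for the
    // key (e.g. an event time) and whether the latest record is a delete.
    static final class KeyMetadata {
        final long comparisonValue;
        final boolean deleted;

        KeyMetadata(long comparisonValue, boolean deleted) {
            this.comparisonValue = comparisonValue;
            this.deleted = deleted;
        }
    }

    // Removes deleted keys whose comparison value is older than
    // (largestSeenComparisonValue - deletedKeysTtl). Live keys are never
    // touched here. Returns the number of keys removed.
    static int removeExpiredDeletedKeys(ConcurrentMap<String, KeyMetadata> keyMap,
                                        long largestSeenComparisonValue,
                                        long deletedKeysTtl) {
        long threshold = largestSeenComparisonValue - deletedKeysTtl;
        int[] removed = {0};
        keyMap.forEach((key, metadata) -> {
            if (metadata.deleted && metadata.comparisonValue < threshold) {
                // remove(key, value) only succeeds if the mapping is unchanged,
                // so an entry replaced by a concurrent upsert is left in place.
                if (keyMap.remove(key, metadata)) {
                    removed[0]++;
                }
            }
        });
        return removed[0];
    }

    public static void main(String[] args) {
        ConcurrentMap<String, KeyMetadata> keyMap = new ConcurrentHashMap<>();
        keyMap.put("k1", new KeyMetadata(100L, true));   // deleted, outside window
        keyMap.put("k2", new KeyMetadata(100L, false));  // live, outside window
        keyMap.put("k3", new KeyMetadata(900L, true));   // deleted, inside window

        int removed = removeExpiredDeletedKeys(keyMap, 1000L, 500L);
        System.out.println("removed=" + removed + ", remaining=" + keyMap.keySet());
    }
}
```

The conditional `remove(key, value)` is one way to narrow the race mentioned in the quoted concern: if a new record for the key arrives while the sweep is running, the sweep does not drop it. Whether that is sufficient for the real upsert metadata manager is an open question this sketch does not answer.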
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
