Aravind-Suresh opened a new issue, #11658:
URL: https://github.com/apache/pinot/issues/11658
Pinot upserts require an [in-memory map](https://github.com/apache/pinot/blob/bc07eb8e7fa1ea4ef26ebbcd1a2ce2087a3058e4/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/upsert/ConcurrentMapPartitionUpsertMetadataManager.java#L54) that maps each primary key to the location of its record. This map is used to check whether a record with a given primary key already exists and, if so, to merge it with the incoming record. Because the map is stored on the heap, tables with a large number of primary keys consume a huge amount of memory.

In certain use cases, we came across tables that required a longer retention period along with strict correctness guarantees, so we explored replacing this in-memory map with a disk-backed map. However, the current implementation (see `ConcurrentMapPartitionUpsertMetadataManager`) is tightly coupled to the in-memory map (Java's `ConcurrentHashMap`), which makes it hard for Pinot adopters to swap in their own map implementation.

I'm creating this issue to discuss extracting an interface from this code so that the "Map" becomes pluggable, and to gather the community's feedback. This [write-up](https://docs.google.com/document/d/1jsu9qChX3set560ll3ClGvorqpTP5yhsZn-RjSTE_eY/edit) describes the idea in detail.

cc @tibrewalpratik17
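To make the proposal concrete, here is a minimal sketch of what such an extracted interface might look like. All names here (`RecordPointer`, `PrimaryKeyLocationMap`, `InMemoryPrimaryKeyLocationMap`, `UpsertSketch.addRecord`) are hypothetical and are not Pinot's actual classes or API. `RecordPointer` is a simplified stand-in for a record location (segment name, doc id, and a numeric comparison value). The sketch assumes the upsert logic only needs get, atomic compute, remove, and size, and that ties in the comparison value go to the incoming record.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiFunction;

// Hypothetical, simplified stand-in for where the latest record of a primary key lives.
final class RecordPointer {
  final String segmentName;
  final int docId;
  final long comparisonValue; // e.g. an event timestamp deciding which record wins

  RecordPointer(String segmentName, int docId, long comparisonValue) {
    this.segmentName = segmentName;
    this.docId = docId;
    this.comparisonValue = comparisonValue;
  }
}

// Hypothetical pluggable abstraction over the primary key -> record location map.
interface PrimaryKeyLocationMap<K> {
  RecordPointer get(K primaryKey);

  // Must apply the remapping function atomically per key, like ConcurrentHashMap.compute.
  RecordPointer compute(K primaryKey, BiFunction<K, RecordPointer, RecordPointer> remapping);

  void remove(K primaryKey);

  int size();
}

// Heap-backed default implementation that simply delegates to ConcurrentHashMap.
final class InMemoryPrimaryKeyLocationMap<K> implements PrimaryKeyLocationMap<K> {
  private final ConcurrentHashMap<K, RecordPointer> map = new ConcurrentHashMap<>();

  public RecordPointer get(K primaryKey) {
    return map.get(primaryKey);
  }

  public RecordPointer compute(K primaryKey, BiFunction<K, RecordPointer, RecordPointer> remapping) {
    return map.compute(primaryKey, remapping);
  }

  public void remove(K primaryKey) {
    map.remove(primaryKey);
  }

  public int size() {
    return map.size();
  }
}

// Upsert bookkeeping written only against the interface, not against ConcurrentHashMap.
final class UpsertSketch {
  // Returns true if the incoming record became the latest location for its primary key.
  static <K> boolean addRecord(PrimaryKeyLocationMap<K> locations, K primaryKey, RecordPointer incoming) {
    RecordPointer result = locations.compute(primaryKey, (key, current) ->
        (current == null || incoming.comparisonValue >= current.comparisonValue) ? incoming : current);
    return result == incoming;
  }
}
```

Because `addRecord` depends only on `PrimaryKeyLocationMap`, a disk-backed implementation could be dropped in without touching the upsert logic. The main contract such an implementation would need to preserve is that `compute` is atomic per key, since concurrent ingestion relies on the read-compare-write happening as one step.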
