Aravind-Suresh opened a new issue, #11658:
URL: https://github.com/apache/pinot/issues/11658

   Pinot upserts require an [in-memory 
map](https://github.com/apache/pinot/blob/bc07eb8e7fa1ea4ef26ebbcd1a2ce2087a3058e4/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/upsert/ConcurrentMapPartitionUpsertMetadataManager.java#L54)
 that maps each primary key to the location of its corresponding record. This 
map is used to check whether a record with a given primary key already exists 
and, if so, to merge it with the incoming record. For tables with a large number 
of primary keys, the map consumes a large amount of memory because it is stored 
on the JVM heap.
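   To make the lookup-and-replace step concrete, here is a minimal sketch of a heap-backed primary-key map. It assumes a hypothetical `RecordLocation` (segment name, doc id, comparison value) and a "larger or equal comparison value wins" rule; it is not Pinot's actual code, and it omits column-level merging (partial upserts) and the invalidation of the previous record's doc id.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class UpsertIndexSketch {
  // Hypothetical location of the latest record for a primary key.
  public record RecordLocation(String segmentName, int docId, long comparisonValue) {}

  // Primary key -> location of the latest record, held entirely on the JVM heap.
  private final Map<String, RecordLocation> primaryKeyToLocation = new ConcurrentHashMap<>();

  /**
   * Atomically keeps whichever location wins: the incoming one if no record exists
   * for the key or if its comparison value is >= the stored one; otherwise the
   * stored one. Returns the location that is current after the call.
   */
  public RecordLocation upsert(String primaryKey, RecordLocation incoming) {
    return primaryKeyToLocation.compute(primaryKey, (key, current) -> {
      if (current == null || incoming.comparisonValue() >= current.comparisonValue()) {
        return incoming;
      }
      return current;
    });
  }

  public RecordLocation lookup(String primaryKey) {
    return primaryKeyToLocation.get(primaryKey);
  }

  public int size() {
    return primaryKeyToLocation.size();
  }
}
```

   Every distinct primary key adds one entry to this map, which is why heap usage grows with the number of keys rather than with the ingestion rate.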
   
   In certain use cases, we came across tables that required a longer retention 
period and strict correctness guarantees, so we explored replacing this 
in-memory map with a disk-backed map. However, the current implementation (see 
ConcurrentMapPartitionUpsertMetadataManager) is tightly coupled to the in-memory 
map (Java's ConcurrentHashMap), which makes it hard for Pinot adopters to plug 
in their own map implementation.
   
   Creating this issue to discuss whether we can extract an interface from this 
class so that the "Map" becomes pluggable, and to gather the community's feedback.
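   One possible shape for the extracted interface is sketched below. `PrimaryKeyLocationStore`, `InMemoryStore`, and `UpsertManager` are hypothetical names chosen for illustration, not existing Pinot APIs, and the method set is an assumption. The idea is that the upsert logic depends only on the interface, with today's ConcurrentHashMap behavior as the default implementation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiFunction;

public class PluggableUpsertMapSketch {
  // Hypothetical location of the latest record for a primary key.
  public record RecordLocation(String segmentName, int docId, long comparisonValue) {}

  /** Hypothetical pluggable abstraction over the primary-key map (not a Pinot API). */
  public interface PrimaryKeyLocationStore {
    RecordLocation get(String primaryKey);

    // Implementations must apply the remapping atomically per key.
    RecordLocation compute(String primaryKey,
        BiFunction<String, RecordLocation, RecordLocation> remapping);

    void remove(String primaryKey);

    long size();
  }

  /** Default heap-backed implementation, mirroring the current ConcurrentHashMap usage. */
  public static final class InMemoryStore implements PrimaryKeyLocationStore {
    private final Map<String, RecordLocation> map = new ConcurrentHashMap<>();

    public RecordLocation get(String primaryKey) {
      return map.get(primaryKey);
    }

    public RecordLocation compute(String primaryKey,
        BiFunction<String, RecordLocation, RecordLocation> remapping) {
      return map.compute(primaryKey, remapping);
    }

    public void remove(String primaryKey) {
      map.remove(primaryKey);
    }

    public long size() {
      return map.size();
    }
  }

  /** Upsert logic that only sees the interface, so the backing store can be swapped. */
  public static final class UpsertManager {
    private final PrimaryKeyLocationStore store;

    public UpsertManager(PrimaryKeyLocationStore store) {
      this.store = store;
    }

    // Keeps the incoming location if it is new or has a >= comparison value.
    public RecordLocation upsert(String primaryKey, RecordLocation incoming) {
      return store.compute(primaryKey, (key, current) ->
          (current == null || incoming.comparisonValue() >= current.comparisonValue())
              ? incoming
              : current);
    }
  }
}
```

   A disk-backed implementation would plug in through the same constructor; the main design constraint is that it must preserve the per-key atomic read-modify-write semantics that `ConcurrentHashMap.compute` provides today.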
   
   This 
[write-up](https://docs.google.com/document/d/1jsu9qChX3set560ll3ClGvorqpTP5yhsZn-RjSTE_eY/edit)
 talks about this idea in detail.
   
   cc @tibrewalpratik17


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.