openinx commented on issue #2504:
URL: https://github.com/apache/iceberg/issues/2504#issuecomment-824582275


   @ayush-san, I think that's because we keep all the keys that come from the 
same checkpoint in an __in-memory__ HashMap. It is mainly used to locate the 
`<file_id, pos>` of rows that were written earlier in the current checkpoint. 
In the long run, we need to change this HashMap to a Map that can spill to 
disk, or replace it with an embedded KV lib, so that we can handle a larger 
number of rows in a single checkpoint. 
[This](https://docs.google.com/presentation/d/18xL5hhGfJKEVJyv-fbfoLYWgioRMqoEutpKFDjXhyKA/edit#slide=id.gb479a3dd40_0_948)
 is a good document describing the current design.
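
   To illustrate why memory grows with the number of distinct keys in a checkpoint, here is a minimal sketch of such a per-checkpoint key index. It is not the actual Iceberg writer code: the class `CheckpointKeyIndex`, the `FilePos` type, and all method names are hypothetical, and string keys stand in for the real equality-key structs. The assumption is that a repeated key within the same checkpoint is turned into a position delete against the earlier `<file_id, pos>`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of an in-memory key index that lives for one checkpoint. */
final class CheckpointKeyIndex {

    /** Location of a written row: the data file and the row offset inside it. */
    static final class FilePos {
        final String fileId;
        final long pos;

        FilePos(String fileId, long pos) {
            this.fileId = fileId;
            this.pos = pos;
        }
    }

    // One entry per distinct key written in the current checkpoint.
    // This is the structure whose heap usage grows with the checkpoint size.
    private final Map<String, FilePos> insertedKeys = new HashMap<>();

    // Positions of earlier rows that were superseded within this checkpoint.
    private final List<FilePos> positionDeletes = new ArrayList<>();

    /** Records a newly written row; an earlier row with the same key becomes a position delete. */
    void insert(String key, String fileId, long pos) {
        FilePos previous = insertedKeys.put(key, new FilePos(fileId, pos));
        if (previous != null) {
            positionDeletes.add(previous);
        }
    }

    /**
     * Deletes a key. Returns true if the row was written in this checkpoint and
     * could be located by position; false means the caller has to fall back to
     * another mechanism (assumed here to be an equality delete).
     */
    boolean delete(String key) {
        FilePos previous = insertedKeys.remove(key);
        if (previous == null) {
            return false;
        }
        positionDeletes.add(previous);
        return true;
    }

    /** Number of keys currently held in memory. */
    int trackedKeys() {
        return insertedKeys.size();
    }

    /** Read-only view of the position deletes collected so far. */
    List<FilePos> positionDeletes() {
        return Collections.unmodifiableList(positionDeletes);
    }

    /** Clears all state once the checkpoint completes. */
    void reset() {
        insertedKeys.clear();
        positionDeletes.clear();
    }
}
```

   Because the map is only cleared in `reset()`, a checkpoint that touches many distinct keys keeps one entry per key on the heap until it completes, which is the pressure a spillable Map or an embedded KV store would relieve.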
   
   FYI @rdblue .




