[GitHub] [iceberg] fengsen-neu commented on issue #3909: When we use spark action rewriteDataFiles, how to limit equality_delete file compations memory.

GitBox Tue, 25 Jan 2022 08:53:48 -0800


fengsen-neu commented on issue #3909:
URL: https://github.com/apache/iceberg/issues/3909#issuecomment-1020782807



   > A large amount of memory is used because, for filtering, each data file 
reads all the data of every delete file whose sequence number is greater than 
its own into a HashSet. But only the keys that also appear in the data file 
need to be read. In my company's optimized version, I use a Bloom filter built 
from the data file's keys to filter out unnecessary equality-delete keys (a 
HashSet of the data file's keys also works, but a HashSet usually consumes more 
memory). If the data file format supports stored Bloom filters, such as Parquet 
based on #2642, it is easier to read the Bloom filter directly from the data 
file. I can open a pull request if needed.
   
   Regarding "it is easier to read the Bloom filter directly from the data 
file. I can open a pull request if needed": could you please provide the PR 
for my reference?
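
For readers following the thread, the Bloom filter idea in the quoted comment can be sketched roughly as below. This is only an illustration under stated assumptions, not Iceberg code and not the commenter's implementation: the `BloomFilter` class, `load_relevant_deletes`, `apply_deletes`, and the `(seq_num, keys)` shape of a delete file are all hypothetical names made up for this sketch.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter (hypothetical helper): no false negatives, some false positives."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive several bit positions from one digest (double hashing).
        digest = hashlib.sha256(repr(key).encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


def load_relevant_deletes(data_file_keys, data_seq, delete_files):
    """Keep only equality-delete keys that may occur in the data file.

    delete_files is assumed to be an iterable of (seq_num, keys) pairs.
    Only delete files with a sequence number greater than the data file's
    are considered, as described in the comment above.
    """
    bloom = BloomFilter()
    for key in data_file_keys:
        bloom.add(key)

    relevant = set()
    for seq_num, keys in delete_files:
        if seq_num <= data_seq:
            continue
        for key in keys:
            if bloom.might_contain(key):
                relevant.add(key)
    return relevant


def apply_deletes(rows, key_of, relevant):
    """Drop rows whose key is in the (much smaller) set of relevant deletes."""
    return [row for row in rows if key_of(row) not in relevant]
```

A false positive in the Bloom filter only costs a little extra memory: the final check against each row's key is exact, so a retained delete key that is not in the data file deletes nothing. The trade-off in this sketch is an extra pass over the data file's keys to build the filter, which is what reading a Bloom filter already stored in the file would avoid.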


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



