[GitHub] [iceberg] moon-fall edited a comment on issue #3909: When we use spark action rewriteDataFiles, how to limit equality_delete file compations memory.

GitBox Wed, 19 Jan 2022 18:48:48 -0800


moon-fall edited a comment on issue #3909:
URL: https://github.com/apache/iceberg/issues/3909#issuecomment-1017062429



   A large amount of memory is used because, for each data file, every row of
every equality delete file whose sequence number (seqNum) is larger than the
data file's is read into a HashSet for filtering. Only the delete keys that
also appear in the data file actually need to be kept. In my company's
optimized version, I use a Bloom filter over the data file's keys to filter
out unnecessary equality delete keys. A HashSet of the data file's keys also
works, but it usually consumes more memory. If the data file format can store
Bloom filters, such as Parquet with #2642, it is even easier: the Bloom filter
can be read directly from the data file.
    I can open a pull request if needed.
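The idea can be sketched as a minimal Python illustration. This is not Iceberg's code or the author's implementation: `DataFile`, `EqDeleteFile`, `build_delete_set` and `apply_deletes` are hypothetical in-memory stand-ins for Iceberg's readers, and the Bloom filter is a small hand-rolled one (double hashing over SHA-256), not the Bloom filter stored in Parquet files. The sketch assumes, as the comment describes, that an equality delete file applies only when its sequence number is larger than the data file's.

```python
import hashlib
import math
from dataclasses import dataclass


class BloomFilter:
    """Hand-rolled Bloom filter (illustrative only) using double hashing."""

    def __init__(self, expected_items: int, fpp: float = 0.01):
        n = max(1, expected_items)
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        self.num_bits = max(8, math.ceil(-n * math.log(fpp) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.num_bits / n * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, key):
        digest = hashlib.sha256(repr(key).encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1  # odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means "definitely absent"; True may be a false positive.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


@dataclass
class DataFile:  # hypothetical stand-in for an Iceberg data file
    seq_num: int
    rows: list  # list of dicts


@dataclass
class EqDeleteFile:  # hypothetical stand-in for an equality delete file
    seq_num: int
    keys: list  # tuples of equality-field values


def equality_key(row, eq_fields):
    return tuple(row[f] for f in eq_fields)


def build_delete_set(data_file, delete_files, eq_fields, fpp=0.01):
    # 1. Bloom filter over the data file's own equality keys.
    bloom = BloomFilter(len(data_file.rows), fpp)
    for row in data_file.rows:
        bloom.add(equality_key(row, eq_fields))
    # 2. Keep only applicable delete keys that may occur in this data file.
    delete_set = set()
    for delete_file in delete_files:
        if delete_file.seq_num <= data_file.seq_num:
            continue  # delete file is not newer than the data file
        for key in delete_file.keys:
            if bloom.might_contain(key):
                delete_set.add(key)
    return delete_set


def apply_deletes(data_file, delete_files, eq_fields):
    # 3. Exact HashSet lookup per row, so false positives never drop rows.
    delete_set = build_delete_set(data_file, delete_files, eq_fields)
    return [r for r in data_file.rows if equality_key(r, eq_fields) not in delete_set]
```

The HashSet now holds only the delete keys that pass the filter (true matches plus a small fraction of false positives) instead of every applicable delete key. False positives cost a little memory but never correctness, because each row is still checked against the exact set.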




