moon-fall commented on issue #3909:
URL: https://github.com/apache/iceberg/issues/3909#issuecomment-1017062429
Memory usage is high because, for each data file, all rows of every
equality-delete file whose sequence number is greater than the data file's
are loaded into a hash set for filtering. In my company's optimized version,
I build a Bloom filter per data file and use it to drop eq-delete keys that
cannot match that file (a hash set of the data file's keys also works, but
usually consumes more memory). If the data file format can store Bloom
filters, such as Parquet based on #2642, it is easier to read the Bloom
filter directly from the data file.
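
To make the idea concrete, here is a rough sketch in Python rather than Iceberg's Java. It is not the actual patch: `BloomFilter`, `relevant_delete_keys`, and the `(sequence_number, keys)` shape of the delete files are hypothetical names made up for illustration, and the filter sizes are not tuned.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter over string keys (illustrative only, not tuned)."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, key):
        # Derive the bit positions from one SHA-256 digest via double hashing.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


def relevant_delete_keys(data_keys, data_seq, delete_files):
    """Return the eq-delete keys worth holding in memory for one data file.

    delete_files is an iterable of (sequence_number, keys) pairs; this shape
    is an assumption for the sketch, not Iceberg's real API.
    """
    bloom = BloomFilter()
    for key in data_keys:
        bloom.add(key)

    kept = set()
    for delete_seq, delete_keys in delete_files:
        if delete_seq <= data_seq:
            continue  # only delete files with a larger sequence number apply
        for key in delete_keys:
            # No false negatives, so every key that can match is kept;
            # an occasional false positive only costs a little extra memory.
            if bloom.might_contain(key):
                kept.add(key)
    return kept
```

The kept set still has to be applied to the real rows afterwards, since a Bloom filter can report false positives. With a Bloom filter stored in the data file itself, the first loop over `data_keys` would not be needed.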
I can open a pull request if needed.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]