Jimexist commented on issue #3138:
URL: https://github.com/apache/arrow-rs/issues/3138#issuecomment-1323795187

   @alamb I believe we should start simple and support only two parameters:
   
   1. whether the Bloom filter is enabled, as a master switch
   2. an fpp (false-positive probability) value, from which we assume all items are unique and use the row count of each row group to calculate a bitset size, capped at 128 MiB; for very large fpp values, e.g. 1.0 or 0.9999, the size falls back to the minimum of 32 bytes.
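A minimal sketch of the sizing rule in item 2, assuming the split-block Bloom filter (SBBF) formula from the Parquet specification with k = 8 hash functions, m = -k·n / ln(1 - p^(1/k)), where n is the row count (every value assumed unique) and p is the fpp. The function and constant names are hypothetical, and rounding up to a power of two is an added assumption (it keeps the size a whole number of 32-byte blocks); this is not the arrow-rs implementation.

```rust
/// Hypothetical lower bound on the bitset size in bytes (one 32-byte SBBF block).
const BITSET_MIN_BYTES: usize = 32;
/// Hypothetical upper bound on the bitset size in bytes (128 MiB).
const BITSET_MAX_BYTES: usize = 128 * 1024 * 1024;

/// Bits needed for `ndv` distinct values at false-positive probability `fpp`,
/// using the SBBF formula m = -8 * n / ln(1 - p^(1/8)).
fn num_bits_from_ndv_fpp(ndv: u64, fpp: f64) -> f64 {
    -8.0 * ndv as f64 / (1.0 - fpp.powf(1.0 / 8.0)).ln()
}

/// Bitset size in bytes for a row group with `row_count` rows,
/// assuming every value in the row group is unique.
fn bitset_size_bytes(row_count: u64, fpp: f64) -> usize {
    // fpp = 0 would ask for an infinitely large filter, so reject it (and NaN).
    assert!(fpp > 0.0 && fpp <= 1.0, "fpp must be in (0, 1]");
    // For fpp = 1.0, ln(0) = -inf, so the bit count is 0 and the minimum applies.
    let bits = num_bits_from_ndv_fpp(row_count, fpp);
    // Float-to-int `as` casts saturate, so huge estimates cannot wrap around.
    let bytes = (bits / 8.0).ceil() as usize;
    // Apply the 32-byte floor and the 128 MiB cap.
    let bytes = bytes.clamp(BITSET_MIN_BYTES, BITSET_MAX_BYTES);
    // Assumption: round up to a power of two; 128 MiB is already one, so the cap holds.
    bytes.next_power_of_two()
}
```

With fpp = 0.01, one million rows map to a 2 MiB bitset, while fpp = 1.0 always yields the 32-byte minimum regardless of row count.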
   
   Controlling disk size directly does not make much sense and is counterintuitive, because users would then need both to estimate the number of unique items per row group and to know how to derive the fpp from that; in most cases, setting a maximum fpp is good enough.
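To illustrate why a disk-size knob is hard to use, here is a sketch that inverts the same sizing relation, again assuming the Parquet SBBF formula with k = 8, giving p = (1 - e^(-8n/m))^8. The function name `fpp_for_bitset` is hypothetical. For a fixed bitset size, the resulting fpp depends entirely on the number of distinct values, which the user would have to estimate up front.

```rust
/// Hypothetical helper: estimated false-positive probability of an SBBF with a
/// bitset of `num_bytes` bytes holding `ndv` distinct values, obtained by
/// inverting m = -8 * n / ln(1 - p^(1/8)) into p = (1 - e^(-8n/m))^8.
fn fpp_for_bitset(num_bytes: usize, ndv: u64) -> f64 {
    // A zero-sized bitset would divide by zero below.
    assert!(num_bytes > 0, "bitset must be non-empty");
    let num_bits = num_bytes as f64 * 8.0;
    (1.0 - (-8.0 * ndv as f64 / num_bits).exp()).powi(8)
}
```

With the same 1 MiB budget, this estimate stays below one in a million for 100,000 distinct values but exceeds 0.99 for 10 million, so a size setting only translates into a useful fpp once the distinct count is known.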
   
   cc @tustvold 

