alamb commented on PR #5860: URL: https://github.com/apache/arrow-rs/pull/5860#issuecomment-2159190734
Thank you @progval

cc @Ted-Jiang and @jimexist

I think there is a tradeoff:
* Writing all the bloom filters at the end requires them to be buffered (which you point out)
* Writing all the bloom filters at the end means they are contiguous, so the reader can fetch multiple bloom filters in a single IO (which is important when reading from something like `S3`)

Given this tradeoff, it seems like we should at least offer a config setting for where to write the bloom filters.

I don't know whether the parquet bloom filter spec dictates where the bloom filters should be written, or whether the ecosystem (aka parquet-java) implicitly requires them in a particular location.
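To make the tradeoff concrete, here is a minimal sketch of a toy layout model, not the arrow-rs writer. The names `BloomFilterPosition`, `ByteRange`, `Layout`, `plan_layout`, and `contiguous_reads` are hypothetical and exist only for illustration. It assumes one bloom filter per row group, ignores file headers and footers, and counts a separate read for every gap between filters (a real reader might also coalesce across small gaps).

```rust
/// Hypothetical config setting: where the writer places bloom filters.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BloomFilterPosition {
    /// Write each row group's filter right after that row group's data.
    AfterRowGroup,
    /// Buffer every filter and write them together at the end of the file.
    End,
}

/// Byte range occupied by one bloom filter in the (toy) file.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ByteRange {
    pub offset: u64,
    pub len: u64,
}

/// Result of planning a file: filter locations and writer-side buffering cost.
pub struct Layout {
    pub filters: Vec<ByteRange>,
    pub peak_buffered_bytes: u64,
}

/// Plan filter offsets for `row_groups`, given as `(data_len, filter_len)` pairs.
pub fn plan_layout(row_groups: &[(u64, u64)], position: BloomFilterPosition) -> Layout {
    let mut cursor = 0u64;
    let mut filters = Vec::with_capacity(row_groups.len());
    let mut peak = 0u64;
    match position {
        BloomFilterPosition::AfterRowGroup => {
            for &(data_len, filter_len) in row_groups {
                cursor += data_len;
                filters.push(ByteRange { offset: cursor, len: filter_len });
                cursor += filter_len;
                // Only one row group's filter is held at a time.
                peak = peak.max(filter_len);
            }
        }
        BloomFilterPosition::End => {
            // All row group data is written first.
            for &(data_len, _) in row_groups {
                cursor += data_len;
            }
            for &(_, filter_len) in row_groups {
                filters.push(ByteRange { offset: cursor, len: filter_len });
                cursor += filter_len;
                // Every filter stays buffered until the data is done.
                peak += filter_len;
            }
        }
    }
    Layout { filters, peak_buffered_bytes: peak }
}

/// Count the reads needed to fetch all ranges when only exactly
/// adjacent ranges can be merged into a single read.
pub fn contiguous_reads(ranges: &[ByteRange]) -> usize {
    let mut sorted = ranges.to_vec();
    sorted.sort_by_key(|r| r.offset);
    let mut reads = 0;
    let mut prev_end: Option<u64> = None;
    for r in &sorted {
        if prev_end != Some(r.offset) {
            reads += 1;
        }
        prev_end = Some(r.offset + r.len);
    }
    reads
}
```

In this model, calling `plan_layout` with three row groups that each carry a 100-byte filter gives a peak buffer of 100 bytes and three separate reads for `AfterRowGroup`, versus a peak buffer of 300 bytes and a single read for `End`. That is the buffering-versus-IO tradeoff described above.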
