[I] Expose public async bloom filter reader (metadata + `AsyncFileReader`) [arrow-rs]

via GitHub Mon, 29 Dec 2025 20:03:40 -0800


ethe opened a new issue, #9067:
URL: https://github.com/apache/arrow-rs/issues/9067


   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   
   I'm integrating Parquet bloom filters into an async pruning pipeline and 
found a gap in the public API.
   
   Current situation
   - There is a sync API:
     `Sbbf::read_from_column_chunk(column_meta, reader)`
   - There is an async method, but only on the async Arrow builder:
     `ParquetRecordBatchStreamBuilder::get_row_group_column_bloom_filter(...)`
   - The helper used internally to parse bloom filter headers is `pub(crate)`:
     `chunk_read_bloom_filter_header_and_offset` (in `parquet::bloom_filter`)
   
   If a downstream application only has `ParquetMetaData` and an 
`AsyncFileReader`, it can't read bloom filters without:
   
   1. using the builder (which requires `&mut` access and ties you to Arrow's reader), or
   2. re‑implementing Parquet bloom header parsing.
   
   This blocks async metadata‑only pruning libraries (like ours) from using 
bloom filters safely and efficiently.
   
   **Describe the solution you'd like**
   
   Expose a public async bloom reader that mirrors the sync API:
   
   ```rust
   pub async fn read_bloom_filter_async<R: AsyncFileReader>(
       column_meta: &ColumnChunkMetaData,
       reader: &mut R,
   ) -> Result<Option<Sbbf>>;
   ```
   
   This would:
   
   - keep internal header parsing private
   - allow async pruning without coupling to Arrow builder
   - avoid duplicate parsing logic in downstream crates
   - be backwards compatible (pure API addition)
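
To make the intended shape concrete, here is a minimal, std-only sketch of the control flow such a function might follow: return `Ok(None)` when the metadata records no bloom filter offset, otherwise fetch the header, then fetch the bitset. Everything except the function name is a hypothetical stand-in: `RangeReader` stands in for `AsyncFileReader`, `ColumnMeta` for `ColumnChunkMetaData`, `Filter` for `Sbbf`, and the 4-byte length prefix is a toy header (the real bloom filter header is Thrift-encoded). `block_on` is a tiny polling loop used only so the sketch runs without an async runtime.

```rust
use std::future::Future;
use std::io;
use std::ops::Range;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical stand-in for `AsyncFileReader`: fetch one byte range.
pub trait RangeReader {
    fn get_bytes<'a>(
        &'a mut self,
        range: Range<u64>,
    ) -> Pin<Box<dyn Future<Output = io::Result<Vec<u8>>> + 'a>>;
}

/// Hypothetical stand-in for the part of `ColumnChunkMetaData` used here.
pub struct ColumnMeta {
    pub bloom_filter_offset: Option<u64>,
}

/// Hypothetical stand-in for `Sbbf`; it only holds the raw bitset bytes.
#[derive(Debug, PartialEq)]
pub struct Filter(pub Vec<u8>);

/// ASSUMPTION: toy header, a 4-byte little-endian bitset length.
const HEADER_LEN: u64 = 4;

/// Header parsing stays private, as the proposal suggests.
fn parse_header(buf: &[u8]) -> io::Result<u64> {
    if buf.len() != HEADER_LEN as usize {
        return Err(io::Error::new(io::ErrorKind::UnexpectedEof, "short header"));
    }
    Ok(u32::from_le_bytes([buf[0], buf[1], buf[2], buf[3]]) as u64)
}

pub async fn read_bloom_filter_async<R: RangeReader>(
    column_meta: &ColumnMeta,
    reader: &mut R,
) -> io::Result<Option<Filter>> {
    // No offset in the metadata means this column chunk has no bloom filter.
    let offset = match column_meta.bloom_filter_offset {
        Some(offset) => offset,
        None => return Ok(None),
    };
    // First fetch: the header, which tells us how long the bitset is.
    let header = reader.get_bytes(offset..offset + HEADER_LEN).await?;
    let bitset_len = parse_header(&header)?;
    // Second fetch: the bitset that immediately follows the header.
    let start = offset + HEADER_LEN;
    let bitset = reader.get_bytes(start..start + bitset_len).await?;
    Ok(Some(Filter(bitset)))
}

/// In-memory reader used only to exercise the sketch.
pub struct InMemory(pub Vec<u8>);

impl RangeReader for InMemory {
    fn get_bytes<'a>(
        &'a mut self,
        range: Range<u64>,
    ) -> Pin<Box<dyn Future<Output = io::Result<Vec<u8>>> + 'a>> {
        Box::pin(async move {
            self.0
                .get(range.start as usize..range.end as usize)
                .map(|bytes| bytes.to_vec())
                .ok_or_else(|| io::Error::new(io::ErrorKind::UnexpectedEof, "range out of bounds"))
        })
    }
}

/// Waker that does nothing; enough for futures that are always ready.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Minimal executor: polls in a loop until the future completes.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```

A caller holding only metadata and a reader would then write something like `block_on(read_bloom_filter_async(&meta, &mut reader))`, or `.await` it inside an existing runtime, without touching the Arrow builder or the header parser.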
   
   *Alternative*
   Make `chunk_read_bloom_filter_header_and_offset` public, but this is a 
low‑level parsing helper and would bake in more implementation detail.
   


