alamb commented on PR #7360:
URL: https://github.com/apache/arrow-rs/pull/7360#issuecomment-2766763040

   > > What is the use case for constructing Sbbf?
   > 
   > I cannot find a way to get `Sbbf` instances in the async read path of the 
`parquet` crate; this only works with `SerializedRowGroupReader`, which is 
synchronous, so I have to construct them manually from `bytes::Bytes`.
   
   Perhaps you can propose an API to do so (perhaps on `ParquetMetaDataReader`).
   
   > I do not use datafusion (not yet). If there were a first-party scan method 
for the async parquet reader with predicate/projection/limit pushdown, that is 
what I need. I would say `TableProvider` provides similar semantics to the 
above API, but I'm not sure it is the best choice for a first-party 
implementation in `parquet`.
   
   I agree that implementing a `TableProvider`-like interface in the parquet 
crate is likely not a good idea.
   
   > > My biggest concern here is adding more code to maintain as part of this 
crate that may not be widely used
   > 
   > Chroma (@HammadB) and Tonbo both run into this issue.
   
   If the issue is that the public API of the parquet-rs crate doesn't allow 
you to implement pushdowns, I agree we should extend the API to address whatever 
you are having trouble doing.
   
   If the issue is that it is complex to implement parquet predicate pushdown, 
I am not sure that is a great fit for this crate, because the details of 
implementing predicate pushdown vary significantly from system to system. For 
example:
   1. What predicates are supported (e.g. do you support predicates like prefix 
matching, user defined functions, etc.)?
   2. How do you evaluate predicates when there are multiple files (with 
potentially different but compatible schemas)?
   3. How do you evaluate predicates using information from an external 
metadata catalog (e.g. Iceberg or similar)?
   4. How do you interleave fetching metadata, evaluating predicates, and 
scanning files?
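
   To illustrate how system-specific even point 1 is, here is a minimal sketch 
of row-group pruning from min/max statistics. Everything in it (`ColumnStats`, 
`Predicate`, `prune_row_groups`) is a hypothetical name made up for this 
example, not an API of the parquet crate, and it assumes a single `i64` column 
with no nulls:

```rust
/// Hypothetical per-row-group statistics for one i64 column.
/// This is NOT a parquet crate type; it only stands in for min/max stats.
#[derive(Debug, Clone, Copy)]
struct ColumnStats {
    min: i64,
    max: i64,
}

/// Hypothetical predicate set. A real system picks its own set
/// (prefix matching, user defined functions, ...).
#[derive(Debug, Clone, Copy)]
enum Predicate {
    Eq(i64),
    Lt(i64),
    Gt(i64),
}

impl Predicate {
    /// Returns false only when the statistics prove that no row can match.
    /// A `true` result means "must scan", not "definitely matches".
    fn may_match(&self, stats: &ColumnStats) -> bool {
        match *self {
            Predicate::Eq(v) => stats.min <= v && v <= stats.max,
            Predicate::Lt(v) => stats.min < v,
            Predicate::Gt(v) => stats.max > v,
        }
    }
}

/// Indices of the row groups that still need to be scanned.
fn prune_row_groups(stats: &[ColumnStats], pred: Predicate) -> Vec<usize> {
    stats
        .iter()
        .enumerate()
        .filter(|(_, s)| pred.may_match(s))
        .map(|(i, _)| i)
        .collect()
}
```

   Even this toy version bakes in decisions (which predicates exist, how 
missing statistics or nulls are treated) that another system would likely make 
differently.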
   
   It isn't clear to me where to draw the line between predicate evaluation and 
a full query engine.
   
   Maybe you and @HammadB can create a separate crate 
(e.g. parquet-predicate-pushdown) implementing the specific pushdown APIs that 
you need.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

