alamb commented on PR #7360:
URL: https://github.com/apache/arrow-rs/pull/7360#issuecomment-2766609690

   > > I think it is possible to implement this feature without modifying the parquet reader, using the currently available APIs
   > 
   > I have tried to implement this in a third-party library, but arrow-rs lacks enough public APIs (for example, users cannot construct `Sbbf` outside of `parquet`), and the related APIs are not yet convenient enough for public use.
   
   You can certainly access and use `Sbbf` outside the parquet crate. For example, DataFusion does so to prune out row groups and data pages here:
   
   
https://github.com/apache/datafusion/blob/6d5e00ad3f8e53f7252cb1d3c72a6c7f28c1aed6/datafusion/datasource-parquet/src/row_group_filter.rs#L236-L235
   
   What is the use case for constructing an `Sbbf`? I think it would be fine to make that public in the crate.
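   To illustrate why a Bloom filter is useful for row group pruning, here is a minimal, dependency-free sketch of the split-block Bloom filter (SBBF) layout described in the Parquet format specification: 256-bit blocks of eight 32-bit words, with one bit set per word using the spec's salt constants. This is NOT the `parquet` crate's `Sbbf` type; `SketchSbbf` and `prune_row_groups` are hypothetical names. It also assumes std's `DefaultHasher` as a stand-in for the XxHash64 (seed 0) that the spec requires, so its bits are not compatible with filters stored in real Parquet files.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Salt constants for the split-block Bloom filter, per the Parquet spec.
const SALT: [u32; 8] = [
    0x47b6137b, 0x44974d91, 0x8824ad5b, 0xa2b7289d,
    0x705495c7, 0x2df1424b, 0x9efc4947, 0x5c6bfb31,
];

/// One 256-bit block: eight 32-bit words.
type Block = [u32; 8];

/// Hypothetical simplified SBBF (not `parquet`'s `Sbbf`).
pub struct SketchSbbf {
    blocks: Vec<Block>,
}

/// ASSUMPTION: stand-in hash. Real Parquet filters use XxHash64 with seed 0;
/// DefaultHasher::new() is used here only to avoid external dependencies.
fn hash_value<T: Hash + ?Sized>(value: &T) -> u64 {
    let mut h = DefaultHasher::new();
    value.hash(&mut h);
    h.finish()
}

/// Build the per-block mask: exactly one bit set in each of the eight words.
fn mask(x: u32) -> Block {
    let mut m = [0u32; 8];
    for (bit, salt) in m.iter_mut().zip(SALT) {
        // The top 5 bits of x * salt select a bit position in 0..32.
        *bit = 1u32 << (x.wrapping_mul(salt) >> 27);
    }
    m
}

impl SketchSbbf {
    pub fn new(num_blocks: usize) -> Self {
        assert!(num_blocks > 0, "filter needs at least one block");
        Self { blocks: vec![[0; 8]; num_blocks] }
    }

    /// Map the upper 32 bits of the hash onto a block index in 0..num_blocks.
    fn block_index(&self, hash: u64) -> usize {
        (((hash >> 32) * self.blocks.len() as u64) >> 32) as usize
    }

    /// Set the eight mask bits in the block selected by the hash.
    pub fn insert<T: Hash + ?Sized>(&mut self, value: &T) {
        let hash = hash_value(value);
        let idx = self.block_index(hash);
        for (word, bit) in self.blocks[idx].iter_mut().zip(mask(hash as u32)) {
            *word |= bit;
        }
    }

    /// `false` means "definitely absent"; `true` means "possibly present".
    pub fn check<T: Hash + ?Sized>(&self, value: &T) -> bool {
        let hash = hash_value(value);
        let block = &self.blocks[self.block_index(hash)];
        let m = mask(hash as u32);
        m.iter().zip(block.iter()).all(|(bit, word)| word & bit != 0)
    }
}

/// Return the indices of row groups whose filter might contain `value`.
/// Row groups not returned can be skipped without reading their pages.
pub fn prune_row_groups<T: Hash + ?Sized>(filters: &[SketchSbbf], value: &T) -> Vec<usize> {
    filters
        .iter()
        .enumerate()
        .filter(|(_, f)| f.check(value))
        .map(|(i, _)| i)
        .collect()
}
```

   A Bloom filter never yields false negatives, so skipping a row group whose filter returns `false` is always safe; a `true` may be a false positive, which only costs an unnecessary read. In real code the per-row-group filter would come from the `parquet` crate's `Sbbf` rather than this sketch, as in the DataFusion code linked above.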
   
   > 
   > > That being said, as you show here, it is non-trivial to implement row group / page filtering.
   > 
   > That's what I want to point out: this need is common to many users, but it is not easy to implement, and it also exposes many internal details,
   
   Yes, indeed, it is not trivial to implement a fast parquet reader integrated with a query engine.
   
   
   > if `parquet` provided a first-party `TableProvider` implementation, that would be good for me.
   
   
   
   What do you mean by `TableProvider`? If you are already using DataFusion, perhaps you can use the built-in parquet reader (`ListingTableProvider`), which already has all these optimizations.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
