houqp edited a comment on pull request #1392:
URL: https://github.com/apache/arrow-datafusion/pull/1392#issuecomment-989633614


   Sorry for the late reply. @andrei-ionescu, the problem you are seeing is 
caused by the issue I mentioned in 
https://github.com/apache/arrow-datafusion/pull/1392#issuecomment-985333246. 
Fundamentally, it comes from differences in how nested struct fields are 
handled in Arrow and Parquet.
   
   @lst-codes, managing stats in a nested data structure could fix the problem. 
However, inspired by https://github.com/apache/arrow/pull/11704, I think 
it would be more efficient to resolve the nested column key path during 
planning by traversing the `Expr::GetIndexedField` expression, and then load 
only the corresponding parquet column stats into memory. This way, we can skip 
columns that are not accessed by the query.
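
   A minimal sketch of the path resolution idea, to make it concrete. This is 
not DataFusion code: the `Expr` enum below is a hypothetical, simplified 
stand-in (the key is a plain `String` here), and `resolve_path` and 
`collect_paths` are illustrative names. It assumes nested parquet columns are 
addressed by a dotted path such as `a.b.c`.

```rust
use std::collections::BTreeSet;

/// Hypothetical, simplified stand-in for a logical expression tree.
/// Only the variants needed for this sketch are modeled.
#[derive(Debug, Clone)]
enum Expr {
    /// Reference to a top-level column.
    Column(String),
    /// Access to a named field of a struct-valued expression.
    GetIndexedField { expr: Box<Expr>, key: String },
    /// Any binary operation, e.g. a comparison in a filter.
    BinaryExpr { left: Box<Expr>, right: Box<Expr> },
    /// A literal value, irrelevant for column resolution.
    Literal(i64),
}

/// Resolve a chain of field accesses rooted at a column into a dotted
/// path (assumed here to match the parquet leaf column path).
/// Returns `None` if the expression is not such a chain.
fn resolve_path(expr: &Expr) -> Option<String> {
    match expr {
        Expr::Column(name) => Some(name.clone()),
        Expr::GetIndexedField { expr, key } => {
            resolve_path(expr).map(|parent| format!("{}.{}", parent, key))
        }
        _ => None,
    }
}

/// Walk an expression and collect every nested column path it touches.
/// Only stats for these paths would need to be loaded.
fn collect_paths(expr: &Expr, out: &mut BTreeSet<String>) {
    if let Some(path) = resolve_path(expr) {
        out.insert(path);
        return;
    }
    if let Expr::BinaryExpr { left, right } = expr {
        collect_paths(left, out);
        collect_paths(right, out);
    }
}
```

   With a filter like `a["b"]["c"] > 1`, the collected set would contain only 
`a.b.c`, so stats for every other leaf column could be left unread.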


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

