[GitHub] [arrow-datafusion] houqp edited a comment on pull request #1392: Fix index out of bounds for stats on nested fields

GitBox Thu, 09 Dec 2021 12:58:35 -0800


houqp edited a comment on pull request #1392:
URL: https://github.com/apache/arrow-datafusion/pull/1392#issuecomment-989633614



   Sorry for the late reply. @andrei-ionescu, the error you are seeing has the 
same root cause I described in 
https://github.com/apache/arrow-datafusion/pull/1392#issuecomment-985333246. 
Fundamentally, it comes from the difference in how Arrow and Parquet handle 
nested struct fields.
   
   @lst-codes, managing stats in a nested data structure could fix the problem. 
However, inspired by https://github.com/apache/arrow/pull/11704, I think 
it would be more efficient to resolve the nested column key path during 
planning by traversing the `Expr::GetIndexedField` expression, and then load 
only the corresponding Parquet column stats into memory. This way, we can skip 
the stats for columns that the query does not access.
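
   To make the idea concrete, here is a rough sketch of the traversal. It is 
not DataFusion code: the `Expr` enum below is a simplified, hypothetical 
stand-in with only two variants, and `ColumnStats`, `resolve_column_path`, 
`load_needed_stats` and the dotted-path keys are names assumed for 
illustration.

```rust
use std::collections::HashMap;

// Hypothetical, simplified stand-in for the planner's expression type.
// The real `Expr` has many more variants and a different key type.
#[derive(Debug)]
enum Expr {
    Column(String),
    GetIndexedField { expr: Box<Expr>, key: String },
}

// Hypothetical per-leaf-column statistics.
#[derive(Debug, Clone, PartialEq)]
struct ColumnStats {
    min: i64,
    max: i64,
}

/// Walk a chain of nested field accesses down to the root column and
/// return the path from the root to the leaf, e.g. ["a", "b", "c"].
fn resolve_column_path(expr: &Expr) -> Vec<String> {
    let mut parts = Vec::new();
    let mut current = expr;
    loop {
        match current {
            Expr::GetIndexedField { expr, key } => {
                // Keys are collected leaf-first, so reverse at the end.
                parts.push(key.clone());
                current = &**expr;
            }
            Expr::Column(name) => {
                parts.push(name.clone());
                break;
            }
        }
    }
    parts.reverse();
    parts
}

/// Keep only the stats for leaf columns that the query actually accesses.
/// `all` is assumed to be keyed by dotted leaf path such as "a.b.c".
fn load_needed_stats(
    all: &HashMap<String, ColumnStats>,
    accessed: &[Expr],
) -> HashMap<String, ColumnStats> {
    accessed
        .iter()
        .map(|e| resolve_column_path(e).join("."))
        .filter_map(|path| all.get(&path).map(|s| (path, s.clone())))
        .collect()
}
```

   With this shape, an expression equivalent to `a["b"]["c"]` resolves to the 
path `a.b.c`, and stats for any other leaf such as `a.d` are never copied. In 
a real implementation the lookup would go against the Parquet row group 
metadata instead of an in-memory map.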


