westonpace commented on PR #33770: URL: https://github.com/apache/arrow/pull/33770#issuecomment-1399270231
@nealrichardson Ok, I did some investigation.

First, the reason this is not being encountered from pyarrow: the scanner options currently take both a projected schema and a projection expression. R sets the projection expression (so the C++ code needs to figure out the projected schema), while Python sets the projected schema (so the C++ code needs to figure out the projection expression). As a result, pyarrow never reaches the code you are modifying (to the best of my knowledge).

Second, the concern about loading the entire top-level field: it turns out that partial column loading was [never fully implemented anyway](https://github.com/apache/arrow/blob/apache-arrow-11.0.0/cpp/src/arrow/dataset/file_parquet.cc#L240-L247). So even though we go through all the trouble of figuring out exactly which child to load, we still just load the entire top-level field.

That being said, if R is working as you expect, then I approve this approach.
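To make the two directions concrete, here is a rough toy model in plain Python. It is not Arrow code: the helper names, the dict-based schema, and the tuple-based field paths are all assumptions made up for illustration. It only sketches the idea that one side starts from field references and derives a schema, the other starts from a schema and derives field references, and that a reference to a nested child still ends up selecting the whole top-level field.

```python
# Toy model only: these helpers are hypothetical and are NOT Arrow APIs.
# A "schema" is a dict of field name -> type, and a "field path" is a tuple
# such as ("point", "x"), meaning child `x` of the struct field `point`.


def top_level_fields_to_load(field_paths):
    """Return the top-level field names referenced by the given paths.

    Order of first appearance is kept and duplicates are dropped. A nested
    path contributes only its root, i.e. the whole top-level field.
    """
    roots = []
    for path in field_paths:
        if not path:
            raise ValueError("empty field path")
        if path[0] not in roots:
            roots.append(path[0])
    return roots


def schema_from_projection(dataset_schema, field_paths):
    """Projection-first direction: derive a projected schema from paths."""
    roots = top_level_fields_to_load(field_paths)
    missing = [name for name in roots if name not in dataset_schema]
    if missing:
        raise KeyError(f"unknown fields: {missing}")
    return {name: dataset_schema[name] for name in roots}


def projection_from_schema(projected_schema):
    """Schema-first direction: derive one field reference per schema field."""
    return [(name,) for name in projected_schema]


dataset_schema = {
    "id": "int64",
    "point": {"x": "double", "y": "double"},
    "label": "string",
}

# Only the child `point.x` and `id` are referenced...
projected = schema_from_projection(dataset_schema, [("point", "x"), ("id",)])
# ...but the derived schema carries the entire `point` struct.
refs = projection_from_schema(projected)
```

In this sketch `projected` still contains both `x` and `y` under `point`, which mirrors the observation above that the whole top-level field gets loaded regardless of which child was requested.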
