[GitHub] [arrow] westonpace commented on pull request #33770: GH-33760: [R][C++] Handle nested field refs in scanner

via GitHub Sat, 21 Jan 2023 07:21:01 -0800


westonpace commented on PR #33770:
URL: https://github.com/apache/arrow/pull/33770#issuecomment-1399270231


   @nealrichardson Ok, I did some investigation.
   
   First, the reason this is not being encountered from pyarrow:
   
   The scanner options currently take both a projected schema and a projection 
expression.  R sets the projection expression (so the C++ needs to figure 
out the projected schema), while Python sets the projected schema (so the C++ 
needs to figure out the projection expression).  So pyarrow never encounters 
the code you are modifying (to the best of my knowledge).
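   
   To make the two entry points concrete, here is a minimal toy model in plain Python. It is only a sketch: `ToyScanOptions` and `fill_missing_half` are hypothetical names invented for illustration, not Arrow's classes or functions, and the real C++ logic is considerably more involved.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# A field reference as a path of names, e.g. ("address", "city") for a nested ref.
FieldPath = Tuple[str, ...]


@dataclass
class ToyScanOptions:
    """Hypothetical stand-in for the scanner options (not Arrow's real class)."""
    projected_schema: Optional[List[str]] = None          # output column names
    projection: Optional[Dict[str, FieldPath]] = None     # output name -> field ref


def fill_missing_half(opts: ToyScanOptions) -> ToyScanOptions:
    """Derive whichever of the two projection descriptions the caller left unset."""
    if opts.projection is not None and opts.projected_schema is None:
        # "R-style" caller: expression given, so derive the output schema.
        # Only this branch can see a nested path such as ("address", "city").
        opts.projected_schema = list(opts.projection)
    elif opts.projected_schema is not None and opts.projection is None:
        # "pyarrow-style" caller: schema given, so derive a trivial projection
        # in which each output column refers to a top-level field of the same name.
        opts.projection = {name: (name,) for name in opts.projected_schema}
    return opts


r_style = fill_missing_half(ToyScanOptions(projection={"city": ("address", "city")}))
py_style = fill_missing_half(ToyScanOptions(projected_schema=["id", "address"]))
```

   In this toy model only the first branch ever has to handle a nested path, which is one way to read why the pyarrow path does not exercise the code in question.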
   
   Second, the concern about loading the entire top-level field:
   
   It turns out that partial column loading was [never fully implemented 
anyway](https://github.com/apache/arrow/blob/apache-arrow-11.0.0/cpp/src/arrow/dataset/file_parquet.cc#L240-L247).
So even though we go through all the trouble of figuring out exactly which 
child to load, we still just load the entire top-level field.
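   
   The practical effect can be sketched as follows. This is a toy illustration in plain Python, not the code in `file_parquet.cc`; the helper name `top_level_columns` and the tuple representation of field refs are assumptions made for the example.

```python
from typing import Iterable, List, Tuple

# A field reference as a path of names, e.g. ("address", "city") for a nested ref.
FieldPath = Tuple[str, ...]


def top_level_columns(refs: Iterable[FieldPath]) -> List[str]:
    """Collapse (possibly nested) field refs to the top-level fields that get loaded.

    Mirrors the behavior described above: whichever child is requested,
    the whole top-level field ends up being read.
    """
    seen: List[str] = []
    for path in refs:
        if not path:
            raise ValueError("empty field reference")
        root = path[0]            # drop everything below the top-level field
        if root not in seen:      # keep first-seen order, no duplicates
            seen.append(root)
    return seen


cols = top_level_columns([("address", "city"), ("id",), ("address", "zip")])
```

   Here both nested refs into `address` collapse to a single read of the whole `address` field, so resolving the exact child does not reduce the amount of data loaded.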
   
   That being said, if R is working as you expect, then I approve this approach.


