[GitHub] [arrow] jorisvandenbossche commented on pull request #33770: GH-33760: [R][C++] Handle nested field refs in scanner

via GitHub Wed, 25 Jan 2023 00:45:52 -0800


jorisvandenbossche commented on PR #33770:
URL: https://github.com/apache/arrow/pull/33770#issuecomment-1403268966


   > It turns out that partial column loading was [never fully implemented anyways](https://github.com/apache/arrow/blob/apache-arrow-11.0.0/cpp/src/arrow/dataset/file_parquet.cc#L240-L247). So even though we go through all the trouble of figuring out exactly which child to load, we still just load the entire top-level field.
   
   Yes, the open issue about this is https://github.com/apache/arrow/issues/33167. The Parquet reader itself supports loading only the requested child. Because we switched pyarrow.parquet.read_table from the direct Parquet reader to the dataset-based reader, this is actually a performance regression for people who were relying on it.
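To make the difference concrete, here is a minimal pure-Python sketch. It is not Arrow's C++ code. It assumes a toy schema representation (nested dicts, with type names as strings at the leaves) and uses hypothetical helpers `leaf_paths`, `columns_for_ref` and `columns_for_top_level`. It relies on the fact that Parquet stores each leaf of a nested type as its own column chunk, with leaves numbered depth-first. Resolving a ref such as `a.b` down to its leaves therefore needs fewer column chunks than loading the whole top-level field `a`.

```python
from typing import Dict, List, Sequence, Tuple, Union

# Toy schema node: a leaf type name (str) or a struct (dict of child name -> node).
Node = Union[str, Dict[str, "Node"]]


def leaf_paths(schema: Dict[str, Node], prefix: Tuple[str, ...] = ()) -> List[Tuple[str, ...]]:
    """Depth-first list of leaf paths; a leaf's position is its Parquet column index."""
    paths: List[Tuple[str, ...]] = []
    for name, node in schema.items():
        path = prefix + (name,)
        if isinstance(node, dict):
            paths.extend(leaf_paths(node, path))  # descend into struct children
        else:
            paths.append(path)  # a primitive leaf is one Parquet column
    return paths


def columns_for_ref(schema: Dict[str, Node], ref: Sequence[str]) -> List[int]:
    """Leaf column indices actually needed for a nested field ref like ("a", "b")."""
    ref = tuple(ref)
    return [i for i, p in enumerate(leaf_paths(schema)) if p[: len(ref)] == ref]


def columns_for_top_level(schema: Dict[str, Node], ref: Sequence[str]) -> List[int]:
    """Leaves read when only the top-level field of the ref is honored."""
    return columns_for_ref(schema, tuple(ref)[:1])


# Hypothetical example schema: id, a{b, c{d, e}}, z
SCHEMA: Dict[str, Node] = {
    "id": "int64",
    "a": {"b": "int32", "c": {"d": "string", "e": "double"}},
    "z": "bool",
}

if __name__ == "__main__":
    print(columns_for_ref(SCHEMA, ("a", "b")))        # [1]
    print(columns_for_top_level(SCHEMA, ("a", "b")))  # [1, 2, 3]
```

With partial loading, a ref to `a.b` touches only column 1. Falling back to the top-level field also reads columns 2 and 3 (`a.c.d`, `a.c.e`), which is the extra I/O behind the regression described above.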


