jorisvandenbossche commented on PR #33770: URL: https://github.com/apache/arrow/pull/33770#issuecomment-1403268966
> It turns out that partial column loading was [never fully implemented anyways](https://github.com/apache/arrow/blob/apache-arrow-11.0.0/cpp/src/arrow/dataset/file_parquet.cc#L240-L247). So even though we go through all the trouble of figuring out exactly which child to load, we still just load the entire top-level field.

Yes, the open issue about this is https://github.com/apache/arrow/issues/33167. The Parquet reader itself supports this, so because we switched `pyarrow.parquet.read_table` from the direct Parquet reader to the dataset-based reader, this is actually a perf regression for people who were using it.
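To make the difference concrete, here is a rough pure-Python sketch of the two behaviours. It does not call Arrow or Parquet at all: the schema, the field names (`id`, `location.lat`, ...) and both helper functions are hypothetical and only model which leaf columns would be touched when a nested child is requested.

```python
# Hypothetical schema: top-level field name -> leaf column paths below it.
SCHEMA_LEAVES = {
    "id": ["id"],
    "location": ["location.lat", "location.lon", "location.name"],
}


def leaves_for_exact_projection(requested, schema_leaves):
    """Leaves touched when a nested selection is honored exactly."""
    selected = []
    for leaves in schema_leaves.values():
        for leaf in leaves:
            # Keep a leaf if it is the requested path or lies underneath it.
            if any(leaf == r or leaf.startswith(r + ".") for r in requested):
                selected.append(leaf)
    return selected


def leaves_for_top_level_fallback(requested, schema_leaves):
    """Leaves touched when each request is widened to its top-level field."""
    top_level = {r.split(".", 1)[0] for r in requested}
    selected = []
    for name, leaves in schema_leaves.items():
        if name in top_level:
            selected.extend(leaves)
    return selected


requested = ["location.lat"]
exact = leaves_for_exact_projection(requested, SCHEMA_LEAVES)
widened = leaves_for_top_level_fallback(requested, SCHEMA_LEAVES)
print("exact projection:  ", exact)
print("top-level fallback:", widened)
```

In this toy model the exact projection touches a single leaf, while the fallback touches every leaf under `location`, which is the kind of extra work the regression is about.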
