Re: [PR] GH-48636: [C++][Parquet] Improve parquet reading using multi threads [arrow]

via GitHub Fri, 12 Jun 2026 06:28:30 -0700


hombit commented on PR #50158:
URL: https://github.com/apache/arrow/pull/50158#issuecomment-4691699097


   @wgtmac could you please explain your argument in a bit more detail? I'm not 
really following it. As I understand it, this PR makes reading a Parquet file 
with one or a few wide nested columns parallelize in the same way as reading a 
Parquet file with exactly the same data stored in "flattened" columns.  
   @OmBiradar is working on this to improve the performance of local reads of 
astronomical data. See dataset examples 
[here](https://huggingface.co/datasets/UniverseTBD/mmu_sdss_sdss) and 
[here](https://data.lsdb.io/ZTF/ZTF_DR23_(lightcurves)).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
