Re: [I] [C++][Parquet] Fast Random Rowgroup Reads [arrow]

via GitHub Fri, 19 Jan 2024 16:27:27 -0800


corwinjoy commented on issue #39676:
URL: https://github.com/apache/arrow/issues/39676#issuecomment-1901421261


   Points from the profiling session:
   1. This supports my claim that the metadata read is extremely expensive (up 
to 40x the read time when statistics are present).
   2. Removing statistics helps, but some still remain after turning them 
off. Overall, I believe the problem is simply the large number of rowgroups 
and columns that must be read for the full metadata.
   3. This is why I believe it makes sense to create a method that avoids 
this full metadata read. Reading only the first rowgroup as a kind of prototype 
is one way; there may be others.
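
   As a rough illustration of why the rowgroup and column counts dominate: the 
Parquet footer stores one ColumnChunk metadata entry per column in every 
rowgroup, so a full metadata read has to decode rowgroups x columns entries. 
The sketch below only counts entries; it does not measure time, and the helper 
names are hypothetical, not part of the Arrow or Parquet C++ API.

```cpp
#include <cstdint>

// Hypothetical helpers for a back-of-envelope estimate; these are NOT
// functions from the Arrow/Parquet C++ library.

// ColumnChunk metadata entries decoded by a full footer read: the Parquet
// format keeps one entry per column in every row group.
constexpr std::uint64_t FullFooterEntries(std::uint64_t num_row_groups,
                                          std::uint64_t num_columns) {
  return num_row_groups * num_columns;
}

// Entries decoded if only a single row group's metadata is read and used
// as a prototype for the others.
constexpr std::uint64_t SingleRowGroupEntries(std::uint64_t num_columns) {
  return num_columns;
}

// How many times more entries the full read decodes than the
// single-row-group read. Returns 0 for a file with no columns.
constexpr std::uint64_t EntryRatio(std::uint64_t num_row_groups,
                                   std::uint64_t num_columns) {
  return num_columns == 0
             ? 0
             : FullFooterEntries(num_row_groups, num_columns) /
                   SingleRowGroupEntries(num_columns);
}
```

   For a made-up file with 1,000 rowgroups and 500 columns, 
`FullFooterEntries(1000, 500)` is 500,000 entries, against 500 for one 
rowgroup. The real cost per entry depends on what each entry carries 
(statistics, encodings, and so on), so treat this as a scaling argument only.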


