corwinjoy commented on issue #39676: URL: https://github.com/apache/arrow/issues/39676#issuecomment-1901421261
Points from the profiling session:

1. This supports my claim that the metadata read is extremely expensive (up to 40x the read time with statistics).
2. Removing statistics helps, but some still remain after turning them off. Overall, I believe the problem is simply the large number of rowgroups and columns that must be read for the full metadata.
3. This is why I believe it makes sense to create a method that avoids this full metadata read. Reading only the first rowgroup as a kind of prototype is one way; there may be others.
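As a rough way to see how much metadata a reader has to decode for a given file, one can look at the footer length that Parquet stores in the last eight bytes of the file (a 4-byte little-endian length followed by the `PAR1` magic). The sketch below uses only the Python standard library; the function name `parquet_footer_size` is hypothetical and not part of Arrow's API, and it reports only the footer's size on disk, not the time spent decoding it.

```python
import os
import struct

PARQUET_MAGIC = b"PAR1"  # trailing magic of a Parquet file with a plaintext footer


def parquet_footer_size(path):
    """Return the byte length of the serialized footer metadata.

    Hypothetical helper for illustration: it reads only the last 8 bytes
    of the file and never parses the metadata itself.
    """
    if os.path.getsize(path) < 8:
        raise ValueError("file is too small to be a Parquet file")
    with open(path, "rb") as f:
        f.seek(-8, os.SEEK_END)
        tail = f.read(8)
    if tail[4:] != PARQUET_MAGIC:
        raise ValueError("missing trailing PAR1 magic")
    # The 4 bytes before the magic hold the footer length, little-endian.
    (footer_len,) = struct.unpack("<I", tail[:4])
    return footer_len
```

Comparing this value with `os.path.getsize(path)` gives a quick sense of how much of a file is metadata; for files with many rowgroups and columns the footer would be expected to grow with the product of the two.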
