[I] [C++] Parquet reading performance regressions [arrow]

via GitHub Tue, 24 Oct 2023 04:06:22 -0700


jorisvandenbossche opened a new issue, #38432:
URL: https://github.com/apache/arrow/issues/38432


   Looking at some of the Parquet read benchmarks we have 
(https://conbench.ursa.dev/c-benchmarks/file-read), two small regressions 
have appeared over the last few months.
   
   For example, zooming in on that period for nyctaxi_2010 with snappy 
compression 
(https://conbench.ursa.dev/benchmark-results/0653775157e07b6980005bce6e59a2ad/):
   
   
![image](https://github.com/apache/arrow/assets/1020496/c95ff04b-6a53-4d09-916d-d55b13e50a9d)
   
   And for fanniemae 
(https://conbench.ursa.dev/benchmark-results/0653774cc429733d8000b2db91fa63fa/), 
where only the first bump is noticeable:
   
   
![image](https://github.com/apache/arrow/assets/1020496/3267c309-155f-4691-a3de-8f845f990510)
   
   
   There are two small bumps, which I think are caused by the following two 
PRs, in order of the bumps:
   
   - https://github.com/apache/arrow/pull/14341 
     This PR added a new encoding option, so my expectation is that it 
shouldn't slow down the default path (the new encoding isn't used)? (cc @rok)
   - https://github.com/apache/arrow/pull/37854
     While this change is mostly useful for cloud filesystems, and could be 
detrimental for fast local disk, the general assumption was that the effect 
on local disk would not be significant, so we could simply enable it for all 
filesystems. But based on this benchmark, it seems the impact might not be 
negligible after all.
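
   To check the second hypothesis on a local disk, something like the 
following could be used. This is only a rough sketch, not the conbench 
benchmark itself. It assumes the setting changed by that PR corresponds to 
the `pre_buffer` argument of `pyarrow.parquet.read_table` (the text above 
does not name the option, so treat that mapping as an assumption). The helper 
names `best_of` and `compare_pre_buffer` and the file path are hypothetical.

```python
import time
from typing import Callable


def best_of(fn: Callable[[], object], repeats: int = 5) -> float:
    """Return the fastest wall-clock time in seconds over `repeats` calls."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    # Taking the minimum reduces noise from other processes and from the
    # first, cold read that has to populate the OS page cache.
    return min(timings)


def compare_pre_buffer(path: str, repeats: int = 5) -> dict:
    """Time reading one local Parquet file with pre_buffer on and off."""
    # Imported lazily so the timing helper above works without pyarrow.
    import pyarrow.parquet as pq

    # Assumption: `pre_buffer` is the option whose default the PR changed.
    return {
        flag: best_of(lambda: pq.read_table(path, pre_buffer=flag), repeats)
        for flag in (True, False)
    }


# Hypothetical usage; the path is a placeholder for a local benchmark file:
# print(compare_pre_buffer("nyctaxi_2010.snappy.parquet"))
```

   Comparing the two timings on the same file gives a first indication of 
whether the setting matters on fast local storage; bisecting the two PRs with 
the actual benchmark would still be needed to confirm the cause.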


