jorisvandenbossche opened a new issue, #38432 (https://github.com/apache/arrow/issues/38432)
Looking at some of the Parquet read benchmarks we have (https://conbench.ursa.dev/c-benchmarks/file-read), two small regressions can be observed over the last months. For example, zooming in on that period shows them for nyctaxi_2010 with snappy compression (https://conbench.ursa.dev/benchmark-results/0653775157e07b6980005bce6e59a2ad/) and for fanniemae (https://conbench.ursa.dev/benchmark-results/0653774cc429733d8000b2db91fa63fa/), where only the first bump is noticeable.

I think the two bumps are caused by the following two PRs, in the order the bumps appear:

- https://github.com/apache/arrow/pull/14341
  This PR added a new encoding option, so my expectation is that it shouldn't slow down the default path (the new encoding isn't used)? (cc @rok)
- https://github.com/apache/arrow/pull/37854
  This change is mostly useful for cloud filesystems and could be detrimental for fast local disk, but the general assumption was that the effect on local disk would not be significant, so we could simply enable it for all filesystems. Based on this benchmark, though, the impact might not be entirely insignificant?
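One way to probe the local-disk question raised for #37854 is to time the same read with the new behaviour on and off. Below is a minimal, stdlib-only timing sketch. The helper names (`median_runtime`, `relative_change`, `compare`) are hypothetical and are not part of Arrow or conbench, and the sketch assumes the change in question is the `pre_buffer` default for Parquet reads, which the issue text does not name explicitly.

```python
import statistics
import time
from typing import Callable, Dict

# Hypothetical helpers for illustration only; they assume the regression can be
# reproduced by toggling a single read option (e.g. pre_buffer) on a local file.


def median_runtime(fn: Callable[[], object], repeats: int = 5, warmup: int = 1) -> float:
    """Return the median wall-clock time, in seconds, of calling fn()."""
    for _ in range(warmup):
        fn()  # untimed warm-up calls, so the measured runs see warm caches
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def relative_change(baseline: float, candidate: float) -> float:
    """Relative change of candidate vs baseline; positive means slower."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (candidate - baseline) / baseline


def compare(
    readers: Dict[str, Callable[[], object]], baseline: str, repeats: int = 5
) -> Dict[str, float]:
    """Time each reader and report its relative change against the baseline reader."""
    timings = {name: median_runtime(fn, repeats) for name, fn in readers.items()}
    base = timings[baseline]
    return {name: relative_change(base, t) for name, t in timings.items()}
```

As a usage example, assuming `import pyarrow.parquet as pq` and a local Parquet file at `path`, one could call `compare({"pre_buffer": lambda: pq.read_table(path, pre_buffer=True), "no_pre_buffer": lambda: pq.read_table(path, pre_buffer=False)}, baseline="no_pre_buffer")`; a clearly positive value for `"pre_buffer"` would suggest pre-buffering costs time on that disk. Medians over a few repeats dampen noise, but because of the warm-up call this measures reads from a warm OS page cache unless caches are dropped between runs.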
