jorisvandenbossche commented on PR #38784: URL: https://github.com/apache/arrow/pull/38784#issuecomment-1825309706
Neither of them shows an improvement...

You can find the plot I am looking at by going to "full Conbench report" -> "Pull Request Run on ursa-i9-9960x at [2023-11-23 15:46:24Z](https://conbench.ursa.dev/runs/34aee21814944f278f429aaad8bbe948)" (in the section "All benchmark runs analyzed:" on that page) -> "compare to baseline run from fork point commit (recommended)" -> sort on benchmark name -> find "file-read" for "compression=snappy, dataset=fanniemae_2016Q4, file_type=parquet, language=R, output_type=table" (typically around page 7; if there is actually an improvement, you can also sort by "z-score" with positive values first).

For the three runs, I get these three pages:

* https://conbench.ursa.dev/compare/benchmark-results/0655f5af97857ba780005944ff57195f...0655f85d79c77f8e8000bec63a9a83ff/
* https://conbench.ursa.dev/compare/benchmark-results/0655f5adc28975398000ef0b9ea87352...0655fb3943c77f648000b1280c84716e/
* https://conbench.ursa.dev/compare/benchmark-results/0655f5adc28975398000ef0b9ea87352...065603346d0f7c1b800042fdd468a614/

I am also not fully sure why the R version seems to show a much bigger slowdown than the Python version (https://conbench.ursa.dev/compare/benchmark-results/0655f5af97857ba780005944ff57195f...0655f85d79c77f8e8000bec63a9a83ff/ vs https://conbench.ursa.dev/compare/benchmark-results/0655f5263a32767a80008e5f53bf5830...0655f7d37080706a80000b9cc3264c89/), because both read the same file with the same compression, and both read it into a table (so there is no conversion to a Python pandas DataFrame or an R data.frame).

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
