Re: [PR] GH-38432: [C++][Parquet] Trying to Fix regression in the DictByteArrayDecoderImpl [arrow]

via GitHub Fri, 24 Nov 2023 00:31:42 -0800


jorisvandenbossche commented on PR #38784:
URL: https://github.com/apache/arrow/pull/38784#issuecomment-1825309706


   Neither of them shows an improvement.
   You can find the plot I am looking at by going to "full Conbench report" -> 
"Pull Request Run on ursa-i9-9960x at [2023-11-23 
15:46:24Z](https://conbench.ursa.dev/runs/34aee21814944f278f429aaad8bbe948)" 
(in section "All benchmark runs analyzed:" on that page) -> "compare to 
baseline run from fork point commit (recommended)" -> sort on benchmark name -> 
find "file-read" for "compression=snappy, dataset=fanniemae_2016Q4, 
file_type=parquet, language=R, output_type=table" (typically around page 7)
   
   (or if there is actually an improvement, you can also sort by "z-score" with 
positive values first)
   
   For the three runs, I get these three pages:
   
   * https://conbench.ursa.dev/compare/benchmark-results/0655f5af97857ba780005944ff57195f...0655f85d79c77f8e8000bec63a9a83ff/
   * https://conbench.ursa.dev/compare/benchmark-results/0655f5adc28975398000ef0b9ea87352...0655fb3943c77f648000b1280c84716e/
   * https://conbench.ursa.dev/compare/benchmark-results/0655f5adc28975398000ef0b9ea87352...065603346d0f7c1b800042fdd468a614/
   
   I am also not fully sure why the R version seems to show a much bigger slowdown 
than the Python version 
(https://conbench.ursa.dev/compare/benchmark-results/0655f5af97857ba780005944ff57195f...0655f85d79c77f8e8000bec63a9a83ff/
 vs 
https://conbench.ursa.dev/compare/benchmark-results/0655f5263a32767a80008e5f53bf5830...0655f7d37080706a80000b9cc3264c89/),
 because they are both reading the same file with the same compression, both into a 
table (so no conversion to a Python pandas DataFrame or R data.frame).

