alamb commented on PR #6034: URL: https://github.com/apache/arrow-datafusion/pull/6034#issuecomment-1520193728
I ran some preliminary benchmarks against this branch and it seems like some queries have gotten slightly slower:

```
alamb@aal-dev:~/benchmarking/feature%2Fstream_groupby4$ python3 ~/arrow-datafusion/benchmarks/compare.py tpch_sf1_parquet_mem.json tpch_sf1_mem_branch.json
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query        ┃           -o ┃           -o ┃       Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ QQuery 1     │     770.75ms │     760.05ms │    no change │
│ QQuery 2     │     289.80ms │     312.71ms │ 1.08x slower │
│ QQuery 3     │     174.31ms │     175.09ms │    no change │
│ QQuery 4     │     106.65ms │     104.51ms │    no change │
│ QQuery 5     │     477.41ms │     480.83ms │    no change │
│ QQuery 6     │      38.15ms │      37.78ms │    no change │
│ QQuery 7     │    1071.70ms │    1082.32ms │    no change │
│ QQuery 8     │     252.64ms │     264.53ms │    no change │
│ QQuery 9     │     581.89ms │     598.15ms │    no change │
│ QQuery 10    │     332.62ms │     339.50ms │    no change │
│ QQuery 11    │     282.02ms │     291.65ms │    no change │
│ QQuery 12    │     145.87ms │     152.48ms │    no change │
│ QQuery 13    │     679.94ms │     680.18ms │    no change │
│ QQuery 14    │      59.35ms │      58.90ms │    no change │
│ QQuery 15    │      96.58ms │      96.56ms │    no change │
│ QQuery 16    │     251.37ms │     266.31ms │ 1.06x slower │
│ QQuery 17    │    2435.04ms │    2539.73ms │    no change │
│ QQuery 18    │    3021.24ms │    3272.84ms │ 1.08x slower │
│ QQuery 19    │     142.99ms │     153.61ms │ 1.07x slower │
│ QQuery 20    │     925.24ms │    1058.29ms │ 1.14x slower │
│ QQuery 21    │    1423.51ms │    1407.18ms │    no change │
│ QQuery 22    │     148.12ms │     144.86ms │    no change │
└──────────────┴──────────────┴──────────────┴──────────────┘
alamb@aal-dev:~/benchmarking/feature%2Fstream_groupby4$ python3 ~/arrow-datafusion/benchmarks/compare.py tpch_sf1_parquet_main.json tpch_sf1_parquet_branch.json
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query        ┃ /home/alamb… ┃ /home/alamb… ┃       Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ QQuery 1     │    1470.88ms │    1456.73ms │    no change │
│ QQuery 2     │     394.00ms │     422.56ms │ 1.07x slower │
│ QQuery 3     │     564.83ms │     540.62ms │    no change │
│ QQuery 4     │     222.21ms │     221.49ms │    no change │
│ QQuery 5     │     717.36ms │     702.32ms │    no change │
│ QQuery 6     │     460.41ms │     454.66ms │    no change │
│ QQuery 7     │    1216.67ms │    1230.08ms │    no change │
│ QQuery 8     │     717.35ms │     731.94ms │    no change │
│ QQuery 9     │    1337.85ms │    1326.94ms │    no change │
│ QQuery 10    │     765.05ms │     787.99ms │    no change │
│ QQuery 11    │     337.95ms │     344.80ms │    no change │
│ QQuery 12    │     329.03ms │     327.28ms │    no change │
│ QQuery 13    │    1105.33ms │    1170.87ms │ 1.06x slower │
│ QQuery 14    │     449.61ms │     450.68ms │    no change │
│ QQuery 15    │     405.36ms │     417.38ms │    no change │
│ QQuery 16    │     330.18ms │     349.17ms │ 1.06x slower │
│ QQuery 17    │    2772.98ms │    2891.72ms │    no change │
│ QQuery 18    │    3592.01ms │    3802.18ms │ 1.06x slower │
│ QQuery 19    │     769.32ms │     771.99ms │    no change │
│ QQuery 20    │    1237.75ms │    1326.82ms │ 1.07x slower │
│ QQuery 21    │    1663.89ms │    1633.89ms │    no change │
│ QQuery 22    │     197.52ms │     202.74ms │    no change │
└──────────────┴──────────────┴──────────────┴──────────────┘
```

Script I used is here: https://github.com/alamb/datafusion-benchmarking/blob/628151e3e3d27ff6e5242052d017f71dcd0d80ef/bench.sh

I am rerunning the numbers to see if I can reproduce the results
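For anyone reading the Change column: the labels in the tables are consistent with a simple ratio of branch time to baseline time, with anything inside roughly a 5% band reported as "no change". The sketch below is only an illustration of that reading. The function name `describe_change`, the 5% threshold, and the "faster" wording are assumptions inferred from the rows above, not taken from `compare.py` itself.

```python
def describe_change(baseline_ms: float, branch_ms: float, threshold: float = 0.05) -> str:
    """Label a (baseline, branch) timing pair in the style of the Change column.

    Hypothetical helper: the 5% noise threshold is an assumption that happens
    to match the rows in the tables, not a value read from compare.py.
    """
    slowdown = branch_ms / baseline_ms
    if slowdown > 1 + threshold:
        return f"{slowdown:.2f}x slower"

    speedup = baseline_ms / branch_ms
    if speedup > 1 + threshold:
        # No query above got faster, so this wording is a guess.
        return f"{speedup:.2f}x faster"

    return "no change"


# A few rows from the in-memory run above.
print(describe_change(289.80, 312.71))   # Query 2
print(describe_change(252.64, 264.53))   # Query 8
print(describe_change(925.24, 1058.29))  # Query 20
```

Under this reading, Query 8 (about 4.7% slower) and Query 17 (about 4.3% slower) land just inside the band, which would explain why they show as "no change" even though the branch times are higher.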