alamb commented on issue #5637: URL: https://github.com/apache/arrow-datafusion/issues/5637#issuecomment-2050308294
An update here. Thanks to a bunch of work by @haohuaijin @matthewmturner @jayzhan211 @peter-toth @jackwener and myself, the planning speed on 38.0.0 is looking quite a bit better: 20%-700% better in many cases.

I am fairly confident there is still another factor of 2 to be had by completing https://github.com/apache/arrow-datafusion/issues/9637, which I expect to complete over the next few weeks.

```
+ critcmp main 37.0.0
group                                       37.0.0                           main
-----                                       ------                           ----
logical_aggregate_with_join                 1.05  1271.9±10.23µs  ? ?/sec    1.00  1210.1±16.14µs  ? ?/sec
logical_plan_tpcds_all                      1.07   167.2±1.37ms   ? ?/sec    1.00   156.4±0.88ms   ? ?/sec
logical_plan_tpch_all                       1.01    17.2±0.18ms   ? ?/sec    1.00    17.0±0.15ms   ? ?/sec
logical_select_all_from_1000                4.84    93.5±0.41ms   ? ?/sec    1.00    19.3±0.10ms   ? ?/sec
logical_select_one_from_700                 1.00   751.6±12.41µs  ? ?/sec    1.06   795.9±8.14µs   ? ?/sec
logical_trivial_join_high_numbered_columns  1.06   795.8±10.91µs  ? ?/sec    1.00   750.1±8.38µs   ? ?/sec
logical_trivial_join_low_numbered_columns   1.04   764.2±18.21µs  ? ?/sec    1.00   737.4±18.35µs  ? ?/sec
physical_plan_tpcds_all                     1.46     2.2±0.01s    ? ?/sec    1.00  1479.1±3.64ms   ? ?/sec
physical_plan_tpch_all                      1.35   134.5±0.81ms   ? ?/sec    1.00    99.6±0.77ms   ? ?/sec
physical_plan_tpch_q1                       1.43     7.7±0.06ms   ? ?/sec    1.00     5.4±0.07ms   ? ?/sec
physical_plan_tpch_q10                      1.38     6.4±0.05ms   ? ?/sec    1.00     4.6±0.02ms   ? ?/sec
physical_plan_tpch_q11                      1.24     5.1±0.03ms   ? ?/sec    1.00     4.1±0.03ms   ? ?/sec
physical_plan_tpch_q12                      1.25     4.1±0.02ms   ? ?/sec    1.00     3.3±0.01ms   ? ?/sec
physical_plan_tpch_q13                      1.22     2.7±0.02ms   ? ?/sec    1.00     2.2±0.01ms   ? ?/sec
physical_plan_tpch_q14                      1.22     3.5±0.02ms   ? ?/sec    1.00     2.9±0.02ms   ? ?/sec
physical_plan_tpch_q16                      1.33     5.3±0.02ms   ? ?/sec    1.00     4.0±0.02ms   ? ?/sec
physical_plan_tpch_q17                      1.29     4.9±0.03ms   ? ?/sec    1.00     3.8±0.02ms   ? ?/sec
physical_plan_tpch_q18                      1.33     5.5±0.06ms   ? ?/sec    1.00     4.1±0.02ms   ? ?/sec
physical_plan_tpch_q19                      1.29    10.1±0.09ms   ? ?/sec    1.00     7.9±0.05ms   ? ?/sec
physical_plan_tpch_q2                       1.44    12.3±0.09ms   ? ?/sec    1.00     8.5±0.06ms   ? ?/sec
physical_plan_tpch_q20                      1.32     6.4±0.05ms   ? ?/sec    1.00     4.9±0.02ms   ? ?/sec
physical_plan_tpch_q21                      1.41     9.5±0.03ms   ? ?/sec    1.00     6.8±0.06ms   ? ?/sec
physical_plan_tpch_q22                      1.29     4.7±0.03ms   ? ?/sec    1.00     3.6±0.03ms   ? ?/sec
physical_plan_tpch_q3                       1.27     4.2±0.03ms   ? ?/sec    1.00     3.3±0.02ms   ? ?/sec
physical_plan_tpch_q4                       1.39     3.4±0.02ms   ? ?/sec    1.00     2.4±0.02ms   ? ?/sec
physical_plan_tpch_q5                       1.27     6.1±0.06ms   ? ?/sec    1.00     4.8±0.03ms   ? ?/sec
physical_plan_tpch_q6                       1.17     2.1±0.01ms   ? ?/sec    1.00  1752.6±12.06µs  ? ?/sec
physical_plan_tpch_q7                       1.39     8.7±0.08ms   ? ?/sec    1.00     6.2±0.03ms   ? ?/sec
physical_plan_tpch_q8                       1.52    12.2±0.08ms   ? ?/sec    1.00     8.0±0.03ms   ? ?/sec
physical_plan_tpch_q9                       1.53     9.2±0.06ms   ? ?/sec    1.00     6.0±0.05ms   ? ?/sec
physical_select_all_from_1000               7.42   683.8±1.12ms   ? ?/sec    1.00    92.2±0.49ms   ? ?/sec
physical_select_one_from_700                1.12     4.2±0.02ms   ? ?/sec    1.00     3.7±0.04ms   ? ?/sec
```

For the `37.0.0` column I used a copy of `37.0.0` with the tpcds benchmark added, on this branch: https://github.com/alamb/arrow-datafusion/tree/alamb/37_bench

<details><summary>Comparison</summary>
<p>

```shell
set -x -e

## This script compares the planning speed of 37.0.0 against the planning speed on main

git fetch -p apache
git fetch -p alamb

# remove old test runs
rm -rf target/criterion/

# use a version of 37 with the tpcds benchmarks
BRANCH_NAME="37.0.0"
git checkout alamb/37_bench
git reset --hard alamb/alamb/37_bench
cargo update
cargo bench --bench sql_planner -- --save-baseline ${BRANCH_NAME}

echo "** Comparing to main"
git checkout main
git reset --hard apache/main
cargo update
cargo bench --bench sql_planner -- --save-baseline main

critcmp main ${BRANCH_NAME}
```

</p>
</details>
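
For anyone reading the critcmp output: the factor in front of each timing is that timing divided by the fastest timing in the same row, so `1.00` marks the faster side. A minimal sketch of that arithmetic, assuming only that `awk` is available; `ratio` is a hypothetical helper name used here for illustration and is not part of critcmp or the benchmark script:

```shell
# ratio SLOWER FASTER: print how many times longer SLOWER took than FASTER,
# rounded to two decimals (both arguments must use the same time unit).
ratio() {
  awk -v slower="$1" -v faster="$2" 'BEGIN { printf "%.2f\n", slower / faster }'
}

# physical_select_all_from_1000: 683.8ms on 37.0.0 vs 92.2ms on main
ratio 683.8 92.2   # prints 7.42

# logical_select_all_from_1000: 93.5ms on 37.0.0 vs 19.3ms on main
ratio 93.5 19.3    # prints 4.84
```

Because the timings shown in the table are themselves rounded, recomputing a factor from them can differ from critcmp's own value in the last digit for some rows.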