alamb commented on PR #5099: URL: https://github.com/apache/arrow-datafusion/pull/5099#issuecomment-1476741489
TLDR: it looks like this feature makes Q7 and Q16 slower on the TPCH benchmarks.

I think we need to review this more.

To test I used:

```
datafusion: alamb/enable_page_pruning
datafusion2: main as of 26e1b20ea3362ea62cb713004a0636b8af6a16d7
```

And ran the TPCH queries for both SF1 and SF10 (1GB and 10GB, against parquet datasets) on a Google Cloud machine:

```shell
cargo run --release --bin tpch -- benchmark datafusion --iterations 5 --path ~/tpch_data/parquet_data_SF1 --format parquet -o ~/enable_page_index
```

My results are as follows:

```
alamb@aal-dev:~/arrow-datafusion3/benchmarks$ ./compare.py ~/main_1GB/tpch-summary--1679329989.json ~/enable_page_index_1GB/tpch-summary--1679328275.json
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃ /home/alamb… ┃ /home/alamb… ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ Q1           │    1709.64ms │    1695.27ms │     no change │
│ Q2           │     490.80ms │     472.05ms │     no change │
│ Q3           │     560.96ms │     556.39ms │     no change │
│ Q4           │     221.62ms │     212.50ms │     no change │
│ Q5           │     749.65ms │     749.12ms │     no change │
│ Q6           │     458.11ms │     452.70ms │     no change │
│ Q7           │    1184.62ms │    1297.19ms │  1.10x slower │
│ Q8           │     707.43ms │     728.24ms │     no change │
│ Q9           │    1195.69ms │    1198.06ms │     no change │
│ Q10          │     776.29ms │     833.59ms │  1.07x slower │
│ Q11          │     381.73ms │     392.42ms │     no change │
│ Q12          │     329.34ms │     343.47ms │     no change │
│ Q13          │    1371.40ms │    1339.00ms │     no change │
│ Q14          │     443.23ms │     454.51ms │     no change │
│ Q15          │     448.54ms │     464.96ms │     no change │
│ Q16          │     278.15ms │     318.71ms │  1.15x slower │
│ Q17          │    6150.47ms │    5874.44ms │     no change │
│ Q18          │    3574.89ms │    3929.19ms │  1.10x slower │
│ Q19          │     792.59ms │     775.01ms │     no change │
│ Q20          │    1720.97ms │    1851.68ms │  1.08x slower │
│ Q21          │    1726.90ms │    1864.49ms │  1.08x slower │
│ Q22          │     525.99ms │     198.84ms │ +2.65x faster │
└──────────────┴──────────────┴──────────────┴───────────────┘
alamb@aal-dev:~/arrow-datafusion3/benchmarks$ ./compare.py ~/main_10GB/tpch-summary--1679330119.json ~/enable_page_index_10GB/tpch-summary--1679328405.json
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃ /home/alamb… ┃ /home/alamb… ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ Q1           │   16252.56ms │   16031.82ms │     no change │
│ Q2           │    3994.56ms │    4353.75ms │  1.09x slower │
│ Q3           │    5572.06ms │    5620.27ms │     no change │
│ Q4           │    2144.14ms │    2194.67ms │     no change │
│ Q5           │    7796.93ms │    7646.74ms │     no change │
│ Q6           │    4382.32ms │    4327.16ms │     no change │
│ Q7           │   18702.50ms │   19922.74ms │  1.07x slower │
│ Q8           │    7383.74ms │    7616.21ms │     no change │
│ Q9           │   13855.17ms │   14408.42ms │     no change │
│ Q10          │    7446.05ms │    8030.00ms │  1.08x slower │
│ Q11          │    3414.81ms │    3850.34ms │  1.13x slower │
│ Q12          │    3027.16ms │    3085.89ms │     no change │
│ Q13          │   18859.06ms │   18627.02ms │     no change │
│ Q14          │    4157.91ms │    4140.22ms │     no change │
│ Q15          │    5293.05ms │    5369.17ms │     no change │
│ Q16          │    6512.42ms │    3011.58ms │ +2.16x faster │
│ Q17          │   86253.33ms │   76036.06ms │ +1.13x faster │
│ Q18          │   45101.99ms │   49717.76ms │  1.10x slower │
│ Q19          │    7323.15ms │    7409.85ms │     no change │
│ Q20          │   19902.39ms │   20965.94ms │  1.05x slower │
│ Q21          │   22040.06ms │   23184.84ms │  1.05x slower │
│ Q22          │    2011.87ms │    2143.62ms │  1.07x slower │
└──────────────┴──────────────┴──────────────┴───────────────┘
```
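
For anyone reading the "Change" column: a rough sketch of how those labels can be reproduced from the two timing columns. This is not the actual `compare.py` code; `classify_change` is a hypothetical helper, and the 5% noise threshold is an assumption that happens to be consistent with the rows above (for example, Q8 at SF1 is about 3% slower and is reported as "no change").

```python
def classify_change(baseline_ms: float, comparison_ms: float,
                    noise_threshold: float = 0.05) -> str:
    """Label a timing difference in the style of the tables above.

    noise_threshold is an assumed cutoff (5%): differences at or below it
    are reported as "no change".
    """
    if baseline_ms <= 0 or comparison_ms <= 0:
        raise ValueError("timings must be positive")

    if comparison_ms >= baseline_ms:
        # Comparison run took longer than the baseline.
        ratio = comparison_ms / baseline_ms
        label = f"{ratio:.2f}x slower"
    else:
        # Comparison run finished sooner than the baseline.
        ratio = baseline_ms / comparison_ms
        label = f"+{ratio:.2f}x faster"

    # Treat small differences as benchmark noise.
    if ratio - 1.0 <= noise_threshold:
        return "no change"
    return label


# Timings taken from the tables above (baseline = main, comparison = branch).
print(classify_change(1184.62, 1297.19))  # Q7, SF1
print(classify_change(6512.42, 3011.58))  # Q16, SF10
print(classify_change(707.43, 728.24))    # Q8, SF1
```

The ratio is always expressed as the larger time over the smaller one, so "1.10x slower" and "+1.10x faster" describe changes of the same magnitude in opposite directions.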
