alamb commented on PR #5099:
URL: https://github.com/apache/arrow-datafusion/pull/5099#issuecomment-1476741489

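   TL;DR: it looks like this feature makes some TPC-H queries slower (e.g. Q7 at both scale factors and Q16 at SF1), although Q22 at SF1 and Q16 at SF10 get noticeably faster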
   
   I think we need to review this more
   
   To test I used 
   
   ```
   datafusion: alamb/enable_page_pruning
   datafusion2: main as of  26e1b20ea3362ea62cb713004a0636b8af6a16d7
   ```
   
   And ran the TPC-H queries for both SF1 and SF10 (1GB and 10GB, against parquet datasets) on a Google Cloud machine:
   
   ```shell
   cargo run --release --bin tpch -- benchmark datafusion --iterations 5 --path ~/tpch_data/parquet_data_SF1 --format parquet -o ~/enable_page_index
   ```
   
   My results are as follows
   
   ```
   alamb@aal-dev:~/arrow-datafusion3/benchmarks$ ./compare.py ~/main_1GB/tpch-summary--1679329989.json ~/enable_page_index_1GB/tpch-summary--1679328275.json
   ┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
   ┃ Query        ┃ /home/alamb… ┃ /home/alamb… ┃        Change ┃
   ┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
   │ Q1           │    1709.64ms │    1695.27ms │     no change │
   │ Q2           │     490.80ms │     472.05ms │     no change │
   │ Q3           │     560.96ms │     556.39ms │     no change │
   │ Q4           │     221.62ms │     212.50ms │     no change │
   │ Q5           │     749.65ms │     749.12ms │     no change │
   │ Q6           │     458.11ms │     452.70ms │     no change │
   │ Q7           │    1184.62ms │    1297.19ms │  1.10x slower │
   │ Q8           │     707.43ms │     728.24ms │     no change │
   │ Q9           │    1195.69ms │    1198.06ms │     no change │
   │ Q10          │     776.29ms │     833.59ms │  1.07x slower │
   │ Q11          │     381.73ms │     392.42ms │     no change │
   │ Q12          │     329.34ms │     343.47ms │     no change │
   │ Q13          │    1371.40ms │    1339.00ms │     no change │
   │ Q14          │     443.23ms │     454.51ms │     no change │
   │ Q15          │     448.54ms │     464.96ms │     no change │
   │ Q16          │     278.15ms │     318.71ms │  1.15x slower │
   │ Q17          │    6150.47ms │    5874.44ms │     no change │
   │ Q18          │    3574.89ms │    3929.19ms │  1.10x slower │
   │ Q19          │     792.59ms │     775.01ms │     no change │
   │ Q20          │    1720.97ms │    1851.68ms │  1.08x slower │
   │ Q21          │    1726.90ms │    1864.49ms │  1.08x slower │
   │ Q22          │     525.99ms │     198.84ms │ +2.65x faster │
   └──────────────┴──────────────┴──────────────┴───────────────┘
   alamb@aal-dev:~/arrow-datafusion3/benchmarks$ ./compare.py ~/main_10GB/tpch-summary--1679330119.json ~/enable_page_index_10GB/tpch-summary--1679328405.json
   ┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
   ┃ Query        ┃ /home/alamb… ┃ /home/alamb… ┃        Change ┃
   ┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
   │ Q1           │   16252.56ms │   16031.82ms │     no change │
   │ Q2           │    3994.56ms │    4353.75ms │  1.09x slower │
   │ Q3           │    5572.06ms │    5620.27ms │     no change │
   │ Q4           │    2144.14ms │    2194.67ms │     no change │
   │ Q5           │    7796.93ms │    7646.74ms │     no change │
   │ Q6           │    4382.32ms │    4327.16ms │     no change │
   │ Q7           │   18702.50ms │   19922.74ms │  1.07x slower │
   │ Q8           │    7383.74ms │    7616.21ms │     no change │
   │ Q9           │   13855.17ms │   14408.42ms │     no change │
   │ Q10          │    7446.05ms │    8030.00ms │  1.08x slower │
   │ Q11          │    3414.81ms │    3850.34ms │  1.13x slower │
   │ Q12          │    3027.16ms │    3085.89ms │     no change │
   │ Q13          │   18859.06ms │   18627.02ms │     no change │
   │ Q14          │    4157.91ms │    4140.22ms │     no change │
   │ Q15          │    5293.05ms │    5369.17ms │     no change │
   │ Q16          │    6512.42ms │    3011.58ms │ +2.16x faster │
   │ Q17          │   86253.33ms │   76036.06ms │ +1.13x faster │
   │ Q18          │   45101.99ms │   49717.76ms │  1.10x slower │
   │ Q19          │    7323.15ms │    7409.85ms │     no change │
   │ Q20          │   19902.39ms │   20965.94ms │  1.05x slower │
   │ Q21          │   22040.06ms │   23184.84ms │  1.05x slower │
   │ Q22          │    2011.87ms │    2143.62ms │  1.07x slower │
   └──────────────┴──────────────┴──────────────┴───────────────┘
   ```
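
For anyone reading the `Change` column: the labels in both tables are consistent with a plain ratio of the two timings, where anything within about 5% counts as noise. Here is a minimal sketch of that rule. The function name `describe_change` and the 5% threshold are assumptions inferred from the numbers above, not a copy of what `compare.py` actually does.

```python
def describe_change(baseline_ms: float, comparison_ms: float, noise: float = 0.05) -> str:
    """Label a timing difference in the style of the Change column.

    `noise` is an assumed threshold: relative differences at or below it
    are reported as "no change".
    """
    change = (comparison_ms - baseline_ms) / baseline_ms
    if abs(change) <= noise:
        return "no change"
    if change > 0:
        # The comparison run took longer than the baseline.
        return f"{comparison_ms / baseline_ms:.2f}x slower"
    # The comparison run took less time than the baseline.
    return f"+{baseline_ms / comparison_ms:.2f}x faster"


if __name__ == "__main__":
    # Timings taken from the SF10 table above (main vs. page index enabled).
    print("Q1 ", describe_change(16252.56, 16031.82))  # no change
    print("Q16", describe_change(6512.42, 3011.58))    # +2.16x faster
    print("Q18", describe_change(45101.99, 49717.76))  # 1.10x slower
```

Under this reading, the many 1.05x to 1.10x slowdowns sit just outside the noise band, so more iterations may be needed before treating them as real regressions.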

