hhhizzz opened a new issue, #1388:
URL: https://github.com/apache/auron/issues/1388

   
   **Describe the bug**  
   After enabling `spark.auron.parquet.enable.pageFiltering`, overall TPC-DS 
query performance degrades by around **20%**. In particular, 
[q37](https://github.com/apache/spark/blob/46ac78ea367cfa9a7acc04482770aaca33f5a575/sql/core/src/test/resources/tpcds/q37.sql#L10)
 becomes more than **50% slower**.  
   
   **To Reproduce**  
   1. Set `spark.auron.parquet.enable.pageFiltering = true`.  
   2. Run the TPC-DS test suite.  
   
   **Expected behavior**  
   Most queries should benefit from page filtering and run faster. However, in 
practice, many queries run slower, and some (e.g., q37) slow down by more than 
50%.  
   
   **Screenshots**  
   N/A  
   
   **Additional context**  
   I simplified q37 and found that the slowdown can be reproduced with just:  
   
   ```sql
   SELECT
       *
   FROM
       inventory AS inv
   WHERE
       inv.inv_quantity_on_hand BETWEEN 100 AND 500
   ```
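   
   To compare the two settings on this query, a minimal sketch could look like the following. It assumes the property can be changed per session with `SET`, which the issue does not confirm; if it is only read at startup, pass it with `--conf` when launching Spark instead and run the query once per launch.  
   
   ```sql
   -- Baseline run with page filtering disabled.
   -- Assumption: the property is honored when set at session level.
   SET spark.auron.parquet.enable.pageFiltering=false;
   SELECT * FROM inventory AS inv
   WHERE inv.inv_quantity_on_hand BETWEEN 100 AND 500;

   -- Same query with page filtering enabled; compare the elapsed times.
   SET spark.auron.parquet.enable.pageFiltering=true;
   SELECT * FROM inventory AS inv
   WHERE inv.inv_quantity_on_hand BETWEEN 100 AND 500;
   ```
   
   Running each variant several times and discarding the first run helps separate the effect of the setting from file-system caching.  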
   
   The issue also reproduces in 
[DataFusion](https://github.com/apache/datafusion) and 
[arrow-rs](https://github.com/apache/arrow-rs), suggesting the root cause may 
be in **arrow-rs**. 

