alamb edited a comment on issue #1441:
URL: 
https://github.com/apache/arrow-datafusion/issues/1441#issuecomment-1000360146


   The plot thickens! 🕵️  
   
   Regarding parquet predicate pruning: amusingly, in this case I think row 
group pruning actually helps avoid the problem. As you may recall, when a 
filter is applied like this:
   
   ```sql
   select * from stops_parquet where trip_tid=54788307;
   ```
   
   The answer is correct (`stop_name` is `"RONDO ZESŁAŃCÓW SYBERYJSKICH"`):
   ```
   +---------------------+----------+-----------+------------------------------+
   | time                | trip_tid | trip_line | stop_name                    |
   +---------------------+----------+-----------+------------------------------+
   | 2021-11-15 00:00:00 | 54788307 | 186       | RONDO ZESŁAŃCÓW SYBERYJSKICH |
   +---------------------+----------+-----------+------------------------------+
   ```
   
   However, when I disable pruning, the wrong answer comes out!
   ```
   +---------------------+----------+-----------+-----------+
   | time                | trip_tid | trip_line | stop_name |
   +---------------------+----------+-----------+-----------+
   | 2021-11-15 00:00:00 | 54788307 | 186       | Armatnia  |
   +---------------------+----------+-----------+-----------+
   ```
   
   I added some debugging and verified that the query does in fact skip 
several row groups (0, 9, 32, and 34):
   
   ```
   Row Group[0], col  ReportStopRecord: pruned = true
   Row Group[1], col  ReportStopRecord: pruned = false
   Row Group[2], col  ReportStopRecord: pruned = false
   Row Group[3], col  ReportStopRecord: pruned = false
   Row Group[4], col  ReportStopRecord: pruned = false
   Row Group[5], col  ReportStopRecord: pruned = false
   Row Group[6], col  ReportStopRecord: pruned = false
   Row Group[7], col  ReportStopRecord: pruned = false
   Row Group[8], col  ReportStopRecord: pruned = false
   Row Group[9], col  ReportStopRecord: pruned = true
   Row Group[10], col  ReportStopRecord: pruned = false
   Row Group[11], col  ReportStopRecord: pruned = false
   Row Group[12], col  ReportStopRecord: pruned = false
   Row Group[13], col  ReportStopRecord: pruned = false
   Row Group[14], col  ReportStopRecord: pruned = false
   Row Group[15], col  ReportStopRecord: pruned = false
   Row Group[16], col  ReportStopRecord: pruned = false
   Row Group[17], col  ReportStopRecord: pruned = false
   Row Group[18], col  ReportStopRecord: pruned = false
   Row Group[19], col  ReportStopRecord: pruned = false
   Row Group[20], col  ReportStopRecord: pruned = false
   Row Group[21], col  ReportStopRecord: pruned = false
   Row Group[22], col  ReportStopRecord: pruned = false
   Row Group[23], col  ReportStopRecord: pruned = false
   Row Group[24], col  ReportStopRecord: pruned = false
   Row Group[25], col  ReportStopRecord: pruned = false
   Row Group[26], col  ReportStopRecord: pruned = false
   Row Group[27], col  ReportStopRecord: pruned = false
   Row Group[28], col  ReportStopRecord: pruned = false
   Row Group[29], col  ReportStopRecord: pruned = false
   Row Group[30], col  ReportStopRecord: pruned = false
   Row Group[31], col  ReportStopRecord: pruned = false
   Row Group[32], col  ReportStopRecord: pruned = true
   Row Group[33], col  ReportStopRecord: pruned = false
   Row Group[34], col  ReportStopRecord: pruned = true
   ```
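
For readers unfamiliar with row group pruning, here is a minimal sketch of the general idea for an equality filter such as `trip_tid=54788307`: a row group can be skipped when the literal falls outside the min/max statistics recorded for that column. This is not DataFusion's actual code. The `RowGroupStats` type, the `can_prune_eq` helper, and the statistics values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class RowGroupStats:
    """Min/max statistics for one column in one row group (hypothetical)."""
    min_value: int
    max_value: int


def can_prune_eq(stats: RowGroupStats, literal: int) -> bool:
    """Return True when `column = literal` cannot match any row in the group."""
    return literal < stats.min_value or literal > stats.max_value


# Made-up statistics for illustration; NOT taken from the file in this issue.
row_groups = [
    RowGroupStats(54_000_000, 54_500_000),
    RowGroupStats(54_700_000, 54_800_000),
    RowGroupStats(54_900_000, 55_000_000),
]

target = 54_788_307  # the literal from the trip_tid filter above
for i, stats in enumerate(row_groups):
    pruned = can_prune_eq(stats, target)
    print(f"Row Group[{i}]: pruned = {str(pruned).lower()}")
```

Pruning of this kind only skips row groups that cannot contain a match, so in principle it should not change which rows a query returns. That is what makes the different answers with and without pruning notable here.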
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

