alamb commented on issue #1441: URL: https://github.com/apache/arrow-datafusion/issues/1441#issuecomment-1000360146
The plot thickens! 🕵️

Regarding parquet predicate pruning, amusingly in this case, I think row group pruning actually helps avoid the problem.

As you may recall, when a filter is applied like this

```sql
select * from stops_parquet where trip_tid=54788307;
```

The answer is correct (`stop_name` is `"RONDO ZESŁAŃCÓW SYBERYJSKICH"`):

```sql
+---------------------+----------+-----------+------------------------------+
| time                | trip_tid | trip_line | stop_name                    |
+---------------------+----------+-----------+------------------------------+
| 2021-11-15 00:00:00 | 54788307 | 186       | RONDO ZESŁAŃCÓW SYBERYJSKICH |
+---------------------+----------+-----------+------------------------------+
```

However, when I disable pruning then the wrong answer comes out!

```
+---------------------+----------+-----------+-----------+
| time                | trip_tid | trip_line | stop_name |
+---------------------+----------+-----------+-----------+
| 2021-11-15 00:00:00 | 54788307 | 186       | Armatnia  |
+---------------------+----------+-----------+-----------+
```

I added some debugging, and verified that the query does in fact skip several row groups:

```
Row Group[0], col ReportStopRecord: pruned = true
Row Group[1], col ReportStopRecord: pruned = false
Row Group[2], col ReportStopRecord: pruned = false
Row Group[3], col ReportStopRecord: pruned = false
Row Group[4], col ReportStopRecord: pruned = false
Row Group[5], col ReportStopRecord: pruned = false
Row Group[6], col ReportStopRecord: pruned = false
Row Group[7], col ReportStopRecord: pruned = false
Row Group[8], col ReportStopRecord: pruned = false
Row Group[9], col ReportStopRecord: pruned = true
Row Group[10], col ReportStopRecord: pruned = false
Row Group[11], col ReportStopRecord: pruned = false
Row Group[12], col ReportStopRecord: pruned = false
Row Group[13], col ReportStopRecord: pruned = false
Row Group[14], col ReportStopRecord: pruned = false
Row Group[15], col ReportStopRecord: pruned = false
Row Group[16], col ReportStopRecord: pruned = false
Row Group[17], col ReportStopRecord: pruned = false
Row Group[18], col ReportStopRecord: pruned = false
Row Group[19], col ReportStopRecord: pruned = false
Row Group[20], col ReportStopRecord: pruned = false
Row Group[21], col ReportStopRecord: pruned = false
Row Group[22], col ReportStopRecord: pruned = false
Row Group[23], col ReportStopRecord: pruned = false
Row Group[24], col ReportStopRecord: pruned = false
Row Group[25], col ReportStopRecord: pruned = false
Row Group[26], col ReportStopRecord: pruned = false
Row Group[27], col ReportStopRecord: pruned = false
Row Group[28], col ReportStopRecord: pruned = false
Row Group[29], col ReportStopRecord: pruned = false
Row Group[30], col ReportStopRecord: pruned = false
Row Group[31], col ReportStopRecord: pruned = false
Row Group[32], col ReportStopRecord: pruned = true
Row Group[33], col ReportStopRecord: pruned = false
Row Group[34], col ReportStopRecord: pruned = true
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
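
For anyone following along who has not looked at row group pruning before, here is a minimal sketch of the general idea behind the `pruned = true/false` debug lines in the comment above. It is not DataFusion's actual implementation: `RowGroupStats`, `can_prune_eq`, and `row_groups_to_scan` are hypothetical names, the statistics are made up rather than taken from the file in this issue, and it only assumes that each row group carries min/max statistics for the filtered column.

```rust
/// Hypothetical min/max statistics for one column in one row group.
#[derive(Debug, Clone, Copy)]
struct RowGroupStats {
    min: i64,
    max: i64,
}

/// Returns true when no row in the group can satisfy `column = value`,
/// so the whole row group can be skipped without being read.
fn can_prune_eq(stats: &RowGroupStats, value: i64) -> bool {
    value < stats.min || value > stats.max
}

/// Indices of the row groups that still have to be scanned.
fn row_groups_to_scan(all: &[RowGroupStats], value: i64) -> Vec<usize> {
    all.iter()
        .enumerate()
        .filter(|(_, s)| !can_prune_eq(s, value))
        .map(|(i, _)| i)
        .collect()
}

fn main() {
    // Made-up statistics, purely for illustration.
    let stats = [
        RowGroupStats { min: 1, max: 100 },
        RowGroupStats { min: 50, max: 200 },
        RowGroupStats { min: 300, max: 400 },
    ];
    for (i, s) in stats.iter().enumerate() {
        println!("Row Group[{}]: pruned = {}", i, can_prune_eq(s, 150));
    }
    println!("row groups to scan: {:?}", row_groups_to_scan(&stats, 150));
}
```

Pruning of this kind only decides which row groups get read; it should never change which rows match, so a result that differs with pruning on versus off points at a problem elsewhere in the read path.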
