Hi guys, I'm trying to work on an issue I've raised with partition pruning:
https://issues.apache.org/jira/browse/DRILL-2287

Basically, because partition pruning is applied after DrillPushProjIntoScan, the planner can't detect that dir0 (for example) doesn't need to be projected when it isn't referenced in the SELECT list (or GROUP BY, etc.).

I've also run into a related failure. Say I have 3 directories: 2 containing valid Parquet files and 1 containing an invalid 0-byte Parquet file. Even when the filter prunes the scan down to only the 2 valid directories, the query fails, because the planner still tries to read the footer of the invalid Parquet file.

It really feels like partition pruning should happen before DrillPushProjIntoScan. I know Jacques has just done some work on moving the partition pruning, so I thought I'd open the discussion here before making too many inroads into it.

My view is that if we're pruning partitions, we shouldn't try to read any of the pruned directories during planning at all. Furthermore, it doesn't make sense to prune the files being scanned but still keep a Filter operation in the query plan and carry dir0 through it when it isn't needed. That leftover Filter and extra dir0 column are why these queries end up being a lot slower.

Thoughts?
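To make the ordering argument concrete, here's a toy model in Java. It is not Drill's planner code: `Dir`, `readFooter`, `planFootersFirst` and `planPruneFirst` are hypothetical names, and "reading a footer" is simulated by throwing on a 0-byte file to mimic the failure described above. It just shows that if directories are pruned on dir0 before any footer is read, the invalid directory is never touched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class PruneOrderSketch {
    // Hypothetical stand-in for a partition directory: its dir0 value and file size in bytes.
    static final class Dir {
        final String dir0;
        final long fileBytes;

        Dir(String dir0, long fileBytes) {
            this.dir0 = dir0;
            this.fileBytes = fileBytes;
        }
    }

    // Simulates reading a Parquet footer: a 0-byte file has no footer, so this throws.
    static String readFooter(Dir d) {
        if (d.fileBytes == 0) {
            throw new IllegalStateException("no footer in 0-byte file under " + d.dir0);
        }
        return "footer(" + d.dir0 + ")";
    }

    // Footers-first order: read every footer, then prune on dir0.
    static List<String> planFootersFirst(List<Dir> dirs, Predicate<String> dir0Filter) {
        List<String> footers = new ArrayList<>();
        for (Dir d : dirs) {
            footers.add(readFooter(d)); // throws on the invalid directory even though it would be pruned
        }
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < dirs.size(); i++) {
            if (dir0Filter.test(dirs.get(i).dir0)) {
                kept.add(footers.get(i));
            }
        }
        return kept;
    }

    // Prune-first order: filter on dir0, then read footers only for surviving directories.
    static List<String> planPruneFirst(List<Dir> dirs, Predicate<String> dir0Filter) {
        List<String> kept = new ArrayList<>();
        for (Dir d : dirs) {
            if (dir0Filter.test(d.dir0)) {
                kept.add(readFooter(d));
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Dir> dirs = List.of(new Dir("2014", 1024), new Dir("2015", 2048), new Dir("bad", 0));
        // Roughly: WHERE dir0 IN ('2014', '2015')
        Predicate<String> filter = v -> v.equals("2014") || v.equals("2015");
        System.out.println(planPruneFirst(dirs, filter));
        try {
            planFootersFirst(dirs, filter);
        } catch (IllegalStateException e) {
            System.out.println("footers-first failed: " + e.getMessage());
        }
    }
}
```

The sketch only covers the footer-reading side. Once pruning has fully applied the dir0 predicate, the Filter above the scan and the dir0 column would also be redundant, but that part isn't modeled here.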
