Hi guys,

I'm trying to work on an issue I've raised with partition pruning:

https://issues.apache.org/jira/browse/DRILL-2287

Basically, because partition pruning is done after
DrillPushProjIntoScan, it seems we can't detect that dir0 (for example)
doesn't need to be projected when it isn't in the SELECT clause (or
GROUP BY etc.).

I've also run into a related issue: if I have, for example, 3
directories (2 with valid Parquet files and 1 with an invalid 0-byte
Parquet file), the query fails even when partition pruning restricts it
to the valid directories, because it still tries to read the footer of
the invalid Parquet file.
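
To make the ordering problem concrete, here is a toy Python sketch. It is not Drill code: the paths, the FILES table and all function names are made up for illustration, and read_footer is only a stand-in for a real Parquet footer read. It just shows that reading every footer before pruning fails on the 0-byte file, while pruning on dir0 first never touches it.

```python
# Toy model only: none of these names are Drill APIs.

def read_footer(path, size_bytes):
    """Stand-in for reading a Parquet footer; a 0-byte file has none."""
    if size_bytes == 0:
        raise OSError(f"cannot read footer of empty file: {path}")
    return {"path": path, "size": size_bytes}

# Hypothetical layout: dir0 value -> (file path, file size in bytes)
FILES = {
    "2014": ("/data/2014/part.parquet", 1024),
    "2015": ("/data/2015/part.parquet", 2048),
    "bad": ("/data/bad/part.parquet", 0),  # the invalid 0-byte file
}

def footers_then_prune(wanted_dirs):
    # Models the failing order: every footer is read before pruning applies.
    footers = {d: read_footer(*f) for d, f in FILES.items()}
    return [footers[d] for d in sorted(wanted_dirs)]

def prune_then_footers(wanted_dirs):
    # Models the proposed order: prune on dir0, then read surviving footers.
    return [read_footer(*FILES[d]) for d in sorted(wanted_dirs) if d in FILES]
```

With wanted_dirs = {"2014", "2015"}, footers_then_prune raises on the "bad" directory even though the query never asked for it, whereas prune_then_footers returns the two valid footers.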

It really feels like the partition pruning should be done before the
DrillPushProjIntoScan.

I know Jacques has just done some work on moving the partition pruning, so
I thought I'd open the discussion here first before making too many
inroads into it.

I do feel that if we're partition pruning, we shouldn't even try to read
any of the pruned directories during the planning stage. Furthermore, it
doesn't make sense to prune the files being scanned but still keep a
Filter operation in the query plan and project dir0 all the way through it
when it's not needed. The latter is why the queries end up being a lot
slower.
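
As a rough illustration of the plan simplification I have in mind, here is another toy Python sketch. Again, this is not how Drill represents plans: the function, its parameters and the pruning_is_exact flag are all hypothetical. The idea is that the Filter can go only when pruning has already enforced the whole predicate, and dir0 only needs projecting if something downstream still uses it.

```python
# Toy plan rewrite; names are hypothetical, not Drill's.

def simplify_after_pruning(select_cols, filter_cols, partition_cols,
                           pruning_is_exact):
    """Return (keep_filter, projected_cols) for the scan after pruning."""
    filter_only_on_partitions = set(filter_cols) <= set(partition_cols)
    # The Filter is redundant only if pruning already enforced all of it.
    keep_filter = not (pruning_is_exact and filter_only_on_partitions)
    projected = list(select_cols)
    if keep_filter:
        # A surviving Filter still needs its input columns projected.
        projected += [c for c in filter_cols if c not in projected]
    return keep_filter, projected
```

For a query selecting columns a and b with a predicate on dir0 only, and pruning that fully covers that predicate, this returns (False, ["a", "b"]): no Filter and no dir0 in the projection. If the predicate also touches a non-partition column, the Filter and its columns stay.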

Thoughts?
