Hi Aman,

I've also created a second issue for the invalid 0-length Parquet files not
being pruned out:

https://issues.apache.org/jira/browse/DRILL-2517

I've done a bit of work on resolving it but need some input to see if I'm
going down the right path.
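
To make the failure mode concrete, here is a rough sketch (not the change
attached to the JIRA, and not Drill's actual code). A valid Parquet file has
to hold at least the leading "PAR1" magic, a 4-byte footer length and the
trailing "PAR1" magic, so anything under 12 bytes can be rejected on file
length alone, before any footer read is attempted. The class name, method
names and paths below are hypothetical and only there for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, for illustration only; this is not a Drill class.
final class ParquetLengthFilter {

    // 4 bytes leading magic "PAR1" + 4 bytes footer length + 4 bytes trailing magic "PAR1".
    static final long MIN_PARQUET_FILE_LENGTH = 12;

    // True if a file of this length is large enough to possibly be valid Parquet.
    static boolean couldBeParquet(long fileLength) {
        return fileLength >= MIN_PARQUET_FILE_LENGTH;
    }

    // Keeps only the paths whose length passes the check, preserving input order.
    static List<String> selectReadable(Map<String, Long> lengthsByPath) {
        List<String> keep = new ArrayList<>();
        for (Map.Entry<String, Long> entry : lengthsByPath.entrySet()) {
            if (couldBeParquet(entry.getValue())) {
                keep.add(entry.getKey());
            }
        }
        return keep;
    }

    public static void main(String[] args) {
        // Made-up directory layout: two valid files and one 0-byte file.
        Map<String, Long> files = new LinkedHashMap<>();
        files.put("/data/2013/a.parquet", 4096L);
        files.put("/data/2014/b.parquet", 0L);
        files.put("/data/2015/c.parquet", 8192L);
        System.out.println(selectReadable(files));
    }
}
```

This only covers the length check. It does not address the separate point
that files in pruned directories should not be touched during planning at
all.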

On Mon, Mar 23, 2015 at 12:54 PM, Aman Sinha <[email protected]> wrote:

> Hi Adam,
> I will update DRILL-2287 with some comments because it has more context
> than this discussion thread.  We can continue the discussion there.  The
> issue of the invalid 0-length Parquet files being read sounds like a
> different issue.
>
> Aman
>
> On Sun, Mar 22, 2015 at 6:48 PM, Adam Gilmore <[email protected]> wrote:
>
> > Hi guys,
> >
> > I'm trying to work on an issue I've raised with partition pruning:
> >
> > https://issues.apache.org/jira/browse/DRILL-2287
> >
> > Basically, because the partition pruning is done after the
> > DrillPushProjIntoScan, it seems like we can't detect that dir0 (for
> > example) does not actually need to be projected if it's not in the SELECT
> > clause (or GROUP BY etc.).
> >
> > Moreover, I've come across an issue whereby if I have, for example, 3
> > directories (2 with valid Parquet files and 1 with an invalid 0-byte
> > Parquet file), the query will fail even if we're partition pruning to
> > only the valid directories, because it's still trying to read the footer
> > of the invalid Parquet file.
> >
> > It really feels like the partition pruning should be done before the
> > DrillPushProjIntoScan.
> >
> > I know Jacques has just done some work on moving the partition pruning, so
> > I thought I'd open the discussion here first before making too many
> > inroads into it.
> >
> > I do feel that if we're partition pruning, we shouldn't even try to read any
> > of those other directories during the planning stage.  Furthermore, it
> > doesn't make sense to prune the files being scanned but still keep a Filter
> > operation in the query plan and project dir0 throughout it if it's not
> > needed.  The latter is why the queries end up being a lot slower.
> >
> > Thoughts?
> >
>
