If I recall, part of the initial patch for Parquet pushdown was more
focused on rowgroup pruning during planning. I believe it was based on the
old partition pruning code (could be wrong).  Furthermore, it conflicted
with the behavior of the metadata caching since the caching didn't (at the
time) require page statistics. With Steven's more recent patch, I believe
stats are now recorded, but I imagine a bunch of refactoring would need to
be done to complete the changes.  The other part was the filter pushdown in
the actual readers. I don't remember if there were conflicts there or not.
Definitely something that is worth getting merged. Just wanted to provide
a heads up on the potential challenges.

--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Mon, Aug 31, 2015 at 9:22 PM, Jinfeng Ni <[email protected]> wrote:

> I heard that there are some issues between filter push-down and the parquet
> metadata caching. But I'm not clear what exactly the problem is, and
> whether we have a plan to resolve that. Can you elaborate on what the open
> questions are and the conflicts with metadata caching?
>
> The reason I'm trying to look at the filter pushdown is that one query
> posted on the user list a couple of days ago performed really badly on
> Drill 1.1 compared with another system. We did some comparison analysis
> and thought the difference mainly comes from the fact that Drill lacks
> the parquet filter pushdown capability. At least for now, the only way
> for Drill to match the other system's performance is to enable filter
> pushdown for that query.
>
> In the meantime, we also identified some room for improvement in Drill's
> run-time generated code when it is used for filter evaluation. I'll
> submit a patch for review shortly.
>
> Regards,
>
> Jinfeng
>
> On Mon, Aug 31, 2015 at 8:13 PM, Jacques Nadeau <[email protected]>
> wrote:
>
> > Given that Julien and Jason are working heavily on a merge into Parquet, I
> > strongly suggest waiting on merging other patches around that code (or at
> > least working on top of the changes they are doing).
> >
> > I thought there were a number of open questions around the filter pushdown
> > and how it related to the metadata caching stuff. Have those been resolved?
> >
> > --
> > Jacques Nadeau
> > CTO and Co-Founder, Dremio
> >
> > On Mon, Aug 31, 2015 at 3:25 PM, Jinfeng Ni <[email protected]> wrote:
> >
> > > I'm actually trying Adam's parquet filter pushdown patch (DRILL-1950).
> > > That's why I happened to click one parquet class and hit the above
> > > "source code not found" error.
> > >
> > > Thanks!
> > >
> > >
> > >
> > > On Mon, Aug 31, 2015 at 3:20 PM, Jason Altekruse <[email protected]> wrote:
> > >
> > > > https://github.com/mapr/incubator-parquet-mr/tree/1.6.0rc3-drill-r0.3
> > > >
> > > > I am working with Julien Le Dem on getting us off of the fork, but for
> > > > now the source code is accessible here. Let me know if you need any help
> > > > looking through the parquet code. Is there a particular JIRA you are
> > > > trying to address?
> > > >
> > > > On Mon, Aug 31, 2015 at 3:15 PM, Jinfeng Ni <[email protected]> wrote:
> > > >
> > > > > It seems we are using a forked parquet library. Can someone point me
> > > > > to the source code for the forked parquet?
> > > > >
> > > > > I tried to download the source code within the IDE, and it complains
> > > > > with the following:
> > > > >
> > > > > "*Cannot download sources*
> > > > >
> > > > > Sources not found for:
> > > > > com.twitter:parquet-column:1.6.0rc3-drill-r0.3
> > > > >
> > > > > "
> > > > >
> > > > > So, looks like only the compiled code jar is published, but not the
> > > > > source code jar file.
> > > > >
> > > >
> > >
> >
>
