The Parquet C++ implementation has all the facilities needed to expose the
information required to implement predicate pushdown. The experimental
Dataset API already makes use of this with Parquet. See [1] for an example
of the API, or [2] for a real-life usage with the nyc-tlc taxi dataset.
The implementation that takes care of predicate pushdown is found in [3].

[1] https://github.com/apache/arrow/blob/master/cpp/src/arrow/dataset/dataset_test.cc#L289-L409
[2] https://github.com/apache/arrow/blob/master/cpp/examples/arrow/dataset-parquet-scan-example.cc
[3] https://github.com/apache/arrow/blob/master/cpp/src/arrow/dataset/file_parquet.cc
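
For readers unfamiliar with the term, here is a minimal sketch of the general
idea behind predicate pushdown on row-group min/max statistics. It is not the
Arrow or Parquet API and not necessarily how [3] does it: RowGroupStats,
RangePredicate, MayContainMatches and SelectRowGroups are hypothetical names
made up for this illustration, and it assumes a single int64 column with an
inclusive range predicate.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the per-row-group min/max statistics of one
// column; this is NOT an Arrow or Parquet type.
struct RowGroupStats {
  int64_t min;  // smallest value of the column in this row group
  int64_t max;  // largest value of the column in this row group
};

// Hypothetical predicate: lower <= column <= upper (both bounds inclusive).
struct RangePredicate {
  int64_t lower;
  int64_t upper;
};

// A row group may contain matching rows only if its [min, max] interval
// overlaps the predicate's [lower, upper] interval.
bool MayContainMatches(const RowGroupStats& stats, const RangePredicate& pred) {
  assert(stats.min <= stats.max);
  return stats.max >= pred.lower && stats.min <= pred.upper;
}

// Returns the indices of the row groups that still have to be read;
// every other row group can be skipped without decoding its data.
std::vector<std::size_t> SelectRowGroups(const std::vector<RowGroupStats>& groups,
                                         const RangePredicate& pred) {
  std::vector<std::size_t> selected;
  for (std::size_t i = 0; i < groups.size(); ++i) {
    if (MayContainMatches(groups[i], pred)) selected.push_back(i);
  }
  return selected;
}
```

With three row groups covering [0, 9], [10, 19] and [20, 29], the predicate
12 <= column <= 21 selects only the row groups at index 1 and 2. Rows inside
a selected row group still have to be filtered individually, since the
statistics only rule out row groups that cannot match.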

On Fri, Nov 15, 2019 at 1:08 AM Micah Kornfield <emkornfi...@gmail.com> wrote:
>
> #1. If there isn't a JIRA, I would guess no one is working on it. (Note: I
> would expect at least the initial work to be in a Parquet JIRA item, and
> this is probably a discussion for that mailing list.)
> #2. There are some open PRs to expose the Parquet reader through JNI to Java
> [1].
> #3. It's possible Dremio has some code that does this. I'm not sure what
> the current status of predicate pushdown in the C++ code base is.
>
>
> [1] https://github.com/apache/arrow/pull/5719
>
>
> On Wed, Nov 13, 2019 at 6:05 PM Chang Chen <baibaic...@gmail.com> wrote:
>
> > Hi
> >
> > I am trying to find documentation about the current status of
> > parquet-cpp. I googled it, but I didn't find any useful information.
> >
> > Here is what I am concerned about:
> > #1 Column indexes (https://issues.apache.org/jira/browse/PARQUET-1201).
> > The corresponding Java implementation already supported it last year,
> > though it wasn't pushed to the repo.
> > #2 A vectorized column reader interface which can be integrated in Java.
> > #3 The feature illustrated here (
> > https://www.dremio.com/webinars/columnar-roadmap-apache-parquet-and-arrow/
> > ):
> > a better predicate pushdown algorithm.
> >
> > Thanks
> >
