Thanks Brock and Jason. I just drafted a proposed API for the vectorized Parquet reader (attached to this email). Any comments and suggestions are appreciated.
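[Editor's note: the attached draft is not reproduced here. As a stand-in, below is a minimal, hypothetical sketch of what a batch-oriented read interface of this kind might look like, modeled on the ORC nextBatch() API quoted later in this thread. All names and the toy in-memory reader are illustrative only, not the actual proposal.]

```java
// Hypothetical sketch only: these names are illustrative and are NOT
// the drafted proposal (which was an email attachment). The shape is
// modeled on the ORC nextBatch() API quoted later in this thread.
public class BatchReaderDemo {

    // A flat, reusable container of column vectors, in the spirit of
    // ORC's VectorizedRowBatch.
    static class VectorizedRowBatch {
        int size;          // number of valid rows in this batch
        long[] longColumn; // one flat vector per projected column

        VectorizedRowBatch(int capacity) {
            this.longColumn = new long[capacity];
        }
    }

    interface BatchReader {
        // Reads the next batch, reusing previousBatch when possible so
        // the caller does not allocate per call. A batch with size == 0
        // signals end of input.
        VectorizedRowBatch nextBatch(VectorizedRowBatch previousBatch);
    }

    // Toy in-memory reader standing in for a Parquet column chunk.
    static class InMemoryBatchReader implements BatchReader {
        private final long[] data;
        private final int batchSize;
        private int pos = 0;

        InMemoryBatchReader(long[] data, int batchSize) {
            this.data = data;
            this.batchSize = batchSize;
        }

        public VectorizedRowBatch nextBatch(VectorizedRowBatch previous) {
            VectorizedRowBatch batch =
                (previous != null && previous.longColumn.length >= batchSize)
                    ? previous
                    : new VectorizedRowBatch(batchSize);
            int n = Math.min(batchSize, data.length - pos);
            System.arraycopy(data, pos, batch.longColumn, 0, n);
            pos += n;
            batch.size = n;
            return batch;
        }
    }

    public static void main(String[] args) {
        BatchReader reader = new InMemoryBatchReader(
            new long[]{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}, 4);
        VectorizedRowBatch batch = null;
        long sum = 0;
        while ((batch = reader.nextBatch(batch)).size > 0) {
            for (int i = 0; i < batch.size; i++) sum += batch.longColumn[i];
        }
        System.out.println(sum); // prints 45
    }
}
```

The key design point, carried over from ORC, is that the caller passes the previous batch back in so vectors can be reused rather than reallocated on every call.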
Thanks,
Zhenxiao

On Tue, Oct 7, 2014 at 5:34 PM, Brock Noland <[email protected]> wrote:

> Hi,
>
> The Hive + Parquet community is very interested in improving the
> performance of Hive + Parquet, and of Parquet generally. We are very
> interested in contributing to the Parquet vectorization and lazy
> materialization effort. Please add me to any future meetings on this
> topic.
>
> BTW, here is the JIRA tracking this effort from the Hive side:
> https://issues.apache.org/jira/browse/HIVE-8120
>
> Brock
>
> On Tue, Oct 7, 2014 at 2:04 PM, Zhenxiao Luo <[email protected]> wrote:
>
> > Thanks Jason.
> >
> > Yes, Netflix is using Presto and Parquet for our Big Data Platform
> > (http://techblog.netflix.com/2014/10/using-presto-in-our-big-data-platform.html).
> >
> > The fastest format currently in Presto is ORC, not DWRF (Parquet is
> > fast, but not as fast as ORC). We are referring to ORC, not Facebook's
> > DWRF implementation.
> >
> > We already have Parquet working in Presto. We definitely would like to
> > get it as fast as ORC.
> >
> > Facebook has added native support for ORC in Presto, which does not use
> > the ORCRecordReader at all. They parse the ORC footer, do predicate
> > pushdown by skipping row groups, vectorization by introducing
> > type-specific vectors, and lazy materialization by introducing
> > LazyVectors (their code has not been committed yet; I mean their pull
> > request). We are planning to do similar optimizations for Parquet in
> > Presto.
> >
> > For the ParquetRecordReader, we need additional APIs to read the next
> > batch of values, and to read in a vector of values. For example, here
> > is the related API in the ORC code:
> >
> > /**
> >  * Read the next row batch. The size of the batch to read cannot be
> >  * controlled by the callers. Callers need to look at
> >  * VectorizedRowBatch.size of the returned object to know the batch
> >  * size read.
> >  * @param previousBatch a row batch object that can be reused by the
> >  *                      reader
> >  * @return the row batch that was read
> >  * @throws java.io.IOException
> >  */
> > VectorizedRowBatch nextBatch(VectorizedRowBatch previousBatch)
> >     throws IOException;
> >
> > And here is the related API in the Presto code, which is used for ORC
> > support in Presto:
> >
> > public void readVector(int columnIndex, Object vector);
> >
> > For lazy materialization, we may also consider adding LazyVectors or
> > LazyBlocks, so that values are not materialized until they are
> > accessed by the operator.
> >
> > Any comments and suggestions are appreciated.
> >
> > Thanks,
> > Zhenxiao
> >
> > On Tue, Oct 7, 2014 at 1:05 PM, Jason Altekruse <[email protected]> wrote:
> >
> > > Hello All,
> > >
> > > No updates from me yet, just sending out another message for some of
> > > the Netflix engineers that were still just subscribed to the Google
> > > Group mail. This will allow them to respond directly with their
> > > research on the optimized ORC reader for consideration in the design
> > > discussion.
> > >
> > > -Jason
> > >
> > > On Mon, Oct 6, 2014 at 10:51 PM, Jason Altekruse <[email protected]> wrote:
> > >
> > > > Hello Parquet team,
> > > >
> > > > I wanted to report the results of a discussion between the Drill
> > > > team and the engineers at Netflix working to make Parquet run
> > > > faster with Presto. As we have said in the last few hangouts, we
> > > > both want to make contributions back to parquet-mr to add features
> > > > and performance. We thought it would be good to sit down and speak
> > > > directly about our real goals and the best next steps to get an
> > > > engineering effort started to accomplish these goals.
> > > >
> > > > Below is a summary of the meeting.
> > > > - Meeting notes
> > > >   - Attendees:
> > > >     - Netflix: Eva Tse, Daniel Weeks, Zhenxiao Luo
> > > >     - MapR (Drill team): Jacques Nadeau, Jason Altekruse, Parth Chandra
> > > >   - Minutes
> > > >     - Introductions / background
> > > >       - Netflix
> > > >         - Working on providing interactive SQL querying to users
> > > >         - Have chosen Presto as the query engine and Parquet as the
> > > >           high-performance data storage format
> > > >         - Presto is providing the needed speed in some cases, but other
> > > >           cases are missing optimizations that could be avoiding reads
> > > >         - Have already started some development and investigation, and
> > > >           have identified key goals
> > > >         - Some initial benchmarks with a modified ORC reader (DWRF),
> > > >           written by the Presto team, show that such gains are possible
> > > >           with a different reader implementation
> > > >         - Goals
> > > >           - Filter pushdown
> > > >             - Skipping reads based on filter evaluation on one or more
> > > >               columns
> > > >             - This can happen at several granularities: row group,
> > > >               page, record/value
> > > >           - Late/lazy materialization
> > > >             - For columns not involved in a filter, avoid materializing
> > > >               them entirely until they are known to be needed after
> > > >               evaluating a filter on other columns
> > > >       - Drill
> > > >         - The Drill engine uses an in-memory vectorized representation
> > > >           of records
> > > >         - For scalar and repeated types we have implemented a fast
> > > >           vectorized reader that is optimized to transform between
> > > >           Parquet's on-disk format and our in-memory format
> > > >         - This is currently producing performant table scans, but has
> > > >           no facility for filter pushdown
> > > >     - Major goals going forward
> > > >       - Filter pushdown
> > > >         - Decide the best way to incorporate filter pushdown into our
> > > >           current implementation, or figure out a way to leverage
> > > >           existing work in the parquet-mr library to accomplish this
> > > >           goal
> > > >       - Late/lazy materialization
> > > >         - See above
> > > >       - Contribute existing code back to Parquet
> > > >         - The Drill Parquet reader has a very strong emphasis on
> > > >           performance and a clear interface to consume; sufficiently
> > > >           separated from Drill, it could prove very useful for other
> > > >           projects
> > > >     - First steps
> > > >       - The Netflix team will share some of their thoughts and research
> > > >         from working with the DWRF code
> > > >         - We can have a discussion based off of this: which aspects are
> > > >           done well, and any opportunities they may have missed that we
> > > >           can incorporate into our design
> > > >         - Do further investigation and ask the existing community for
> > > >           guidance on existing parquet-mr features or planned APIs that
> > > >           may provide the desired functionality
> > > >       - We will begin a discussion of an API for the new functionality
> > > >     - Some outstanding thoughts for down the road
> > > >       - The Drill team has an interest in very late materialization for
> > > >         data stored in dictionary-encoded pages, such as running a join
> > > >         or filter on the dictionary and then going back to the reader
> > > >         to grab all of the values in the data that match the needed
> > > >         members of the dictionary
> > > >         - This is a later consideration, but it is part of the reason
> > > >           we are opening up the design discussion early: so that the
> > > >           API can be flexible enough to allow it in the future, even if
> > > >           it is not implemented right away
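[Editor's note: the row-group-granularity filter pushdown described in the notes above can be sketched roughly as follows. The RowGroupStats type and groupsToRead helper are hypothetical stand-ins; real Parquet keeps per-row-group column statistics in the file footer metadata.]

```java
// Illustration of the row-group-skipping flavor of filter pushdown
// from the meeting notes: evaluate the predicate against per-row-group
// min/max statistics and skip groups that cannot possibly match.
// Hypothetical types, not actual parquet-mr API.
import java.util.ArrayList;
import java.util.List;

public class RowGroupSkipDemo {
    static class RowGroupStats {
        final long min, max;
        RowGroupStats(long min, long max) { this.min = min; this.max = max; }
    }

    // Returns indices of row groups that might contain rows where the
    // column equals target; every other group is skipped with no I/O.
    static List<Integer> groupsToRead(List<RowGroupStats> stats, long target) {
        List<Integer> keep = new ArrayList<>();
        for (int i = 0; i < stats.size(); i++) {
            RowGroupStats s = stats.get(i);
            if (target >= s.min && target <= s.max) {
                keep.add(i);
            }
        }
        return keep;
    }

    public static void main(String[] args) {
        List<RowGroupStats> stats = List.of(
            new RowGroupStats(0, 99),
            new RowGroupStats(100, 199),
            new RowGroupStats(200, 299));
        // Only the middle group can contain the value 150.
        System.out.println(groupsToRead(stats, 150)); // prints [1]
    }
}
```

The same min/max pruning idea extends to page granularity, and record-level filtering then handles the rows that survive within the groups actually read.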

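[Editor's note: the late/lazy materialization idea (LazyVectors/LazyBlocks) that recurs through this thread can be sketched as below. This is not the Presto implementation, whose pull request had not been merged at the time of this thread; it only illustrates deferring decode work until a value is first accessed.]

```java
// Rough, hypothetical sketch of lazy materialization: a vector holds
// its encoded bytes and decodes them only on first access, so columns
// that a filter eliminates entirely never pay the decode cost.
public class LazyVectorDemo {
    static class LazyLongVector {
        static int decodeCount = 0;       // instrumentation for the demo
        private final byte[] encodedPage; // stand-in for an encoded Parquet page
        private long[] decoded;           // null until materialized

        LazyLongVector(byte[] encodedPage) {
            this.encodedPage = encodedPage;
        }

        long get(int i) {
            if (decoded == null) {
                // Materialize the whole vector on first access only.
                decodeCount++;
                decoded = new long[encodedPage.length];
                for (int j = 0; j < encodedPage.length; j++) {
                    decoded[j] = encodedPage[j];
                }
            }
            return decoded[i];
        }
    }

    public static void main(String[] args) {
        LazyLongVector needed = new LazyLongVector(new byte[]{1, 2, 3});
        LazyLongVector filteredAway = new LazyLongVector(new byte[]{4, 5, 6});
        // Only the column an operator actually touches is decoded;
        // filteredAway is never materialized.
        System.out.println(needed.get(1) + " " + LazyLongVector.decodeCount); // prints "2 1"
    }
}
```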