Thanks Jacques. Here is the gist: https://gist.github.com/zhenxiao/2728ce4fe0a7be2d3b30
Comments and suggestions are appreciated.

Thanks,
Zhenxiao

On Mon, Oct 27, 2014 at 10:55 PM, Jacques Nadeau <[email protected]> wrote:

You can't send attachments. Can you post it as a Google doc or gist?

On Mon, Oct 27, 2014 at 7:41 PM, Zhenxiao Luo <[email protected]> wrote:

Thanks Brock and Jason.

I just drafted proposed APIs for a vectorized Parquet reader (attached to this email). Any comments and suggestions are appreciated.

Thanks,
Zhenxiao

On Tue, Oct 7, 2014 at 5:34 PM, Brock Noland <[email protected]> wrote:

Hi,

The Hive + Parquet community is very interested in improving the performance of Hive + Parquet, and of Parquet generally. We are very interested in contributing to the Parquet vectorization and lazy materialization effort. Please add me to any future meetings on this topic.

BTW, here is the JIRA tracking this effort from the Hive side:
https://issues.apache.org/jira/browse/HIVE-8120

Brock

On Tue, Oct 7, 2014 at 2:04 PM, Zhenxiao Luo <[email protected]> wrote:

Thanks Jason.

Yes, Netflix is using Presto and Parquet for our big data platform (http://techblog.netflix.com/2014/10/using-presto-in-our-big-data-platform.html).

The fastest format currently in Presto is ORC, not DWRF (Parquet is fast, but not as fast as ORC). We are referring to ORC, not Facebook's DWRF implementation.

We already have Parquet working in Presto. We would definitely like to get it as fast as ORC.

Facebook has built native ORC support in Presto, which does not use the ORCRecordReader at all. They parse the ORC footer and do predicate pushdown by skipping row groups, vectorization by introducing type-specific vectors, and lazy materialization by introducing LazyVectors (their code, i.e. their pull request, has not been committed yet). We are planning to do similar optimizations for Parquet in Presto.

For the ParquetRecordReader, we need additional APIs to read the next batch of values and to read in a vector of values. For example, here is the related API in the ORC code:

/**
 * Read the next row batch. The size of the batch to read cannot be controlled
 * by the callers. Callers need to look at VectorizedRowBatch.size of the
 * returned object to know the batch size read.
 * @param previousBatch a row batch object that can be reused by the reader
 * @return the row batch that was read
 * @throws java.io.IOException
 */
VectorizedRowBatch nextBatch(VectorizedRowBatch previousBatch) throws IOException;

And here is the related API in the Presto code, which is used for ORC support in Presto:

public void readVector(int columnIndex, Object vector);

For lazy materialization, we may also consider adding LazyVectors or LazyBlocks, so that values are not materialized until they are accessed by the Operator.

Any comments and suggestions are appreciated.

Thanks,
Zhenxiao
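To make the shape of the proposed APIs concrete, here is a minimal sketch of what a batch-oriented Parquet reader interface along these lines could look like. All names here (ParquetVectorizedReader, RowBatch) are illustrative assumptions modeled on the ORC and Presto signatures quoted above, not existing parquet-mr or Presto classes:

import java.io.IOException;

/**
 * Hypothetical sketch of a vectorized Parquet reader API, modeled on the
 * ORC nextBatch() and Presto readVector() signatures quoted above. These
 * types do not exist in parquet-mr; they only illustrate the proposal.
 */
public interface ParquetVectorizedReader extends AutoCloseable {

    /**
     * Read the next batch of rows. As with ORC, the caller cannot control
     * the batch size and must inspect the size field of the returned batch.
     *
     * @param previousBatch a batch object that may be reused by the reader
     * @return the batch that was read, or null when no rows remain
     */
    RowBatch nextBatch(RowBatch previousBatch) throws IOException;

    /**
     * Fill a single column vector, mirroring Presto's readVector(). A
     * per-column entry point lets the engine skip decoding columns that a
     * filter on other columns has already eliminated.
     */
    void readVector(int columnIndex, Object vector) throws IOException;
}

/** Minimal batch container: one type-specific vector per projected column. */
class RowBatch {
    Object[] columnVectors; // one vector per projected column
    int size;               // number of rows actually read into the batch
}

Reading a whole batch at a time amortizes per-row call overhead, while the per-column readVector() leaves room for the predicate pushdown and lazy materialization described above.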
On Tue, Oct 7, 2014 at 1:05 PM, Jason Altekruse <[email protected]> wrote:

Hello All,

No updates from me yet, just sending out another message for some of the Netflix engineers who were still subscribed only to the Google group mail. This will allow them to respond directly with their research on the optimized ORC reader, for consideration in the design discussion.

-Jason

On Mon, Oct 6, 2014 at 10:51 PM, Jason Altekruse <[email protected]> wrote:

Hello Parquet team,

I wanted to report the results of a discussion between the Drill team and the engineers at Netflix working to make Parquet run faster with Presto. As we have said in the last few hangouts, we both want to make contributions back to parquet-mr to add features and performance. We thought it would be good to sit down and speak directly about our real goals and the best next steps to get an engineering effort started to accomplish these goals.

Below is a summary of the meeting.

Meeting notes

- Attendees:
  - Netflix: Eva Tse, Daniel Weeks, Zhenxiao Luo
  - MapR (Drill team): Jacques Nadeau, Jason Altekruse, Parth Chandra
- Minutes:
  - Introductions / background
    - Netflix
      - Working on providing interactive SQL querying to users
      - Have chosen Presto as the query engine and Parquet as the high-performance data storage format
      - Presto is providing the needed speed in some cases, but other cases are missing optimizations that could be avoiding reads
      - Have already started some development and investigation, and have identified key goals
      - Some initial benchmarks with DWRF, a modified ORC reader written by the Presto team, show that such gains are possible with a different reader implementation
      - Goals:
        - Filter pushdown
          - Skipping reads based on filter evaluation on one or more columns
          - This can happen at several granularities: row group, page, record/value
        - Late/lazy materialization (see the sketch after these notes)
          - For columns not involved in a filter, avoid materializing them entirely until they are known to be needed after evaluating a filter on other columns
    - Drill
      - The Drill engine uses an in-memory vectorized representation of records
      - For scalar and repeated types we have implemented a fast vectorized reader that is optimized to transform between Parquet's on-disk format and our in-memory format
      - This is currently producing performant table scans, but has no facility for filter pushdown
  - Major goals going forward
    - Filter pushdown
      - Decide on the best implementation for incorporating filter pushdown into our current implementation, or figure out a way to leverage existing work in the parquet-mr library to accomplish this goal
    - Late/lazy materialization
      - See above
    - Contribute existing code back to Parquet
      - The Drill Parquet reader has a very strong emphasis on performance and a clear interface to consume; sufficiently separated from Drill, it could prove very useful for other projects
  - First steps
    - The Netflix team will share some of their thoughts and research from working with the DWRF code
      - We can have a discussion based on this about which aspects are done well, and about any opportunities they may have missed that we can incorporate into our design
    - Do further investigation and ask the existing community for guidance on existing parquet-mr features or planned APIs that may provide the desired functionality
    - We will begin a discussion of an API for the new functionality
  - Some outstanding thoughts for down the road
    - The Drill team has an interest in very late materialization for data stored in dictionary-encoded pages, such as running a join or filter on the dictionary and then going back to the reader to grab all of the values in the data that match the needed members of the dictionary
      - This is a later consideration, but it is part of the reason we are opening up the design discussion early: so that the API can be flexible enough to allow this in the future, even if it is not implemented right away
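To illustrate the late/lazy materialization goal from the notes above, here is a minimal sketch of a lazily materialized vector: it holds a decode callback and only invokes it on first access, so a column eliminated by a filter on other columns is never decoded. LazyVector and ColumnLoader are hypothetical names for illustration, not classes from parquet-mr, Presto, or Drill:

import java.io.IOException;
import java.io.UncheckedIOException;

/**
 * Hypothetical lazy vector: holds a callback that can decode this column's
 * values for the current batch, and only invokes it on first access. If a
 * filter on other columns eliminates every row, the column is never decoded.
 */
class LazyVector {

    /** Callback that decodes one column of the current batch on demand. */
    interface ColumnLoader {
        Object[] load() throws IOException;
    }

    private final ColumnLoader loader;
    private Object[] values; // null until materialized

    LazyVector(ColumnLoader loader) {
        this.loader = loader;
    }

    /** Materialize on first access; later calls reuse the decoded values. */
    Object get(int row) {
        if (values == null) {
            try {
                values = loader.load();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return values[row];
    }

    boolean isMaterialized() {
        return values != null;
    }
}

The same shape extends to the dictionary idea at the end of the notes: a loader could first expose only dictionary ids, and decode actual values only for the rows that survive a join or filter on the dictionary.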
