Hi,

Great! I will take a look soon.

Cheers!
Brock

On Mon, Oct 27, 2014 at 11:18 PM, Zhenxiao Luo <[email protected]> wrote:

>
> Thanks Jacques.
>
> Here is the gist:
> https://gist.github.com/zhenxiao/2728ce4fe0a7be2d3b30
>
> Comments and Suggestions are appreciated.
>
> Thanks,
> Zhenxiao
>
> On Mon, Oct 27, 2014 at 10:55 PM, Jacques Nadeau <[email protected]>
> wrote:
>
>> You can't send attachments.  Can you post as google doc or gist?
>>
>> On Mon, Oct 27, 2014 at 7:41 PM, Zhenxiao Luo <[email protected]>
>> wrote:
>>
>> >
>> > Thanks Brock and Jason.
>> >
>> > I just drafted proposed APIs for a vectorized Parquet reader (attached
>> > in this email). Any comments and suggestions are appreciated.
>> >
>> > Thanks,
>> > Zhenxiao
>> >
>> > On Tue, Oct 7, 2014 at 5:34 PM, Brock Noland <[email protected]>
>> > wrote:
>> >
>> >> Hi,
>> >>
>> >> The Hive + Parquet community is very interested in improving the
>> >> performance of Hive + Parquet, and of Parquet generally. We are very
>> >> interested in contributing to the Parquet vectorization and lazy
>> >> materialization effort. Please add me to any future meetings on this
>> >> topic.
>> >>
>> >> BTW, here is the JIRA tracking this effort from the Hive side:
>> >> https://issues.apache.org/jira/browse/HIVE-8120
>> >>
>> >> Brock
>> >>
>> >> On Tue, Oct 7, 2014 at 2:04 PM, Zhenxiao Luo <[email protected]>
>> >> wrote:
>> >>
>> >> > Thanks Jason.
>> >> >
>> >> > Yes, Netflix is using Presto and Parquet for our Big Data Platform (
>> >> > http://techblog.netflix.com/2014/10/using-presto-in-our-big-data-platform.html
>> >> > ).
>> >> >
>> >> > The fastest format currently in Presto is ORC, not DWRF (Parquet is
>> >> > fast, but not as fast as ORC). We are referring to ORC, not
>> >> > Facebook's DWRF implementation.
>> >> >
>> >> > We already have Parquet working in Presto. We definitely would like
>> >> > to get it as fast as ORC.
>> >> >
>> >> > Facebook has done native support for ORC in Presto, which does not
>> >> > use the ORCRecordReader at all. They parse the ORC footer, and do
>> >> > predicate pushdown by skipping row groups, vectorization by
>> >> > introducing type-specific vectors, and lazy materialization by
>> >> > introducing LazyVectors (their code has not been committed yet, I
>> >> > mean their pull request). We are planning to do similar
>> >> > optimizations for Parquet in Presto.
>> >> >
>> >> > For the ParquetRecordReader, we need additional APIs to read the
>> >> > next batch of values, and to read in a vector of values. For
>> >> > example, here are the related APIs in the ORC code:
>> >> >
>> >> > /**
>> >> >  * Read the next row batch. The size of the batch to read cannot be
>> >> >  * controlled by the callers. Callers need to look at
>> >> >  * VectorizedRowBatch.size of the returned object to know the batch
>> >> >  * size read.
>> >> >  * @param previousBatch a row batch object that can be reused by the
>> >> >  *        reader
>> >> >  * @return the row batch that was read
>> >> >  * @throws java.io.IOException
>> >> >  */
>> >> > VectorizedRowBatch nextBatch(VectorizedRowBatch previousBatch)
>> >> >     throws IOException;
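To make the calling pattern concrete, here is a minimal, self-contained sketch of a caller driving such a nextBatch-style API. `BatchReadSketch`, `BatchReader`, and the simplified `VectorizedRowBatch` below are hypothetical stand-ins for illustration, not the real Hive/ORC classes:

```java
// Sketch of a caller-side loop over a nextBatch-style API.
// All types here are simplified stand-ins, not the real ORC/Hive classes.
import java.util.Arrays;

public class BatchReadSketch {
    // Simplified stand-in for Hive's VectorizedRowBatch.
    static class VectorizedRowBatch {
        int size;                              // number of valid rows in this batch
        long[] longVector = new long[1024];    // one column of values
    }

    // Simplified stand-in for a reader exposing nextBatch().
    static class BatchReader {
        private int remaining;
        BatchReader(int totalRows) { this.remaining = totalRows; }

        // Fills (and reuses) the caller-supplied batch; size == 0 means EOF.
        VectorizedRowBatch nextBatch(VectorizedRowBatch previous) {
            VectorizedRowBatch batch =
                (previous != null) ? previous : new VectorizedRowBatch();
            batch.size = Math.min(remaining, batch.longVector.length);
            Arrays.fill(batch.longVector, 0, batch.size, 7L); // fake data
            remaining -= batch.size;
            return batch;
        }
    }

    // Sums one column across all batches, reusing a single batch object,
    // which is the point of the previousBatch parameter above.
    static long sumColumn(BatchReader reader) {
        long sum = 0;
        VectorizedRowBatch batch = null;
        while (true) {
            batch = reader.nextBatch(batch);
            if (batch.size == 0) break;        // no more rows
            for (int i = 0; i < batch.size; i++) {
                sum += batch.longVector[i];
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumColumn(new BatchReader(3000)));
    }
}
```

Note that the caller cannot choose the batch size; it must check `batch.size` on every iteration, exactly as the Javadoc above describes.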
>> >> >
>> >> > And here are the related APIs in the Presto code, which are used
>> >> > for ORC support in Presto:
>> >> >
>> >> > public void readVector(int columnIndex, Object vector);
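As an illustration of the column-at-a-time shape of this call, here is a small sketch; `ColumnSource` is a hypothetical stand-in rather than Presto's actual reader, and a typed `long[]` vector is used in place of `Object`:

```java
// Sketch of a readVector-style, column-at-a-time API.
// ColumnSource is a hypothetical stand-in, not Presto's reader.
public class ReadVectorSketch {
    static class ColumnSource {
        private final long[][] columns;
        ColumnSource(long[][] columns) { this.columns = columns; }

        // Fills the caller-supplied vector with the values of one column,
        // mirroring the shape of readVector(int columnIndex, Object vector).
        void readVector(int columnIndex, long[] vector) {
            long[] column = columns[columnIndex];
            System.arraycopy(column, 0, vector, 0, column.length);
        }
    }

    public static void main(String[] args) {
        ColumnSource source = new ColumnSource(new long[][] {
            {1, 2, 3},      // column 0
            {10, 20, 30},   // column 1
        });
        long[] vector = new long[3];
        source.readVector(1, vector);   // read only the column we need
        System.out.println(vector[0] + vector[1] + vector[2]);
    }
}
```

The caller allocates the vector once and asks for columns individually, so columns that a query never touches are never read.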
>> >> >
>> >> > For lazy materialization, we may also consider adding LazyVectors
>> >> > or LazyBlocks, so that values are not materialized until they are
>> >> > accessed by the Operator.
>> >> >
>> >> > Any comments and suggestions are appreciated.
>> >> >
>> >> > Thanks,
>> >> > Zhenxiao
>> >> >
>> >> >
>> >> > On Tue, Oct 7, 2014 at 1:05 PM, Jason Altekruse
>> >> > <[email protected]> wrote:
>> >> >
>> >> > > Hello All,
>> >> > >
>> >> > > No updates from me yet, just sending out another message for some
>> >> > > of the Netflix engineers who were still only subscribed to the
>> >> > > Google Group mail. This will allow them to respond directly with
>> >> > > their research on the optimized ORC reader for consideration in
>> >> > > the design discussion.
>> >> > >
>> >> > > -Jason
>> >> > >
>> >> > > On Mon, Oct 6, 2014 at 10:51 PM, Jason Altekruse
>> >> > > <[email protected]> wrote:
>> >> > >
>> >> > > > Hello Parquet team,
>> >> > > >
>> >> > > > I wanted to report the results of a discussion between the Drill
>> >> > > > team and the engineers at Netflix working to make Parquet run
>> >> > > > faster with Presto. As we have said in the last few hangouts, we
>> >> > > > both want to make contributions back to parquet-mr to add
>> >> > > > features and performance. We thought it would be good to sit
>> >> > > > down and speak directly about our real goals and the best next
>> >> > > > steps to get an engineering effort started to accomplish these
>> >> > > > goals.
>> >> > > >
>> >> > > > Below is a summary of the meeting.
>> >> > > >
>> >> > > > - Meeting notes
>> >> > > >
>> >> > > >    - Attendees:
>> >> > > >
>> >> > > >        - Netflix : Eva Tse, Daniel Weeks, Zhenxiao Luo
>> >> > > >
>> >> > > >        - MapR (Drill Team) : Jacques Nadeau, Jason Altekruse,
>> >> > > >          Parth Chandra
>> >> > > >
>> >> > > > - Minutes
>> >> > > >
>> >> > > >    - Introductions/ Background
>> >> > > >
>> >> > > >    - Netflix
>> >> > > >
>> >> > > >        - Working on providing interactive SQL querying to users
>> >> > > >
>> >> > > >        - have chosen Presto as the query engine and Parquet as
>> >> > > >          the high-performance data storage format
>> >> > > >
>> >> > > >        - Presto is providing the needed speed in some cases, but
>> >> > > >          others are missing optimizations that could avoid reads
>> >> > > >
>> >> > > >        - Have already started some development and
>> >> > > >          investigation, and have identified key goals
>> >> > > >
>> >> > > >        - Some initial benchmarks with DWRF, a modified ORC
>> >> > > >          reader written by the Presto team, show that such gains
>> >> > > >          are possible with a different reader implementation
>> >> > > >
>> >> > > >        - goals
>> >> > > >
>> >> > > >            - filter pushdown
>> >> > > >
>> >> > > >                - skipping reads based on filter evaluation on
>> >> > > >                  one or more columns
>> >> > > >
>> >> > > >                - this can happen at several granularities: row
>> >> > > >                  group, page, record/value
>> >> > > >
>> >> > > >            - late/lazy materialization
>> >> > > >
>> >> > > >                - for columns not involved in a filter, avoid
>> >> > > >                  materializing them entirely until they are
>> >> > > >                  known to be needed after evaluating a filter on
>> >> > > >                  other columns
>> >> > > >
>> >> > > >    - Drill
>> >> > > >
>> >> > > >        - the Drill engine uses an in-memory vectorized
>> >> > > >          representation of records
>> >> > > >
>> >> > > >        - for scalar and repeated types we have implemented a
>> >> > > >          fast vectorized reader that is optimized to transform
>> >> > > >          between Parquet's on-disk and our in-memory format
>> >> > > >
>> >> > > >        - this is currently producing performant table scans,
>> >> > > >          but has no facility for filter pushdown
>> >> > > >
>> >> > > >        - Major goals going forward
>> >> > > >
>> >> > > >            - filter pushdown
>> >> > > >
>> >> > > >                - decide the best implementation for
>> >> > > >                  incorporating filter pushdown into our current
>> >> > > >                  implementation, or figure out a way to leverage
>> >> > > >                  existing work in the parquet-mr library to
>> >> > > >                  accomplish this goal
>> >> > > >
>> >> > > >            - late/lazy materialization
>> >> > > >
>> >> > > >                - see above
>> >> > > >
>> >> > > >            - contribute existing code back to Parquet
>> >> > > >
>> >> > > >                - the Drill Parquet reader has a very strong
>> >> > > >                  emphasis on performance; a clear interface to
>> >> > > >                  consume it, sufficiently separated from Drill,
>> >> > > >                  could prove very useful for other projects
>> >> > > >
>> >> > > >    - First steps
>> >> > > >
>> >> > > >        - Netflix team will share some of their thoughts and
>> >> > > >          research from working with the DWRF code
>> >> > > >
>> >> > > >            - we can have a discussion based off of this: which
>> >> > > >              aspects are done well, and any opportunities they
>> >> > > >              may have missed that we can incorporate into our
>> >> > > >              design
>> >> > > >
>> >> > > >            - do further investigation and ask the existing
>> >> > > >              community for guidance on existing parquet-mr
>> >> > > >              features or planned APIs that may provide the
>> >> > > >              desired functionality
>> >> > > >
>> >> > > >        - We will begin a discussion of an API for the new
>> >> > > >          functionality
>> >> > > >
>> >> > > >            - some outstanding thoughts for down the road
>> >> > > >
>> >> > > >                - The Drill team has an interest in very late
>> >> > > >                  materialization for data stored in
>> >> > > >                  dictionary-encoded pages, such as running a
>> >> > > >                  join or filter on the dictionary and then going
>> >> > > >                  back to the reader to grab all of the values in
>> >> > > >                  the data that match the needed members of the
>> >> > > >                  dictionary
>> >> > > >
>> >> > > >                    - this is a later consideration, but it is
>> >> > > >                      part of the reason we are opening up the
>> >> > > >                      design discussion early: so that the API
>> >> > > >                      can be flexible enough to allow this in the
>> >> > > >                      future, even if it is not implemented soon
>> >> > > >
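One of the goals discussed above, row-group-level filter pushdown, can be sketched as follows. `RowGroupStats` and `mightContain` are hypothetical names used only for illustration; the underlying idea relies on the min/max statistics that Parquet stores per column chunk:

```java
// Sketch of row-group-level filter pushdown: skip a row group when its
// column min/max statistics prove no row can satisfy the predicate.
// Names are hypothetical, not from parquet-mr.
public class RowGroupSkipSketch {
    // Hypothetical per-row-group statistics for one column.
    static class RowGroupStats {
        final long min, max;
        RowGroupStats(long min, long max) { this.min = min; this.max = max; }
    }

    // For a predicate "col == value": the row group can only contain a
    // match if value lies within [min, max]. A false result means the
    // reader can skip the group without touching any of its pages.
    static boolean mightContain(RowGroupStats stats, long value) {
        return value >= stats.min && value <= stats.max;
    }

    public static void main(String[] args) {
        RowGroupStats g1 = new RowGroupStats(0, 99);
        RowGroupStats g2 = new RowGroupStats(100, 199);
        long predicate = 150;
        System.out.println(mightContain(g1, predicate)); // skippable
        System.out.println(mightContain(g2, predicate)); // must be read
    }
}
```

The same shape generalizes to page-level skipping, and a record-level filter then runs only inside the groups and pages that survive.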
>> >> > >
>> >> >
>> >>
>> >
>> >
>>
>
>
