Thanks Jacques. Here is the gist: https://gist.github.com/zhenxiao/2728ce4fe0a7be2d3b30
Comments and suggestions are appreciated.

Thanks,
Zhenxiao

On Mon, Oct 27, 2014 at 10:55 PM, Jacques Nadeau <[email protected]> wrote:

You can't send attachments. Can you post it as a Google doc or gist?

On Mon, Oct 27, 2014 at 7:41 PM, Zhenxiao Luo <[email protected]> wrote:

Thanks Brock and Jason.

I just drafted proposed APIs for a vectorized Parquet reader (attached to this email). Any comments and suggestions are appreciated.

Thanks,
Zhenxiao

On Tue, Oct 7, 2014 at 5:34 PM, Brock Noland <[email protected]> wrote:

Hi,

The Hive + Parquet community is very interested in improving the performance of Hive + Parquet, and of Parquet generally. We are very interested in contributing to the Parquet vectorization and lazy materialization effort. Please add me to any future meetings on this topic.

BTW, here is the JIRA tracking this effort from the Hive side:
https://issues.apache.org/jira/browse/HIVE-8120

Brock

On Tue, Oct 7, 2014 at 2:04 PM, Zhenxiao Luo <[email protected]> wrote:

Thanks Jason.

Yes, Netflix is using Presto and Parquet for our big data platform (http://techblog.netflix.com/2014/10/using-presto-in-our-big-data-platform.html).

The fastest format currently in Presto is ORC, not DWRF (Parquet is fast, but not as fast as ORC). We are referring to ORC, not Facebook's DWRF implementation.

We already have Parquet working in Presto. We would definitely like to get it as fast as ORC.

Facebook has built native ORC support in Presto, which does not use the ORCRecordReader at all. They parse the ORC footer and do predicate pushdown by skipping row groups, vectorization by introducing type-specific vectors, and lazy materialization by introducing LazyVectors (their code, i.e. their pull request, has not been committed yet). We are planning to do similar optimizations for Parquet in Presto.

For the ParquetRecordReader, we need additional APIs to read the next batch of values and to read in a vector of values. For example, here is the related API in the ORC code:

/**
 * Read the next row batch. The size of the batch to read cannot be controlled
 * by the callers. Callers need to look at VectorizedRowBatch.size of the
 * returned object to know the batch size read.
 * @param previousBatch a row batch object that can be reused by the reader
 * @return the row batch that was read
 * @throws java.io.IOException
 */
VectorizedRowBatch nextBatch(VectorizedRowBatch previousBatch) throws IOException;

And here is the related API in the Presto code, which is used for ORC support in Presto:

public void readVector(int columnIndex, Object vector);

For lazy materialization, we may also consider adding LazyVectors or LazyBlocks, so that values are not materialized until they are accessed by the Operator.

Any comments and suggestions are appreciated.

Thanks,
Zhenxiao
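To make the shape of the proposed APIs concrete, here is a minimal sketch of what a batch-oriented Parquet reader interface along these lines could look like. All names here (ParquetVectorizedReader, RowBatch) are illustrative assumptions modeled on the ORC and Presto signatures quoted above, not existing parquet-mr or Presto classes:

import java.io.IOException;

/**
 * Hypothetical sketch of a vectorized Parquet reader API, modeled on the
 * ORC nextBatch() and Presto readVector() signatures quoted above. These
 * types do not exist in parquet-mr; they only illustrate the proposal.
 */
public interface ParquetVectorizedReader extends AutoCloseable {

    /**
     * Read the next batch of rows. As with ORC, the caller cannot control
     * the batch size and must inspect the size field of the returned batch.
     *
     * @param previousBatch a batch object that may be reused by the reader
     * @return the batch that was read, or null when no rows remain
     */
    RowBatch nextBatch(RowBatch previousBatch) throws IOException;

    /**
     * Fill a single column vector, mirroring Presto's readVector(). A
     * per-column entry point lets the engine skip decoding columns that a
     * filter on other columns has already eliminated.
     */
    void readVector(int columnIndex, Object vector) throws IOException;
}

/** Minimal batch container: one type-specific vector per projected column. */
class RowBatch {
    Object[] columnVectors; // one vector per projected column
    int size;               // number of rows actually read into the batch
}

Reading a whole batch at a time amortizes per-row call overhead, while the per-column readVector() leaves room for the predicate pushdown and lazy materialization described above.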
On Tue, Oct 7, 2014 at 1:05 PM, Jason Altekruse <[email protected]> wrote:

Hello All,

No updates from me yet, just sending out another message for some of the Netflix engineers who were still subscribed only to the Google group mail. This will allow them to respond directly with their research on the optimized ORC reader, for consideration in the design discussion.

-Jason

On Mon, Oct 6, 2014 at 10:51 PM, Jason Altekruse <[email protected]> wrote:

Hello Parquet team,

I wanted to report the results of a discussion between the Drill team and the engineers at Netflix working to make Parquet run faster with Presto. As we have said in the last few hangouts, we both want to make contributions back to parquet-mr to add features and performance. We thought it would be good to sit down and speak directly about our real goals and the best next steps to get an engineering effort started to accomplish these goals.

Below is a summary of the meeting.

Meeting notes

- Attendees:
  - Netflix: Eva Tse, Daniel Weeks, Zhenxiao Luo
  - MapR (Drill team): Jacques Nadeau, Jason Altekruse, Parth Chandra
- Minutes:
  - Introductions / background
    - Netflix
      - Working on providing interactive SQL querying to users
      - Have chosen Presto as the query engine and Parquet as the high-performance data storage format
      - Presto is providing the needed speed in some cases, but other cases are missing optimizations that could be avoiding reads
      - Have already started some development and investigation, and have identified key goals
      - Some initial benchmarks with DWRF, a modified ORC reader written by the Presto team, show that such gains are possible with a different reader implementation
      - Goals:
        - Filter pushdown
          - Skipping reads based on filter evaluation on one or more columns
          - This can happen at several granularities: row group, page, record/value
        - Late/lazy materialization (see the sketch after these notes)
          - For columns not involved in a filter, avoid materializing them entirely until they are known to be needed after evaluating a filter on other columns
    - Drill
      - The Drill engine uses an in-memory vectorized representation of records
      - For scalar and repeated types we have implemented a fast vectorized reader that is optimized to transform between Parquet's on-disk format and our in-memory format
      - This is currently producing performant table scans, but has no facility for filter pushdown
  - Major goals going forward
    - Filter pushdown
      - Decide on the best implementation for incorporating filter pushdown into our current implementation, or figure out a way to leverage existing work in the parquet-mr library to accomplish this goal
    - Late/lazy materialization
      - See above
    - Contribute existing code back to Parquet
      - The Drill Parquet reader has a very strong emphasis on performance and a clear interface to consume; sufficiently separated from Drill, it could prove very useful for other projects
  - First steps
    - The Netflix team will share some of their thoughts and research from working with the DWRF code
      - We can have a discussion based on this about which aspects are done well, and about any opportunities they may have missed that we can incorporate into our design
    - Do further investigation and ask the existing community for guidance on existing parquet-mr features or planned APIs that may provide the desired functionality
    - We will begin a discussion of an API for the new functionality
  - Some outstanding thoughts for down the road
    - The Drill team has an interest in very late materialization for data stored in dictionary-encoded pages, such as running a join or filter on the dictionary and then going back to the reader to grab all of the values in the data that match the needed members of the dictionary
      - This is a later consideration, but it is part of the reason we are opening up the design discussion early: so that the API can be flexible enough to allow this in the future, even if it is not implemented right away
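To illustrate the late/lazy materialization goal from the notes above, here is a minimal sketch of a lazily materialized vector: it holds a decode callback and only invokes it on first access, so a column eliminated by a filter on other columns is never decoded. LazyVector and ColumnLoader are hypothetical names for illustration, not classes from parquet-mr, Presto, or Drill:

import java.io.IOException;
import java.io.UncheckedIOException;

/**
 * Hypothetical lazy vector: holds a callback that can decode this column's
 * values for the current batch, and only invokes it on first access. If a
 * filter on other columns eliminates every row, the column is never decoded.
 */
class LazyVector {

    /** Callback that decodes one column of the current batch on demand. */
    interface ColumnLoader {
        Object[] load() throws IOException;
    }

    private final ColumnLoader loader;
    private Object[] values; // null until materialized

    LazyVector(ColumnLoader loader) {
        this.loader = loader;
    }

    /** Materialize on first access; later calls reuse the decoded values. */
    Object get(int row) {
        if (values == null) {
            try {
                values = loader.load();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return values[row];
    }

    boolean isMaterialized() {
        return values != null;
    }
}

The same shape extends to the dictionary idea at the end of the notes: a loader could first expose only dictionary ids, and decode actual values only for the rows that survive a join or filter on the dictionary.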
