Wes: I maintain a Google Calendar invite. People can send me their email address to be notified of the sync. Otherwise I send reminders on the dev list, but it looks like I missed sending an earlier reminder last time.
Cheng: On the Parquet side for vectorization, you can always bypass the assembly and access the column readers directly. Nezih/Ryan/Dan have done some work around this with Presto. Other projects like Drill or Spark have a custom reader based on the column readers. We're discussing making a shared implementation in Parquet itself.

On Mon, May 16, 2016 at 12:32 AM, Xu, Cheng A <[email protected]> wrote:
> Hi,
> It looks like the vectorization work is still in progress, and I'd like to
> support Hive vectorization for Parquet. Is there an early version of Parquet
> with the vectorization feature ready that I could use to continue the work
> on the Hive side? Thank you in advance.
>
> -----Original Message-----
> From: Julien Le Dem [mailto:[email protected]]
> Sent: Friday, May 13, 2016 8:34 AM
> To: [email protected]
> Subject: Re: Parquet sync up
>
> The next sync up will be around Strata London in early June, where I'll
> happen to be. We will do it in the morning Pacific time, evening Europe time.
>
> Notes from this sync:
>
> Attendees:
> - Julien (Dremio)
> - Alex, Piyush (Twitter)
> - Ryan (Netflix)
>
> Parquet 2.0 encodings discussion:
> - JIRA open to finalize encodings: PARQUET-588 (2.0 encodings finalization)
> - Ryan is running experiments to measure efficiency on their data
> - Alex and Piyush are looking at encoding selection strategies: how to
>   pick the best encoding for the data automatically
>
> 1.9 release:
> - Last blocker: PARQUET-400 (readFully() behavior) needs an update from
>   Jason. Piyush could possibly pick it up if Jason is busy.
>
> Brotli integration:
> - Ryan has been working on integrating the Brotli compression algorithm
> - For a compression cost similar to Snappy's, it gives a much better
>   compression ratio
> - Embeds a native library, similar to the Snappy integration
> - Looking into possibly statically linking the native library
> - PR available on parquet-format and parquet-mr
>
> Vectorized read:
> - Toward the end of June we will organize a Parquet vectorized-read
>   hackathon for all interested parties (make yourself known if you are
>   interested; we'll send more details later; remote participation through
>   Hangout is possible)
>
> Lazy projections at runtime:
> - Alex has been looking into a lazy Thrift object for parquet-thrift to
>   minimize assembly cost in existing Scalding jobs that don't declare the
>   columns they need
>
> Next sync will be in the morning PT.
>
> On Thu, May 12, 2016 at 5:42 AM, Deepak Majeti <[email protected]> wrote:
> > I am sorry for missing this meeting as well.
> > My interest is also in improving parquet-cpp reader/writer performance.
> > I will work with Uwe and Wes on this.
> > My other interest is in supporting predicate pushdown. I will work on
> > this in parallel with performance.
> >
> > Thanks!
> >
> > On Thu, May 12, 2016 at 4:05 AM, Uwe Korn <[email protected]> wrote:
> > >> I'm sorry I wasn't able to join today again (traveling). We could
> > >> choose an early Pacific time to make the meeting accessible to
> > >> both Asia and Europe -- I would suggest 8 or 9 AM Pacific.
> > >
> > > 8 or 9 AM PT would work for me (CEST); 4 PM PT is just not manageable.
> > > Also: do we have a calendar where I can see in advance when sync-ups are?
> > >
> > > Currently I'm working on the Parquet integration with Arrow and on
> > > building a Python interface for libarrow-parquet. Once we have a basic
> > > working version, I will look into implementing missing features in
> > > the writer and improving general read/write performance in parquet-cpp.
> > >
> > > Uwe
> > >
> > >> http://timesched.pocoo.org/?date=2016-05-11&tz=pacific-standard-time!,de:berlin,cn:shanghai,us:new-york-city:ny
> > >>
> > >> I did not have much time for Parquet C++ development in the last
> > >> 6 weeks, but I plan to help Uwe complete the writer implementation
> > >> and work toward a more complete Apache Arrow integration (this is
> > >> in progress here:
> > >> https://github.com/apache/arrow/tree/master/cpp/src/arrow/parquet)
> > >>
> > >> Other items of immediate interest:
> > >> - C++ API for the file metadata (read + write)
> > >> - Conda packaging for built artifacts (to make parquet-cpp easier
> > >>   for Python programmers to install portably when the time comes). I
> > >>   got Thrift C++ into conda-forge this week, so this should not be
> > >>   hard now: https://github.com/conda-forge/thrift-cpp-feedstock
> > >> - Expanding the column scan benchmarks (thanks, Uwe, for kickstarting
> > >>   the benchmarking effort!)
> > >> - Perf improvements for the RLE decoder
> > >>
> > >> Thanks,
> > >> Wes
> > >>
> > >> On Wed, May 11, 2016 at 4:04 PM, Julien Le Dem <[email protected]> wrote:
> > >>> The actual hangout URL is
> > >>> https://hangouts.google.com/hangouts/_/dremio.com/parquet-sync-up
> > >>>
> > >>> On Wed, May 11, 2016 at 3:57 PM, Julien Le Dem <[email protected]> wrote:
> > >>>> Starting in 5 mins:
> > >>>> https://plus.google.com/hangouts/_/event/parquet_sync_up
> > >>>>
> > >>>> On Wed, May 11, 2016 at 1:53 PM, Julien Le Dem <[email protected]> wrote:
> > >>>>> It is happening at 4 PM PT on Google Hangout:
> > >>>>> https://plus.google.com/hangouts/_/event/parquet_sync_up
> > >>>>>
> > >>>>> (We can do a different time next time, based on timezone preferences.
> > >>>>> Afternoon is better for Asia.
> > >>>>> Morning is better for Europe.)
> > >>>>>
> > >>>>> --
> > >>>>> Julien
> > >>>>
> > >>>> --
> > >>>> Julien
> > >>>
> > >>> --
> > >>> Julien
> >
> > --
> > regards,
> > Deepak Majeti
>
> --
> Julien

--
Julien
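[Editor's note] The encoding-selection work Alex and Piyush mention in Julien's notes amounts to picking a Parquet encoding from statistics of the column data. As a purely hypothetical illustration of the idea (this is not Parquet's actual selection logic; only the encoding names come from the format spec, and the thresholds are invented), a cardinality/delta heuristic could look like:

```python
def choose_encoding(values):
    # Hypothetical heuristic -- thresholds are illustrative, not Parquet's.
    distinct = len(set(values))
    # Low cardinality: a dictionary page plus RLE-encoded indices pays off.
    if distinct <= len(values) // 10:
        return "RLE_DICTIONARY"
    # Integer data with near-constant deltas packs tightly with delta encoding.
    if values and all(isinstance(v, int) for v in values):
        deltas = [b - a for a, b in zip(values, values[1:])]
        if deltas and max(deltas) - min(deltas) <= 16:
            return "DELTA_BINARY_PACKED"
    # Otherwise fall back to plain encoding.
    return "PLAIN"
```

A real selector would work from page-level statistics gathered during the first write pass rather than scanning the values twice.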
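[Editor's note] Wes's last work item refers to the decoder for Parquet's RLE/bit-packing hybrid encoding, used for definition/repetition levels and dictionary indices. A minimal Python sketch of how that format decodes, following the description in the parquet-format spec (an illustration only, not the parquet-cpp implementation):

```python
def read_uleb128(buf, pos):
    # Read an unsigned LEB128 varint (the run header format).
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return result, pos
        shift += 7

def decode_rle_hybrid(buf, bit_width, count):
    # Decode `count` values from Parquet's RLE / bit-packing hybrid.
    # header LSB = 0 -> RLE run; header LSB = 1 -> bit-packed groups of 8.
    width_bytes = (bit_width + 7) // 8
    out, pos = [], 0
    while len(out) < count:
        header, pos = read_uleb128(buf, pos)
        if header & 1 == 0:
            # RLE run: repeated value stored once in ceil(bit_width/8) bytes.
            run_len = header >> 1
            value = int.from_bytes(buf[pos:pos + width_bytes], "little")
            pos += width_bytes
            out.extend([value] * run_len)
        else:
            # Bit-packed run: `groups` groups of 8 values, packed LSB-first.
            groups = header >> 1
            packed = int.from_bytes(buf[pos:pos + groups * bit_width], "little")
            pos += groups * bit_width
            mask = (1 << bit_width) - 1
            for i in range(groups * 8):
                out.append((packed >> (i * bit_width)) & mask)
    return out[:count]
```

For example, with bit width 3, the bytes `08 07` decode as an RLE run of four 7s, and `03 88 C6 FA` as one bit-packed group holding 0 through 7 (the example from the format spec). The production decoders get their speed from unrolling the bit-unpacking per bit width instead of the generic shift loop above.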
