Hi,
It looks like the vectorization work is still ongoing, and I'd like to support 
Hive vectorization for Parquet. Is there an early version of Parquet with the 
vectorization feature that I could use to continue the work on the Hive side? 
Thank you in advance.

-----Original Message-----
From: Julien Le Dem [mailto:[email protected]] 
Sent: Friday, May 13, 2016 8:34 AM
To: [email protected]
Subject: Re: Parquet sync up

The next sync up will be around Strata London in early June, where I'll happen 
to be. We will do it in the morning Pacific time, evening Europe time.

Notes from this sync:

attendees:
 - Julien (Dremio)
 - Alex, Piyush (Twitter)
 - Ryan (Netflix)


 Parquet 2.0 encodings discussion:

 - Jira open to finalize encodings: PARQUET-588: 2.0 encodings finalization.

 - Ryan is doing experiments to measure efficiency on their data

 - Alex and Piyush are looking at encoding selection strategies: how to pick the 
best encoding for the data automatically


1.9 release:

 - last blocker: PARQUET-400 (readFully() behavior) needs an update from Jason. 
Possibly Piyush could pick it up if Jason is busy


Brotli integration.

- Ryan has been working on Brotli compression algorithm integration

- at a compression cost similar to snappy, it gives a much better compression ratio

- embeds native library similar to snappy integration

- looking into possibly statically linking the native library

- PR available on parquet-format and parquet-mr


Vectorized read:

 - towards end of June we will organize a Parquet vectorized read hackathon for 
all parties interested (make yourself known if interested, we'll send more 
details later, possible remote participation through hangout)


Lazy projections at runtime.

 - Alex has been looking into lazy thrift objects for parquet-thrift to minimize 
assembly cost in existing scalding jobs that don't declare the columns they 
need.


Next sync will be in the morning PT.







On Thu, May 12, 2016 at 5:42 AM, Deepak Majeti <[email protected]>
wrote:

> I am sorry for missing this meeting as well.
> My interest is also to improve parquet-cpp reader/writer performance.
> I will work with Uwe and Wes on this.
> My other interest is in supporting predicate pushdown. I will work on 
> this in parallel with performance.
>
> Thanks!
>
> On Thu, May 12, 2016 at 4:05 AM, Uwe Korn <[email protected]> wrote:
> >
> >> I'm sorry I wasn't able to join today again (traveling). We could 
> >> choose an early time Pacific time to make the meeting accessible to 
> >> both Asia and Europe -- I would suggest 8 or 9 AM Pacific
> >>
> > 8 or 9 am PT would work for me (CEST), 4pm PT is just not manageable.
> > Also: Do we have a calendar where I can see in advance when sync ups are?
> >
> > Currently I'm working on the Parquet integration with Arrow and on
> > building a Python interface for libarrow-parquet. Once we have a basic
> > working version, I will look into implementing missing features in
> > the writer and improving general read/write performance in parquet-cpp.
> >
> > Uwe
> >
> >>
> >> http://timesched.pocoo.org/?date=2016-05-11&tz=pacific-standard-time!,de:berlin,cn:shanghai,us:new-york-city:ny
> >>
> >> I did not have much time for Parquet C++ development the last
> >> 6 weeks, but plan to help Uwe complete the writer implementation
> >> and work toward a more complete Apache Arrow integration (this is
> >> in progress here:
> >> https://github.com/apache/arrow/tree/master/cpp/src/arrow/parquet)
> >>
> >> Other items of immediate interest
> >>
> >> - C++ API to the file metadata (read + write)
> >> - Conda packaging for built artifacts (to make parquet-cpp easier 
> >> for Python programmers to install portably when the time comes). I 
> >> got Thrift C++ into conda-forge this week so this should not be 
> >> hard now https://github.com/conda-forge/thrift-cpp-feedstock
> >> - Expanding column scan benchmarks (thanks Uwe for kickstarting the 
> >> benchmarking effort!)
> >> - Perf improvements for the RLE decoder
> >>
> >> Thanks
> >> Wes
> >>
> >> On Wed, May 11, 2016 at 4:04 PM, Julien Le Dem <[email protected]> wrote:
> >>>
> >>> The actual hangout url is
> >>> https://hangouts.google.com/hangouts/_/dremio.com/parquet-sync-up
> >>>
> >>> On Wed, May 11, 2016 at 3:57 PM, Julien Le Dem <[email protected]> wrote:
> >>>
> >>>> starting in 5 mins:
> >>>> https://plus.google.com/hangouts/_/event/parquet_sync_up
> >>>>
> >>>> On Wed, May 11, 2016 at 1:53 PM, Julien Le Dem 
> >>>> <[email protected]>
> >>>> wrote:
> >>>>
> >>>>> It is happening at 4pm PT on google hangout 
> >>>>> https://plus.google.com/hangouts/_/event/parquet_sync_up
> >>>>>
> >>>>> (we can do a different time next time, based on timezone preferences.
> >>>>> Afternoon is better for Asia. Morning is better for Europe)
> >>>>>
> >>>>> --
> >>>>> Julien
> >>>>>
> >>>>
> >>>>
> >>>> --
> >>>> Julien
> >>>>
> >>>
> >>>
> >>> --
> >>> Julien
> >
> >
>
>
>
> --
> regards,
> Deepak Majeti
>



--
Julien
