The next sync up will be around Strata London in early June, where I'll happen to be. We will do it in the morning Pacific time, evening Europe time.
Notes from this sync:

attendees:
- Julien (Dremio)
- Alex, Piyush (Twitter)
- Ryan (Netflix)

Parquet 2.0 encodings discussion:
- Jira open to finalize encodings: PARQUET-588: 2.0 encodings finalization.
- Ryan is doing experiments to measure efficiency on their data
- Alex and Piyush are looking at encoding selection strategies: How to pick the best encoding for the data automatically

1.9 release:
- last blocker: PARQUET-400 (readFully() behavior) needs update from Jason. Possibly Piyush could pick it up if Jason is busy

Brotli integration.
- Ryan has been working on Brotli compression algorithm integration
- for similar compression cost as snappy, much better compression ratio
- embeds native library similar to snappy integration
- looking into possibly statically linking the native library
- PR available on parquet-format and parquet-mr

Vectorized read:
- towards end of June we will organize a Parquet vectorized read hackathon for all parties interested (make yourself known if interested, we'll send more details later, possible remote participation through hangout)

Lazy projections at runtime.
- Alex has been looking into lazy thrift object for parquet-thrift to minimize assembly cost in scalding existing jobs that don't declare the columns they need.

Next sync will be in the morning PT.

On Thu, May 12, 2016 at 5:42 AM, Deepak Majeti <[email protected]> wrote:

> I am sorry for missing this meeting as well.
> My interest is also to improve parquet-cpp reader/writer performance.
> I will work with Uwe and Wes on this.
> My other interest is on supporting predicate pushdown. I will work on
> this in parallel with performance.
>
> Thanks!
>
> On Thu, May 12, 2016 at 4:05 AM, Uwe Korn <[email protected]> wrote:
> >
> >> I'm sorry I wasn't able to join today again (traveling). We could
> >> choose an early time Pacific time to make the meeting accessible to
> >> both Asia and Europe -- I would suggest 8 or 9 AM Pacific
> >>
> > 8 or 9 am PT would work for me (CEST), 4pm PT is just not manageable.
> > Also: Do we have a calendar where I can see in advance when sync ups are?
> >
> > Currently I'm working on the Parquet integration with Arrow and on building
> > a Python interface for libarrow-parquet. Once we have a basic working
> > version, I will look into implementing missing features in the writer and
> > improving general read/write performance in parquet-cpp.
> >
> > Uwe
> >
> >>
> >> http://timesched.pocoo.org/?date=2016-05-11&tz=pacific-standard-time!,de:berlin,cn:shanghai,us:new-york-city:ny
> >>
> >> I did not have much time for writing Parquet C++ development the last
> >> 6 weeks, but plan to help Uwe complete the writer implementation and
> >> work toward a more complete Apache Arrow integration (this is in
> >> progress here:
> >> https://github.com/apache/arrow/tree/master/cpp/src/arrow/parquet)
> >>
> >> Other items of immediate interest
> >>
> >> - C++ API to the file metadata (read + write)
> >> - Conda packaging for built artifacts (to make parquet-cpp easier for
> >> Python programmers to install portably when the time comes). I got
> >> Thrift C++ into conda-forge this week so this should not be hard now
> >> https://github.com/conda-forge/thrift-cpp-feedstock
> >> - Expanding column scan benchmarks (thanks Uwe for kickstarting the
> >> benchmarking effort!)
> >> - Perf improvements for the RLE decoder
> >>
> >> Thanks
> >> Wes
> >>
> >> On Wed, May 11, 2016 at 4:04 PM, Julien Le Dem <[email protected]> wrote:
> >>>
> >>> The actual hangout url is
> >>> https://hangouts.google.com/hangouts/_/dremio.com/parquet-sync-up
> >>>
> >>> On Wed, May 11, 2016 at 3:57 PM, Julien Le Dem <[email protected]> wrote:
> >>>
> >>>> starting in 5 mins:
> >>>> https://plus.google.com/hangouts/_/event/parquet_sync_up
> >>>>
> >>>> On Wed, May 11, 2016 at 1:53 PM, Julien Le Dem <[email protected]>
> >>>> wrote:
> >>>>
> >>>>> It is happening at 4pm PT on google hangout
> >>>>> https://plus.google.com/hangouts/_/event/parquet_sync_up
> >>>>>
> >>>>> (we can do a different time next time, based on timezone preferences.
> >>>>> Afternoon is better for Asia. Morning is better for Europe)
> >>>>>
> >>>>> --
> >>>>> Julien
> >>>>>
> >>>>
> >>>>
> >>>> --
> >>>> Julien
> >>>>
> >>>
> >>>
> >>> --
> >>> Julien
> >
> >
> >
>
> --
> regards,
> Deepak Majeti
>

--
Julien
