There's no particular reason for Tuesdays. We could do the next one on a Monday. Does anybody object?
Julien

> On Jul 21, 2015, at 17:37, Jacques Nadeau <[email protected]> wrote:
>
> Any chance we can have these on either a different day or time? The Drill
> hangout is every Tuesday at 10am so I always have to pick one or the other.
>
> On Tue, Jul 21, 2015 at 10:56 AM, Nezih Yigitbasi <[email protected]> wrote:
>
>> An update to "actions", I will create a PR for the vectorized read instead
>> of Zhenxiao.
>>
>> Thanks,
>> Nezih
>>
>> On Tue, Jul 21, 2015 at 10:51 AM, Julien Le Dem <[email protected]> wrote:
>>
>>> Agenda
>>> - Julien (Twitter):
>>>   - interested in ByteBuffer status
>>> - Ryan (by email): interested in ByteBuffer status. did some work on bloom filters.
>>>   PARQUET-251 and PARQUET-246 make sure 2.0 encodings and other new features are solid.
>>> - Daniel, Nezih, Zhengxiao (Netflix):
>>>   - update on Vectorized read path for Presto (Dong Chen for Hive)
>>>   - Parquet-99: OOM on write
>>> - Ippokratis: Impala team.
>>> - Jason Altekruse: (Drill/MapR)
>>>   - update on Java direct memory representation (hadoop 2.0 ByteBuffer)
>>>   - currently uses a fork of Parquet that uses the GSOC work.
>>> - Tianshuo: 1.8.1 release.
>>> - Sanjeev (Twitter):
>>>   - want to hear updates about vectorized in Presto
>>>
>>> actions:
>>> - Zhengxiao: update vectorization PR
>>> - Jason: update ByteBuffer PR
>>> - Jason: open JIRA for dic encoding fallback pointer
>>> - Daniel: opened a PR for PARQUET-99: up for review
>>>
>>> Notes:
>>> - Vectorized read path for Presto (Dong Chen for Hive) PARQUET-131
>>>   - batch read
>>>   - lazy materialization
>>>   - Netflix integrated with Presto, Dong Chen integrated with Hive
>>>   - Nezih: micro/macro benchmark
>>>     - micro 2 read paths
>>>       - only primitives, no converters (3 x faster with vectorized)
>>>       - complex with converters (no different performance)
>>>     - macro Presto :
>>>       - complex types not better
>>>       - 2x better for primitive types
>>>   - Daniel: projection + predicate well optimized with presto (lazy load, lazy materialization). predicate push down and using dic in predicate evaluation.
>>>   - Ippokratis: fan out? => 100 values per collection, list/map materialization expensive
>>>
>>> - Dictionary encoding: because of fallback mechanism. We don't know when the dictionary ends. => Jason to open a JIRA
>>>
>>> - Parquet-99: OOM on write
>>>   - all big rows: (10MB per row) runs OOM before we first check
>>>   - big variability in size: small initial rows throw off estimate and following big rows blow memory
>>>   - add settings for checking at constant #rows.
>>>   - we should experiment with simpler strategies
>>>
>>> - ByteBuffer status:
>>>   - Jason needs to rebase the PR
>>>   - Parquet-77
>>>
>>>
>>> On Tue, Jul 21, 2015 at 10:05 AM, Julien Le Dem <[email protected]> wrote:
>>>
>>>> It's happening now:
>>>> https://plus.google.com/hangouts/_/twitter.com/parquet-sync-up
>>>>
>>>> On Tue, Jul 14, 2015 at 10:04 AM, Julien Le Dem <[email protected]> wrote:
>>>>
>>>>> The next Parquet sync up will be held on google hangout on 7/21/2015 at
>>>>> 10 am PST
>>>>> https://plus.google.com/hangouts/_/twitter.com/parquet-sync-up
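
For anyone following the PARQUET-99 notes quoted above, here is a rough sketch of why the two row patterns mentioned there can blow memory, and why checking at a constant number of rows may help. It is a toy model in Python, not the Parquet writer code: the function names, the "halfway to the estimated limit" rule, the 100 MB threshold, and the check intervals are all assumptions made up for illustration.

```python
def peak_buffered_bytes(row_sizes, threshold, next_check_fn, first_check):
    """Simulate a writer that buffers rows and only looks at its buffered
    size at certain row counts. Returns the largest buffered size reached."""
    buffered = 0          # bytes currently held in memory
    rows_in_block = 0     # rows buffered since the last flush
    next_check = first_check
    peak = 0
    for size in row_sizes:
        buffered += size
        rows_in_block += 1
        peak = max(peak, buffered)
        if rows_in_block >= next_check:
            if buffered >= threshold:
                # Flush: memory is released and the schedule starts over.
                buffered = 0
                rows_in_block = 0
                next_check = first_check
            else:
                next_check = next_check_fn(rows_in_block, buffered, threshold)
    return peak


def adaptive_next_check(rows, buffered, threshold):
    """Hypothetical estimate-based rule: assume future rows look like the
    average so far and check again halfway to the estimated limit."""
    avg_row = max(buffered / rows, 1)
    remaining_rows = (threshold - buffered) / avg_row
    return rows + max(1, int(remaining_rows / 2))


def constant_next_check(interval):
    """Check again after a fixed number of rows, ignoring row sizes."""
    def next_check(rows, buffered, threshold):
        return rows + interval
    return next_check


threshold = 100_000_000  # assumed 100 MB flush threshold

# Case 1: small initial rows throw off the estimate, then big rows arrive.
mixed = [10] * 100 + [1_000_000] * 1000
peak_adaptive = peak_buffered_bytes(mixed, threshold, adaptive_next_check, 100)
peak_constant = peak_buffered_bytes(mixed, threshold, constant_next_check(10), 10)

# Case 2: all big rows (10 MB each); the first check itself comes too late.
all_big = [10_000_000] * 200
peak_late_first_check = peak_buffered_bytes(all_big, threshold, adaptive_next_check, 100)
peak_early_first_check = peak_buffered_bytes(all_big, threshold, constant_next_check(10), 10)

print(peak_adaptive, peak_constant)
print(peak_late_first_check, peak_early_first_check)
```

In this model the estimate-based schedule pushes the next check millions of rows out after seeing 100 tiny rows, so the big rows pile up unchecked, while a fixed interval bounds the overshoot to roughly one interval's worth of rows. The trade-off is that a fixed interval checks more often than needed when rows are uniformly small.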
