Wednesday then? no more conflicts?

On Tue, Jul 21, 2015 at 7:26 PM, Alex Levenson <[email protected]> wrote:
> Sorry to be difficult but, can I request any day other than Monday -- how
> about Wednesday?
>
> On Tue, Jul 21, 2015 at 7:19 PM, Julien Le Dem <[email protected]> wrote:
>
> > There's no particular reason for Tuesdays.
> > We could do the next one on a Monday.
> > Anybody objects?
> >
> > Julien
> >
> > > On Jul 21, 2015, at 17:37, Jacques Nadeau <[email protected]> wrote:
> > >
> > > Any chance we can have these on either a different day or time? The Drill
> > > hangout is every Tuesday at 10am so I always have to pick one or the other.
> > >
> > > On Tue, Jul 21, 2015 at 10:56 AM, Nezih Yigitbasi <[email protected]> wrote:
> > >
> > >> An update to "actions", I will create a PR for the vectorized read instead
> > >> of Zhenxiao.
> > >>
> > >> Thanks,
> > >> Nezih
> > >>
> > >> On Tue, Jul 21, 2015 at 10:51 AM, Julien Le Dem <[email protected]> wrote:
> > >>
> > >>> Agenda
> > >>> - Julien (Twitter):
> > >>>   - interested in ByteBuffer status
> > >>> - Ryan (by email): interested in ByteBuffer status. did some work on bloom filters.
> > >>>   PARQUET-251 and PARQUET-246 make sure 2.0 encodings and other new features are solid.
> > >>> - Daniel, Nezih, Zhengxiao (Netflix):
> > >>>   - update on Vectorized read path for Presto (Dong Chen for Hive)
> > >>>   - Parquet-99: OOM on write
> > >>> - Ippokratis: Impala team.
> > >>> - Jason Altekruse: (Drill/MapR)
> > >>>   - update on Java direct memory representation (hadoop 2.0 ByteBuffer)
> > >>>   - currently uses a fork of Parquet that uses the GSOC work.
> > >>> - Tianshuo: 1.8.1 release.
> > >>> - Sanjeev (Twitter):
> > >>>   - want to hear updates about vectorized in Presto
> > >>>
> > >>> actions:
> > >>> - Zhengxiao: update vectorization PR
> > >>> - Jason: update ByteBuffer PR
> > >>> - Jason: open JIRA for dic encoding fallback pointer
> > >>> - Daniel: opened a PR for PARQUET-99: up for review
> > >>>
> > >>> Notes:
> > >>> - Vectorized read path for Presto (Dong Chen for Hive) PARQUET-131
> > >>>   - batch read
> > >>>   - lazy materialization
> > >>>   - Netflix integrated with Presto, Dong Chen integrated with Hive
> > >>>   - Nezih: micro/macro benchmark
> > >>>     - micro 2 read paths
> > >>>       - only primitives, no converters (3x faster with vectorized)
> > >>>       - complex with converters (no different performance)
> > >>>     - macro Presto:
> > >>>       - complex types not better
> > >>>       - 2x better for primitive types
> > >>>   - Daniel: projection + predicate well optimized with presto (lazy
> > >>>     load, lazy materialization). predicate push down and using dic in
> > >>>     predicate evaluation.
> > >>>   - Ippokratis: fan out? => 100 values per collection, list/map
> > >>>     materialization expensive
> > >>>
> > >>> - Dictionary encoding: because of fallback mechanism. We don't know when
> > >>>   the dictionary ends. => Jason to open a JIRA
> > >>>
> > >>> - Parquet-99: OOM on write
> > >>>   - all big rows: (10MB per row) runs OOM before we first check
> > >>>   - big variability in size: small initial rows throw off estimate and
> > >>>     following big rows blow memory
> > >>>   - add settings for checking at constant #rows.
> > >>>   - we should experiment with simpler strategies
> > >>>
> > >>> - ByteBuffer status:
> > >>>   - Jason needs to rebase the PR
> > >>>   - Parquet-77
> > >>>
> > >>>
> > >>> On Tue, Jul 21, 2015 at 10:05 AM, Julien Le Dem <[email protected]> wrote:
> > >>>
> > >>>> It's happening now:
> > >>>> https://plus.google.com/hangouts/_/twitter.com/parquet-sync-up
> > >>>>
> > >>>> On Tue, Jul 14, 2015 at 10:04 AM, Julien Le Dem <[email protected]> wrote:
> > >>>>
> > >>>>> The next Parquet sync up will be held on google hangout on 7/21/2015 at
> > >>>>> 10 am PST
> > >>>>> https://plus.google.com/hangouts/_/twitter.com/parquet-sync-up
> > >>
> >
>
> --
> Alex Levenson
> @THISWILLWORK
