+1 Wednesday

On Wed, Jul 22, 2015 at 4:02 PM, Jason Altekruse <[email protected]> wrote:
> +1 for wednesday
>
> On Wed, Jul 22, 2015 at 3:47 PM, Jacques Nadeau <[email protected]> wrote:
>
> > +1 for Wed.
> >
> > On Wed, Jul 22, 2015 at 3:45 PM, Alex Levenson <[email protected]> wrote:
> >
> > > +1 for Wednesday
> > >
> > > On Wed, Jul 22, 2015 at 3:44 PM, Julien Le Dem <[email protected]> wrote:
> > >
> > > > Wednesday then?
> > > > no more conflicts?
> > > >
> > > > On Tue, Jul 21, 2015 at 7:26 PM, Alex Levenson <[email protected]> wrote:
> > > >
> > > > > Sorry to be difficult but, can I request any day other than Monday -- how about Wednesday?
> > > > >
> > > > > On Tue, Jul 21, 2015 at 7:19 PM, Julien Le Dem <[email protected]> wrote:
> > > > >
> > > > > > There's no particular reason for Tuesdays.
> > > > > > We could do the next one on a Monday.
> > > > > > Anybody objects?
> > > > > >
> > > > > > Julien
> > > > > >
> > > > > > > On Jul 21, 2015, at 17:37, Jacques Nadeau <[email protected]> wrote:
> > > > > > >
> > > > > > > Any chance we can have these on either a different day or time? The Drill
> > > > > > > hangout is every Tuesday at 10am so I always have to pick one or the other.
> > > > > > >
> > > > > > > On Tue, Jul 21, 2015 at 10:56 AM, Nezih Yigitbasi <[email protected]> wrote:
> > > > > > >
> > > > > > >> An update to "actions", I will create a PR for the vectorized read instead of Zhenxiao.
> > > > > > >>
> > > > > > >> Thanks,
> > > > > > >> Nezih
> > > > > > >>
> > > > > > >> On Tue, Jul 21, 2015 at 10:51 AM, Julien Le Dem <[email protected]> wrote:
> > > > > > >>
> > > > > > >>> Agenda
> > > > > > >>> - Julien (Twitter):
> > > > > > >>>   - interested in ByteBuffer status
> > > > > > >>> - Ryan (by email): interested in ByteBuffer status. did some work on bloom filters.
> > > > > > >>>   PARQUET-251 and PARQUET-246 make sure 2.0 encodings and other new features are solid.
> > > > > > >>> - Daniel, Nezih, Zhengxiao (Netflix):
> > > > > > >>>   - update on Vectorized read path for Presto (Dong Chen for Hive)
> > > > > > >>>   - Parquet-99: OOM on write
> > > > > > >>> - Ippokratis: Impala team.
> > > > > > >>> - Jason Altekruse: (Drill/MapR)
> > > > > > >>>   - update on Java direct memory representation (hadoop 2.0 ByteBuffer)
> > > > > > >>>   - currently uses a fork of Parquet that uses the GSOC work.
> > > > > > >>> - Tianshuo: 1.8.1 release.
> > > > > > >>> - Sanjeev (Twitter):
> > > > > > >>>   - want to hear updates about vectorized in Presto
> > > > > > >>>
> > > > > > >>> actions:
> > > > > > >>> - Zhengxiao: update vectorization PR
> > > > > > >>> - Jason: update ByteBuffer PR
> > > > > > >>> - Jason: open JIRA for dic encoding fallback pointer
> > > > > > >>> - Daniel: opened a PR for PARQUET-99: up for review
> > > > > > >>>
> > > > > > >>> Notes:
> > > > > > >>> - Vectorized read path for Presto (Dong Chen for Hive) PARQUET-131
> > > > > > >>>   - batch read
> > > > > > >>>   - lazy materialization
> > > > > > >>>   - Netflix integrated with Presto, Dong Chen integrated with Hive
> > > > > > >>>   - Nezih: micro/macro benchmark
> > > > > > >>>     - micro 2 read paths
> > > > > > >>>       - only primitives, no converters (3 x faster with vectorized)
> > > > > > >>>       - complex with converters (no different performance)
> > > > > > >>>     - macro Presto :
> > > > > > >>>       - complex types not better
> > > > > > >>>       - 2x better for primitive types
> > > > > > >>>   - Daniel: projection + predicate well optimized with presto (lazy load, lazy materialization). predicate push down and using dic in predicate evaluation.
> > > > > > >>>   - Ippokratis: fan out? => 100 values per collection, list/map materialization expansive
> > > > > > >>>
> > > > > > >>> - Dictionary encoding: because of fallback mechanism. We don't know when the dictionary ends. => Jason to open a JIRA
> > > > > > >>>
> > > > > > >>> - Parquet-99: OOM on write
> > > > > > >>>   - all big rows: (10MB per row) runs OOM before we first check
> > > > > > >>>   - big variability in size: small initial rows throw off estimate and following big rows blow memory
> > > > > > >>>   - add settings for checking at constant #rows.
> > > > > > >>>   - we should experiment with simpler strategies
> > > > > > >>>
> > > > > > >>> - ByteBuffer status:
> > > > > > >>>   - Jason need to rebase the PR
> > > > > > >>>   - Parquet-77
> > > > > > >>>
> > > > > > >>>
> > > > > > >>> On Tue, Jul 21, 2015 at 10:05 AM, Julien Le Dem <[email protected]> wrote:
> > > > > > >>>
> > > > > > >>>> It's happening now:
> > > > > > >>>> https://plus.google.com/hangouts/_/twitter.com/parquet-sync-up
> > > > > > >>>>
> > > > > > >>>> On Tue, Jul 14, 2015 at 10:04 AM, Julien Le Dem <[email protected]> wrote:
> > > > > > >>>>
> > > > > > >>>>> The next Parquet sync up will be held on google hangout on 7/21/2015 at 10 am PST
> > > > > > >>>>> https://plus.google.com/hangouts/_/twitter.com/parquet-sync-up
> > > > >
> > > > > --
> > > > > Alex Levenson
> > > > > @THISWILLWORK
> > >
> > > --
> > > Alex Levenson
> > > @THISWILLWORK
