Re: Next Parquet Sync Up

Julien Le Dem Wed, 22 Jul 2015 15:45:07 -0700

Wednesday then?
no more conflicts?

On Tue, Jul 21, 2015 at 7:26 PM, Alex Levenson <
[email protected]> wrote:


> Sorry to be difficult but, can I request any day other than Monday -- how
> about Wednesday?
>
> On Tue, Jul 21, 2015 at 7:19 PM, Julien Le Dem <[email protected]> wrote:
>
> > There's no particular reason for Tuesdays.
> > We could do the next one on a Monday.
> > Any objections?
> >
> > Julien
> >
> > > On Jul 21, 2015, at 17:37, Jacques Nadeau <[email protected]> wrote:
> > >
> > > Any chance we can have these on either a different day or time? The Drill
> > > hangout is every Tuesday at 10am so I always have to pick one or the other.
> > >
> > > On Tue, Jul 21, 2015 at 10:56 AM, Nezih Yigitbasi <
> > > [email protected]> wrote:
> > >
> > >> An update to "actions": I will create a PR for the vectorized read instead
> > >> of Zhenxiao.
> > >>
> > >> Thanks,
> > >> Nezih
> > >>
> > >> On Tue, Jul 21, 2015 at 10:51 AM, Julien Le Dem <[email protected]>
> > >> wrote:
> > >>
> > >>> Agenda
> > >>> - Julien (Twitter):
> > >>>   - interested in ByteBuffer status
> > >>> - Ryan (by email): interested in ByteBuffer status. Did some work on bloom
> > >>> filters.
> > >>> PARQUET-251 and PARQUET-246 make sure 2.0 encodings and other new features
> > >>> are solid.
> > >>> - Daniel, Nezih, Zhenxiao (Netflix):
> > >>>    - update on Vectorized read path for Presto (Dong Chen for Hive)
> > >>>    - PARQUET-99: OOM on write
> > >>> - Ippokratis: Impala team.
> > >>> - Jason Altekruse: (Drill/MapR)
> > >>>   - update on Java direct memory representation (Hadoop 2.0 ByteBuffer)
> > >>>   - currently uses a fork of Parquet that uses the GSoC work.
> > >>> - Tianshuo: 1.8.1 release.
> > >>> - Sanjeev (Twitter):
> > >>>  - want to hear updates about vectorized in Presto
> > >>>
> > >>> Actions:
> > >>>  - Zhenxiao: update vectorization PR
> > >>>  - Jason: update ByteBuffer PR
> > >>>  - Jason: open JIRA for dictionary encoding fallback pointer
> > >>>  - Daniel: opened a PR for PARQUET-99: up for review
> > >>>
> > >>> Notes:
> > >>> - Vectorized read path for Presto (Dong Chen for Hive) PARQUET-131
> > >>>       - batch read
> > >>>       - lazy materialization
> > >>>       - Netflix integrated with Presto, Dong Chen integrated with Hive
> > >>>       - Nezih: micro/macro benchmark
> > >>>            - micro 2 read paths
> > >>>                  - only primitives, no converters (3x faster with
> > >>> vectorized)
> > >>>                  - complex with converters (no performance difference)
> > >>>            - macro Presto :
> > >>>                  - complex types not better
> > >>>                  - 2x better for primitive types
> > >>>       - Daniel: projection + predicate well optimized with Presto (lazy
> > >>> load, lazy materialization). Predicate push down and using the dictionary
> > >>> in predicate evaluation.
> > >>>       - Ippokratis: fan out? => 100 values per collection, list/map
> > >>> materialization expensive
> > >>>
> > >>> - Dictionary encoding: because of the fallback mechanism, we don't know
> > >>> when the dictionary ends. => Jason to open a JIRA
> > >>>
> > >>> - PARQUET-99: OOM on write
> > >>>   - all big rows (10MB per row): runs out of memory before the first check
> > >>>   - big variability in size: small initial rows throw off the estimate
> > >>> and the following big rows blow memory
> > >>>   - add settings for checking at a constant number of rows.
> > >>>   - we should experiment with simpler strategies
> > >>>
> > >>> - ByteBuffer status:
> > >>>   - Jason needs to rebase the PR
> > >>>   - PARQUET-77
> > >>>
> > >>>
> > >>> On Tue, Jul 21, 2015 at 10:05 AM, Julien Le Dem <[email protected]>
> > >>> wrote:
> > >>>
> > >>>> It's happening now:
> > >>>> https://plus.google.com/hangouts/_/twitter.com/parquet-sync-up
> > >>>>
> > >>>> On Tue, Jul 14, 2015 at 10:04 AM, Julien Le Dem <[email protected]>
> > >>>> wrote:
> > >>>>
> > >>>>> The next Parquet sync up will be held on Google Hangout on 7/21/2015
> > >>>>> at 10 am PST
> > >>>>> https://plus.google.com/hangouts/_/twitter.com/parquet-sync-up
> > >>
> >
>
>
>
> --
> Alex Levenson
> @THISWILLWORK
>
