Re: Next Parquet Sync Up

Julien Le Dem Tue, 21 Jul 2015 19:22:24 -0700

There's no particular reason for Tuesdays.
We could do the next one on a Monday.
Does anybody object?


Julien

> On Jul 21, 2015, at 17:37, Jacques Nadeau <[email protected]> wrote:
> 
> Any chance we can have these on either a different day or time?  The Drill
> hangout is every Tuesday at 10am so I always have to pick one or the other.
> 
> On Tue, Jul 21, 2015 at 10:56 AM, Nezih Yigitbasi <
> [email protected]> wrote:
> 
>> An update to the "actions": I will create a PR for the vectorized read
>> instead of Zhenxiao.
>> 
>> Thanks,
>> Nezih
>> 
>> On Tue, Jul 21, 2015 at 10:51 AM, Julien Le Dem <[email protected]
>> wrote:
>> 
>>> Agenda
>>> - Julien (Twitter):
>>>   - interested in ByteBuffer status
>>> - Ryan (by email): interested in ByteBuffer status. Did some work on bloom
>>> filters.
>>> PARQUET-251 and PARQUET-246 make sure 2.0 encodings and other new features
>>> are solid.
>>> - Daniel, Nezih, Zhenxiao (Netflix):
>>>    - update on Vectorized read path for Presto (Dong Chen for Hive)
>>>    - PARQUET-99: OOM on write
>>> - Ippokratis: Impala team.
>>> - Jason Altekruse: (Drill/MapR)
>>>   - update on Java direct memory representation (Hadoop 2.0 ByteBuffer)
>>>   - currently uses a fork of Parquet that uses the GSoC work.
>>> - Tianshuo: 1.8.1 release.
>>> - Sanjeev (Twitter):
>>>   - wants to hear updates about the vectorized read path in Presto
>>> 
>>> actions:
>>>  - Zhenxiao: update vectorization PR
>>>  - Jason: update ByteBuffer PR
>>>  - Jason: open JIRA for dictionary encoding fallback pointer
>>>  - Daniel: opened a PR for PARQUET-99, up for review
>>> 
>>> Notes:
>>> - Vectorized read path for Presto (Dong Chen for Hive) PARQUET-131
>>>       - batch read
>>>       - lazy materialization
>>>       - Netflix integrated with Presto, Dong Chen integrated with Hive
>>>       - Nezih: micro/macro benchmark
>>>            - micro: 2 read paths
>>>                  - only primitives, no converters (3x faster with vectorized)
>>>                  - complex with converters (no performance difference)
>>>            - macro, Presto:
>>>                  - complex types not better
>>>                  - 2x better for primitive types
>>>       - Daniel: projection + predicate well optimized with Presto (lazy
>>> load, lazy materialization). Predicate push down and using the dictionary
>>> in predicate evaluation.
>>>       - Ippokratis: fan out? => 100 values per collection, list/map
>>> materialization expensive
>>> 
>>> - Dictionary encoding: because of the fallback mechanism, we don't know
>>> when the dictionary ends. => Jason to open a JIRA
>>> 
>>> - PARQUET-99: OOM on write
>>>   - all big rows (10MB per row): runs OOM before we first check
>>>   - big variability in size: small initial rows throw off the estimate and
>>> the following big rows blow memory
>>>   - add settings for checking at a constant number of rows.
>>>   - we should experiment with simpler strategies
>>> 
>>> - ByteBuffer status:
>>>   - Jason needs to rebase the PR
>>>   - PARQUET-77
>>> 
>>> 
>>> On Tue, Jul 21, 2015 at 10:05 AM, Julien Le Dem <[email protected]>
>>> wrote:
>>> 
>>>> It's happening now:
>>>> https://plus.google.com/hangouts/_/twitter.com/parquet-sync-up
>>>> 
>>>> On Tue, Jul 14, 2015 at 10:04 AM, Julien Le Dem <[email protected]>
>>>> wrote:
>>>> 
>>>>> The next Parquet sync up will be held on Google Hangout on 7/21/2015 at
>>>>> 10 am PST
>>>>> https://plus.google.com/hangouts/_/twitter.com/parquet-sync-up
>>
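
To illustrate the PARQUET-99 notes quoted above (small initial rows throwing off the size estimate, versus checking at a constant number of rows), here is a minimal Python simulation. It is only a sketch and not parquet-mr code: the function names, the 100/10000 row bounds, the "check halfway to the projected limit" heuristic, and the byte sizes are all assumptions made up for this example.

```python
from typing import Callable, Iterable


def peak_buffered_bytes(
    row_sizes: Iterable[int],
    threshold: int,
    next_check: Callable[[int, int], int],
) -> int:
    """Simulate a writer that buffers rows and flushes when a size check
    finds the buffer at or above `threshold`. Returns the peak buffer size.

    `next_check(rows_buffered, bytes_buffered)` returns how many more rows
    to accept before looking at the buffer size again (hypothetical API).
    """
    buffered = 0  # bytes currently held in memory
    rows = 0      # rows currently held in memory
    until_check = next_check(0, 0)
    peak = 0
    for size in row_sizes:
        buffered += size
        rows += 1
        peak = max(peak, buffered)
        until_check -= 1
        if until_check <= 0:
            if buffered >= threshold:
                buffered, rows = 0, 0  # flush the buffered rows
            until_check = next_check(rows, buffered)
    return peak


def constant_interval(n: int) -> Callable[[int, int], int]:
    """Check the buffer size every `n` rows, whatever the rows look like."""
    def strategy(rows: int, buffered: int) -> int:
        return n
    return strategy


def estimated_interval(
    threshold: int, min_rows: int = 100, max_rows: int = 10000
) -> Callable[[int, int], int]:
    """Guess the next check from the average row size seen so far.
    The bounds and the halfway heuristic are assumptions for this sketch."""
    def strategy(rows: int, buffered: int) -> int:
        if rows == 0:
            return min_rows
        avg = buffered / rows
        remaining = max(threshold - buffered, 0)
        guess = int(remaining / avg / 2)  # aim halfway to the projected limit
        return max(min_rows, min(guess, max_rows))
    return strategy


# Assumed workload: 100 tiny rows followed by 1000 large rows.
THRESHOLD = 1_000_000
workload = [10] * 100 + [10_000] * 1000

adaptive_peak = peak_buffered_bytes(
    workload, THRESHOLD, estimated_interval(THRESHOLD)
)
constant_peak = peak_buffered_bytes(workload, THRESHOLD, constant_interval(10))

print("estimated interval peak bytes:", adaptive_peak)
print("constant interval peak bytes: ", constant_peak)
```

In this made-up workload the estimate built from the tiny rows schedules the next check far in the future, so the large rows pile up well past the threshold, while the constant-interval check stays close to it. A constant interval does not help on its own when every row is large unless the interval is small, which matches the "all big rows" case in the notes.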
