Re: Next Parquet Sync Up

Ryan Blue Sat, 25 Jul 2015 13:26:53 -0700

+1 Wednesday

On 07/22/2015 04:58 PM, Julien Le Dem wrote:

+1 Wednesday


On Wed, Jul 22, 2015 at 4:02 PM, Jason Altekruse <[email protected]>
wrote:

+1 for wednesday

On Wed, Jul 22, 2015 at 3:47 PM, Jacques Nadeau <[email protected]>
wrote:

+1 for Wed.

On Wed, Jul 22, 2015 at 3:45 PM, Alex Levenson <
[email protected]> wrote:

+1 for Wednesday

On Wed, Jul 22, 2015 at 3:44 PM, Julien Le Dem <[email protected]> wrote:

Wednesday then?
no more conflicts?

On Tue, Jul 21, 2015 at 7:26 PM, Alex Levenson <
[email protected]> wrote:

Sorry to be difficult but, can I request any day other than Monday --
how about Wednesday?

On Tue, Jul 21, 2015 at 7:19 PM, Julien Le Dem <[email protected]> wrote:

There's no particular reason for Tuesdays.
We could do the next one on a Monday.
Does anybody object?

Julien

On Jul 21, 2015, at 17:37, Jacques Nadeau <[email protected]> wrote:


Any chance we can have these on either a different day or time? The
Drill hangout is every Tuesday at 10am so I always have to pick one or
the other.


On Tue, Jul 21, 2015 at 10:56 AM, Nezih Yigitbasi <
[email protected]> wrote:

An update to "actions": I will create a PR for the vectorized read
instead of Zhenxiao.

Thanks,
Nezih

On Tue, Jul 21, 2015 at 10:51 AM, Julien Le Dem <[email protected]> wrote:

Agenda
- Julien (Twitter):
   - interested in ByteBuffer status
- Ryan (by email): interested in ByteBuffer status. Did some work on
  bloom filters.
  PARQUET-251 and PARQUET-246 make sure 2.0 encodings and other new
  features are solid.
- Daniel, Nezih, Zhenxiao (Netflix):
    - update on Vectorized read path for Presto (Dong Chen for Hive)
    - PARQUET-99: OOM on write
- Ippokratis: Impala team.
- Jason Altekruse: (Drill/MapR)
   - update on Java direct memory representation (hadoop 2.0 ByteBuffer)
   - currently uses a fork of Parquet that uses the GSOC work.
- Tianshuo: 1.8.1 release.
- Sanjeev (Twitter):
  - want to hear updates about vectorized in Presto

actions:
  - Zhenxiao: update vectorization PR
  - Jason: update ByteBuffer PR
  - Jason: open JIRA for dictionary encoding fallback pointer
  - Daniel: opened a PR for PARQUET-99: up for review

Notes:
- Vectorized read path for Presto (Dong Chen for Hive), PARQUET-131
       - batch read
       - lazy materialization
       - Netflix integrated with Presto, Dong Chen integrated with Hive
       - Nezih: micro/macro benchmark
            - micro: 2 read paths
                  - only primitives, no converters (3x faster with
                    vectorized)
                  - complex with converters (no difference in
                    performance)
            - macro (Presto):
                  - complex types not better
                  - 2x better for primitive types
       - Daniel: projection + predicate well optimized with Presto
         (lazy load, lazy materialization). Predicate push down and
         using the dictionary in predicate evaluation.
       - Ippokratis: fan out? => 100 values per collection, list/map
         materialization expensive

- Dictionary encoding: because of the fallback mechanism, we don't
  know when the dictionary ends. => Jason to open a JIRA

- PARQUET-99: OOM on write
   - all big rows (10MB per row): runs OOM before we first check
   - big variability in size: small initial rows throw off the
     estimate and the following big rows blow memory
   - add settings for checking at a constant number of rows.
   - we should experiment with simpler strategies

- ByteBuffer status:
   - Jason needs to rebase the PR
   - PARQUET-77
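
The PARQUET-99 notes above describe two ways of deciding when to check
the buffered size on write. The following is only a rough simulation
sketch of that idea, not the parquet-mr implementation: the function
name, the halfway heuristic, and the min_rows/max_rows defaults are all
assumptions made for illustration. It compares an estimate driven by
the average row size seen so far against a check at a constant number
of rows, and reports the peak number of buffered bytes.

```python
def peak_buffered_bytes(row_sizes, target_bytes, fixed_interval=None,
                        min_rows=100, max_rows=10000):
    """Simulate a writer that flushes once a size check sees target_bytes.

    Hypothetical model: with fixed_interval=None the next check is
    scheduled from the average row size so far (clamped to
    [min_rows, max_rows]); otherwise a check runs every fixed_interval rows.
    Returns the largest amount of data buffered at any point.
    """
    first_check = fixed_interval if fixed_interval is not None else min_rows
    buffered = 0
    rows_in_group = 0
    next_check = first_check
    peak = 0
    for size in row_sizes:
        buffered += size
        rows_in_group += 1
        peak = max(peak, buffered)
        if rows_in_group < next_check:
            continue  # no size check scheduled for this row
        if buffered >= target_bytes:
            # Flush the row group and start over.
            buffered, rows_in_group, next_check = 0, 0, first_check
        elif fixed_interval is not None:
            # Constant strategy: look again after a fixed number of rows.
            next_check = rows_in_group + fixed_interval
        else:
            # Adaptive strategy: project how many average-sized rows still
            # fit and check again halfway there.
            avg_row = buffered / rows_in_group
            remaining = (int((target_bytes - buffered) / avg_row)
                         if avg_row > 0 else max_rows)
            step = max(min_rows, min(max_rows, remaining // 2))
            next_check = rows_in_group + step
    return peak
```

Under these assumptions, 100 rows of 10 bytes followed by 10MB rows
push the adaptive strategy's next check thousands of rows away, so the
big rows accumulate unchecked, while a small constant interval bounds
the overshoot to roughly one interval's worth of rows.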


On Tue, Jul 21, 2015 at 10:05 AM, Julien Le Dem <[email protected]> wrote:

It's happening now:

https://plus.google.com/hangouts/_/twitter.com/parquet-sync-up


On Tue, Jul 14, 2015 at 10:04 AM, Julien Le Dem <[email protected]> wrote:

The next Parquet sync up will be held on Google Hangout on 7/21/2015 at
10 am PST

https://plus.google.com/hangouts/_/twitter.com/parquet-sync-up




--
Alex Levenson
@THISWILLWORK







--
Ryan Blue
Software Engineer
Cloudera, Inc.
