+1 Wednesday

On 07/22/2015 04:58 PM, Julien Le Dem wrote:
+1 Wednesday

On Wed, Jul 22, 2015 at 4:02 PM, Jason Altekruse <[email protected]>
wrote:

+1 for wednesday

On Wed, Jul 22, 2015 at 3:47 PM, Jacques Nadeau <[email protected]>
wrote:

+1 for Wed.

On Wed, Jul 22, 2015 at 3:45 PM, Alex Levenson <
[email protected]> wrote:

+1 for Wednesday

On Wed, Jul 22, 2015 at 3:44 PM, Julien Le Dem <[email protected]> wrote:

Wednesday then?
Any more conflicts?

On Tue, Jul 21, 2015 at 7:26 PM, Alex Levenson <
[email protected]> wrote:

Sorry to be difficult, but can I request any day other than Monday?
How about Wednesday?

On Tue, Jul 21, 2015 at 7:19 PM, Julien Le Dem <[email protected]>
wrote:

There's no particular reason for Tuesdays.
We could do the next one on a Monday.
Any objections?

Julien

On Jul 21, 2015, at 17:37, Jacques Nadeau <[email protected]>
wrote:

Any chance we can have these on either a different day or time?
The Drill hangout is every Tuesday at 10am so I always have to pick one
or the other.

On Tue, Jul 21, 2015 at 10:56 AM, Nezih Yigitbasi <
[email protected]> wrote:

An update to "actions": I will create a PR for the vectorized read
instead of Zhenxiao.

Thanks,
Nezih

On Tue, Jul 21, 2015 at 10:51 AM, Julien Le Dem <[email protected]> wrote:

Agenda
- Julien (Twitter):
   - interested in ByteBuffer status
- Ryan (by email): interested in ByteBuffer status. Did some work on bloom filters.
  PARQUET-251 and PARQUET-246 make sure 2.0 encodings and other new features are solid.
- Daniel, Nezih, Zhenxiao (Netflix):
    - update on Vectorized read path for Presto (Dong Chen for Hive)
    - PARQUET-99: OOM on write
- Ippokratis: Impala team.
- Jason Altekruse: (Drill/MapR)
   - update on Java direct memory representation (Hadoop 2.0 ByteBuffer)
   - currently uses a fork of Parquet that uses the GSOC work.
- Tianshuo: 1.8.1 release.
- Sanjeev (Twitter):
  - want to hear updates about vectorized in Presto

actions:
  - Zhenxiao: update vectorization PR
  - Jason: update ByteBuffer PR
  - Jason: open JIRA for dictionary encoding fallback pointer
  - Daniel: opened a PR for PARQUET-99: up for review

Notes:
- Vectorized read path for Presto (Dong Chen for Hive): PARQUET-131
       - batch read
       - lazy materialization
       - Netflix integrated with Presto, Dong Chen integrated with Hive
       - Nezih: micro/macro benchmark
            - micro: 2 read paths
                  - only primitives, no converters (3x faster with vectorized)
                  - complex with converters (no difference in performance)
            - macro (Presto):
                  - complex types not better
                  - 2x better for primitive types
       - Daniel: projection + predicate well optimized with Presto (lazy load, lazy materialization). Predicate push down and using the dictionary in predicate evaluation.
       - Ippokratis: fan out? => 100 values per collection, list/map materialization expensive
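
To illustrate the "batch read" and "lazy materialization" items above, here is a minimal sketch. It is not the PARQUET-131 code: the function name `scan`, the in-memory column layout, and the batch size are all hypothetical, and plain Python lists stand in for encoded column chunks. The idea shown is only that the predicate column is read in batches and the projected columns are touched solely for rows that pass the predicate.

```python
from typing import Any, Callable, Dict, List, Tuple


def scan(
    columns: Dict[str, List[Any]],
    predicate_col: str,
    predicate: Callable[[Any], bool],
    projected: List[str],
    batch_size: int = 4,
) -> Tuple[List[Dict[str, Any]], Dict[str, int]]:
    """Batch-read the predicate column, then materialize projected
    columns lazily, only for the rows that matched.

    Returns the matching rows and, per projected column, how many
    values were actually materialized (to make the laziness visible).
    """
    materialized = {name: 0 for name in projected}
    rows: List[Dict[str, Any]] = []
    total = len(columns[predicate_col])

    for start in range(0, total, batch_size):
        # Batch read: evaluate the predicate on a slice of one column.
        batch = columns[predicate_col][start:start + batch_size]
        selected = [start + i for i, v in enumerate(batch) if predicate(v)]
        if not selected:
            # Whole batch filtered out: projected columns are never touched.
            continue
        # Lazy materialization: fetch other columns only for selected rows.
        for idx in selected:
            row = {}
            for name in projected:
                row[name] = columns[name][idx]
                materialized[name] += 1
            rows.append(row)
    return rows, materialized


if __name__ == "__main__":
    data = {"id": list(range(10)), "name": [f"n{i}" for i in range(10)]}
    result, counts = scan(data, "id", lambda v: v >= 8, ["name"])
    print(result, counts)
```

With a selective predicate, most batches are skipped without reading the projected columns at all, which is consistent with the note that the gain shows up for primitive columns, where no converter work dominates.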

- Dictionary encoding: because of the fallback mechanism, we don't know
when the dictionary ends. => Jason to open a JIRA

- PARQUET-99: OOM on write
   - all big rows (10MB per row): runs OOM before we first check
   - big variability in size: small initial rows throw off the estimate and the following big rows blow memory
   - add a setting for checking at a constant number of rows
   - we should experiment with simpler strategies
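
The two failure modes above can be reproduced with a small simulation. This is a simplified model under stated assumptions, not the parquet-mr writer: `peak_buffered_bytes`, `constant_rows`, and `size_estimate` are hypothetical names, the initial check after 100 rows is an assumed default, and "flush" just resets a byte counter. It compares a check scheduled from the average row size seen so far against a check every fixed number of rows.

```python
from typing import Callable, Iterable

# A policy maps (rows buffered, bytes buffered) to "rows until next check".
CheckPolicy = Callable[[int, int], int]


def peak_buffered_bytes(row_sizes: Iterable[int], threshold: int,
                        next_check: CheckPolicy) -> int:
    """Simulate a writer that only looks at its buffer size at scheduled
    checks, and return the largest number of bytes it ever buffered."""
    buffered = rows = peak = 0
    until_check = max(1, next_check(0, 0))
    for size in row_sizes:
        buffered += size
        rows += 1
        peak = max(peak, buffered)
        until_check -= 1
        if until_check <= 0:
            if buffered >= threshold:
                buffered = rows = 0  # flush the buffer
            until_check = max(1, next_check(rows, buffered))
    return peak


def constant_rows(n: int) -> CheckPolicy:
    """Check every n rows, regardless of observed row sizes."""
    return lambda rows, buffered: n


def size_estimate(threshold: int, initial: int = 100) -> CheckPolicy:
    """Schedule the next check halfway to the threshold, extrapolating
    from the average row size seen so far (assumed heuristic)."""
    def next_check(rows: int, buffered: int) -> int:
        if rows == 0 or buffered == 0:
            return initial
        average = buffered / rows
        return int((threshold - buffered) / average / 2)
    return next_check


if __name__ == "__main__":
    limit = 10_000_000
    # 100 tiny rows followed by 50 rows of 1 MB each.
    workload = [1] * 100 + [1_000_000] * 50
    print(peak_buffered_bytes(workload, limit, size_estimate(limit)))
    print(peak_buffered_bytes(workload, limit, constant_rows(10)))
```

In this model the estimate-based policy, after seeing 100 one-byte rows, schedules its next check millions of rows away and buffers all 50 large rows, while the constant-row policy overshoots the threshold by at most one check interval. The constant interval still has to be small enough for the "all big rows" case, which is why it would need to be a setting.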

- ByteBuffer status:
   - Jason needs to rebase the PR
   - PARQUET-77


On Tue, Jul 21, 2015 at 10:05 AM, Julien Le Dem <[email protected]> wrote:

It's happening now:

https://plus.google.com/hangouts/_/twitter.com/parquet-sync-up

On Tue, Jul 14, 2015 at 10:04 AM, Julien Le Dem <[email protected]> wrote:

The next Parquet sync up will be held on Google Hangout on 7/21/2015
at 10 am PST

https://plus.google.com/hangouts/_/twitter.com/parquet-sync-up





--
Alex Levenson
@THISWILLWORK











--
Ryan Blue
Software Engineer
Cloudera, Inc.
