+1 Wednesday
On 07/22/2015 04:58 PM, Julien Le Dem wrote:
+1 Wednesday
On Wed, Jul 22, 2015 at 4:02 PM, Jason Altekruse <[email protected]>
wrote:
+1 for wednesday
On Wed, Jul 22, 2015 at 3:47 PM, Jacques Nadeau <[email protected]>
wrote:
+1 for Wed.
On Wed, Jul 22, 2015 at 3:45 PM, Alex Levenson <[email protected]>
wrote:
+1 for Wednesday
On Wed, Jul 22, 2015 at 3:44 PM, Julien Le Dem <[email protected]>
wrote:
Wednesday then?
No more conflicts?
On Tue, Jul 21, 2015 at 7:26 PM, Alex Levenson <[email protected]>
wrote:
Sorry to be difficult, but can I request any day other than Monday?
How about Wednesday?
On Tue, Jul 21, 2015 at 7:19 PM, Julien Le Dem <[email protected]>
wrote:
There's no particular reason for Tuesdays.
We could do the next one on a Monday.
Does anybody object?
Julien
On Jul 21, 2015, at 17:37, Jacques Nadeau <[email protected]>
wrote:
Any chance we can have these on either a different day or time?
The Drill hangout is every Tuesday at 10am so I always have to pick one
or the other.
On Tue, Jul 21, 2015 at 10:56 AM, Nezih Yigitbasi <[email protected]>
wrote:
An update to "actions": I will create a PR for the vectorized read
instead of Zhenxiao.
Thanks,
Nezih
On Tue, Jul 21, 2015 at 10:51 AM, Julien Le Dem <[email protected]>
wrote:
Agenda
- Julien (Twitter):
  - interested in ByteBuffer status
- Ryan (by email): interested in ByteBuffer status. Did some work on bloom filters.
  PARQUET-251 and PARQUET-246 make sure 2.0 encodings and other new features are solid.
- Daniel, Nezih, Zhenxiao (Netflix):
  - update on vectorized read path for Presto (Dong Chen for Hive)
  - PARQUET-99: OOM on write
- Ippokratis: Impala team.
- Jason Altekruse (Drill/MapR):
  - update on Java direct memory representation (Hadoop 2.0 ByteBuffer)
  - currently uses a fork of Parquet that uses the GSoC work.
- Tianshuo: 1.8.1 release.
- Sanjeev (Twitter):
  - wants to hear updates about vectorization in Presto
Actions:
- Zhenxiao: update vectorization PR
- Jason: update ByteBuffer PR
- Jason: open JIRA for dictionary encoding fallback pointer
- Daniel: opened a PR for PARQUET-99: up for review
Notes:
- Vectorized read path for Presto (Dong Chen for Hive): PARQUET-131
  - batch read
  - lazy materialization
  - Netflix integrated with Presto, Dong Chen integrated with Hive
  - Nezih: micro/macro benchmarks
    - micro: 2 read paths
      - only primitives, no converters (3x faster with vectorized)
      - complex with converters (no difference in performance)
    - macro (Presto):
      - complex types not better
      - 2x better for primitive types
  - Daniel: projection + predicate well optimized with Presto (lazy load, lazy materialization). Predicate push down and using the dictionary in predicate evaluation.
  - Ippokratis: fan out? => 100 values per collection, list/map materialization expensive
  - Dictionary encoding: because of the fallback mechanism, we don't know when the dictionary ends. => Jason to open a JIRA
- PARQUET-99: OOM on write
  - all big rows (10MB per row): runs OOM before we first check
  - big variability in size: small initial rows throw off the estimate and the following big rows blow memory
  - add settings for checking at a constant number of rows
  - we should experiment with simpler strategies
- ByteBuffer status:
  - Jason needs to rebase the PR
  - PARQUET-77
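
To make the PARQUET-99 discussion in the notes a bit more concrete, here is a rough simulation of the two failure modes and of the "check at a constant number of rows" idea. This is only a sketch and not parquet-mr code: the function names, the first-check value of 100 rows, and the 1 MB target are all assumptions picked for illustration.

```python
# Hypothetical simulation, not parquet-mr code: every name here is made up.

def peak_buffered_bytes(row_sizes, block_size, next_check):
    """Buffer rows and flush when a size check sees >= block_size bytes.

    next_check(rows, nbytes) returns the buffered-row count at which the
    writer should look at its buffered size again.
    Returns the largest number of bytes held in memory at any point.
    """
    buffered_rows = 0
    buffered_bytes = 0
    peak = 0
    check_at = next_check(0, 0)
    for size in row_sizes:
        buffered_rows += 1
        buffered_bytes += size
        peak = max(peak, buffered_bytes)
        if buffered_rows >= check_at:
            if buffered_bytes >= block_size:
                # Flush: the buffer is written out and memory is released.
                buffered_rows = 0
                buffered_bytes = 0
                check_at = next_check(0, 0)
            else:
                check_at = next_check(buffered_rows, buffered_bytes)
    return peak


def adaptive_check(block_size, first_check=100):
    """Guess the next check from the average row size seen so far."""
    def next_check(rows, nbytes):
        if rows == 0:
            return first_check  # nothing is measured before this many rows
        avg = max(1, nbytes // rows)
        remaining = block_size - nbytes
        return rows + max(1, remaining // avg)
    return next_check


def constant_check(every_n_rows):
    """Check the buffered size at a fixed row interval."""
    def next_check(rows, nbytes):
        return rows + every_n_rows
    return next_check


BLOCK = 1_000_000  # assumed flush target in bytes

# Small initial rows throw off the estimate, then big rows follow.
small_then_big = [10] * 100 + [100_000] * 1000
adaptive_peak = peak_buffered_bytes(small_then_big, BLOCK, adaptive_check(BLOCK))
constant_peak = peak_buffered_bytes(small_then_big, BLOCK, constant_check(10))

# All big rows (10MB each): memory fills up before the first check happens.
all_big = [10_000_000] * 100
all_big_peak = peak_buffered_bytes(all_big, BLOCK, adaptive_check(BLOCK))

print(adaptive_peak, constant_peak, all_big_peak)
```

In this toy run the adaptive estimate buffers about 100 MB against a 1 MB target, because the average taken over the small rows pushes the next check far into the future, while checking every 10 rows stays close to the target. The price of the constant interval is more frequent size checks, which is presumably why it would be exposed as a setting.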
On Tue, Jul 21, 2015 at 10:05 AM, Julien Le Dem <[email protected]>
wrote:
It's happening now:
https://plus.google.com/hangouts/_/twitter.com/parquet-sync-up
On Tue, Jul 14, 2015 at 10:04 AM, Julien Le Dem <[email protected]>
wrote:
The next Parquet sync up will be held on Google Hangout on 7/21/2015
at 10 am PST
https://plus.google.com/hangouts/_/twitter.com/parquet-sync-up
--
Alex Levenson
@THISWILLWORK
--
Ryan Blue
Software Engineer
Cloudera, Inc.