Attendees/Agenda
Julien (Dremio):
- 1.9.0 release
Dan, Ryan (Netflix):
- new statistics discussion (ordering)
- new encodings.
- IOManager discussion
- time wasted in GC in the Hive Parquet SerDe
Piyush (Twitter):
- better Thrift integration in Scala
Sergio (Cloudera):
- introduced the new people working on Parquet on the Cloudera side
- follow up on the thread about creating APIs in Hive to make it easier for ORC
and Parquet to stay compatible across versions
Uwe (Blue Yonder):
- look into a bug in parquet-cpp
- prepare the first release of parquet-cpp (0.1)
Notes:
- Versioning:
- we should decouple library versioning from format versioning:
- allow major releases more often and remove deprecated APIs
- parquet-mr/cpp/format versioned independently
- parquet-cpp to start at 0.1
- 1.9.0 release
- blocked on Statistics: need Alex’s feedback
- want to release ASAP
- releasing:
- need to release more often
- make at least one minor release every 3 months
- make patch releases as necessary (any bug fix might warrant a patch
release)
- rotate release manager role. (Ryan, Piyush, ...)
- validation: integration/performance tests from Netflix/Twitter
- delete the Hive SerDe in Parquet, since it has been in Hive for a while
- new encodings:
- Ryan tried new encodings.
- RLE + bit width + zigzag + delta: good results
- should add a flag for each new encoding:
- for compatibility with other implementations
- for performance
- should document which encodings are supported by each version of
parquet-mr/parquet-cpp/Impala
- Action: Ryan. Start a document.
- strategies to select an encoding:
- Piyush has started experimenting and would like feedback.
- better fallback solution.
- Ryan: tools to re-encode and compare performance of encodings.
- Action: Ryan email dev list about where to put it.
- IOManager: perf on S3, allocations with G1 collector.
- optimization of seeks vs reads, when to ignore
- reduce first-record latency
- use threads
- G1 collector: humongous allocations are placed directly in old gen memory.
- allocations above a certain size count as humongous; the default row
group size hits that limit.
- Action: open JIRA.
- time wasted in GC in the Hive Parquet SerDe:
- Action: create a JIRA
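For readers unfamiliar with the RLE + bit width + zigzag + delta combination Ryan reported good results with, here is a minimal, hypothetical Java sketch of the zigzag + delta + bit-width part only. It is not Ryan's experimental code and not parquet-mr's DELTA_BINARY_PACKED implementation (which works per block with a minimum delta rather than zigzagging every delta); the class and method names are made up for illustration, and the RLE/bit-packing step is omitted.

```java
import java.util.Arrays;

// Hypothetical illustration class, not part of parquet-mr.
public class ZigZagDeltaSketch {
    // ZigZag maps signed values to unsigned ones so small magnitudes stay small:
    // 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
    static long zigzag(long v) {
        return (v << 1) ^ (v >> 63);
    }

    static long unzigzag(long z) {
        return (z >>> 1) ^ -(z & 1);
    }

    // Keep the first value as-is, then store zigzagged differences between neighbors.
    static long[] encode(long[] values) {
        long[] out = new long[values.length];
        if (values.length == 0) return out;
        out[0] = values[0];
        for (int i = 1; i < values.length; i++) {
            out[i] = zigzag(values[i] - values[i - 1]);
        }
        return out;
    }

    // Undo the zigzag, then accumulate the deltas back into absolute values.
    static long[] decode(long[] encoded) {
        long[] out = new long[encoded.length];
        if (encoded.length == 0) return out;
        out[0] = encoded[0];
        for (int i = 1; i < encoded.length; i++) {
            out[i] = out[i - 1] + unzigzag(encoded[i]);
        }
        return out;
    }

    // Smallest bit width that can hold every zigzagged delta (the first value is excluded).
    static int bitWidth(long[] encoded) {
        long max = 0;
        for (int i = 1; i < encoded.length; i++) {
            max |= encoded[i];
        }
        return 64 - Long.numberOfLeadingZeros(max);
    }

    public static void main(String[] args) {
        long[] values = {100, 101, 103, 102, 102, 105};
        long[] enc = encode(values);
        System.out.println(Arrays.toString(enc));                 // [100, 2, 4, 1, 0, 6]
        System.out.println(bitWidth(enc));                        // 3
        System.out.println(Arrays.equals(values, decode(enc)));   // true
    }
}
```

Slowly varying values such as timestamps or sorted IDs yield small deltas, so each delta fits in a few bits (3 here instead of 64). That is why delta plus zigzag plus a narrow bit width tends to compress well, and runs of identical deltas are where RLE would help further.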
On Thu, Oct 6, 2016 at 10:00 AM, Julien Le Dem <[email protected]> wrote:
> Parquet sync starting now at:
> https://hangouts.google.com/hangouts/_/dremio.com/parquet-sync-up
>
> On Wed, Oct 5, 2016 at 9:52 PM, Julien Le Dem <[email protected]> wrote:
>
>> Yes that's correct
>> The next parquet sync is tomorrow 10am PT on Google hangout
>>
>>
>> On Monday, September 26, 2016, Jim Pivarski <[email protected]> wrote:
>>
>>> On Thu, Sep 22, 2016 at 7:18 PM, Julien Le Dem <[email protected]>
>>> wrote:
>>>
>>> > The sync next week collides with strata Conf in NY.
>>> > I propose to move it to the following week.
>>>
>>>
>>> Does that mean it would be pushed back to Thursday, October 3 at 10-11am
>>> PT?
>>>
>>
>>
>> --
>> Julien
--
Julien