Attendees/agenda:
Zoltan (Cloudera):
 - Parquet tools questions
Piyush (Twitter):
 - planning on encoding optimization
Uwe:
 - release parquet-cpp
 - license/notice questions
Wes (Two Sigma):
 - working on Arrow
 - helping with the parquet-cpp release
Deepak (HP/Vertica):
 - read/write parquet-cpp
 - discuss statistics (PARQUET-686), timestamps/...
Ryan (Netflix):
 - 1.9.0 release out.
 - statistics
Julien (Dremio):
 - Parquet-Arrow integration

Notes:
Parquet-tools:
 - when Hadoop jars are missing from the classpath => bad error message
   - 1.6 used to bundle Hadoop
   - 1.9 requires adding the Hadoop classpath
 - Ryan has a new CLI tool
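
One way to avoid the missing-jar error with 1.9 is to launch parquet-tools through `hadoop jar`, which puts the Hadoop jars on the classpath. A minimal sketch of building that command follows; the helper name, the jar file name, and the input file are assumptions for illustration, not something agreed in the meeting.

```python
import shlex


def parquet_tools_command(jar_path, subcommand, parquet_file):
    """Build an argv list that runs parquet-tools via `hadoop jar`.

    Hypothetical helper: `hadoop jar` supplies the Hadoop classpath,
    so the tool does not need to bundle the Hadoop jars itself.
    """
    return ["hadoop", "jar", jar_path, subcommand, parquet_file]


# Assumed jar and file names, for illustration only.
cmd = parquet_tools_command("parquet-tools-1.9.0.jar", "schema", "data.parquet")
print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` on a machine where the `hadoop` launcher is installed.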

Parquet cpp release:
 - need to put mentions in NOTICE files
   - merge script came from the Spark project (Apache 2 License)
   - some code came from Impala (Apache 2 License)
 - Need to track the files imported from Impala
   - Wes to document.
   - Zoltan to look into moving copyright to NOTICE

Statistics:
 - Revisit signed/unsigned stats approach
 - instead add information on how the min/max was obtained (collation)
 - collation should follow a standard. We’re going to implement only a subset.
 - JIRA PARQUET-686
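
To illustrate why the ordering used to compute min/max has to be recorded, here is a small sketch comparing the same binary values under unsigned and signed byte ordering. The function names are hypothetical and this is not Parquet code; it only shows that the two orderings can disagree as soon as a byte is 0x80 or above, as in non-ASCII UTF-8 strings.

```python
def signed_key(value: bytes):
    """Sort key treating each byte as signed (-128..127), like a Java byte."""
    return [b - 256 if b > 127 else b for b in value]


def min_max(values, key=None):
    """Return (min, max) of the values under the given ordering."""
    return min(values, key=key), max(values, key=key)


# "é" encodes to the bytes C3 A9, both above 0x7F.
values = [s.encode("utf-8") for s in ["a", "z", "é"]]

# Python compares bytes objects as unsigned, lexicographically.
unsigned_stats = min_max(values)
signed_stats = min_max(values, key=signed_key)

print("unsigned:", unsigned_stats)
print("signed:  ", signed_stats)
```

Under unsigned ordering "é" is the max; under signed ordering it becomes the min, so a reader who assumes the wrong ordering could wrongly skip data when filtering on these statistics.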

int96:
 - deprecate write of int96 (Ryan to look into it)

New Encodings/compression:
 - Brotli compression: 20% decrease in size, 25% increase in encoding time; other settings: 15%/12% (compared to gzip). Ryan to update the PR.
    - need cpp integration as well (Uwe)
 - PARQUET-682: specify encoding per column. Piyush to update PR



On Thu, Nov 10, 2016 at 10:00 AM, Julien Le Dem <[email protected]> wrote:

> starting now
> https://hangouts.google.com/hangouts/_/dremio.com/parquet-sync-up
>
> On Thu, Nov 10, 2016 at 8:51 AM, Julien Le Dem <[email protected]> wrote:
>
>> Reminder that the Parquet Sync up will be in 1h at 10am PT on hangout:
>> https://hangouts.google.com/hangouts/_/dremio.com/parquet-sync-up
>>
>> --
>> Julien
>>
>
>
>
> --
> Julien
>



-- 
Julien
