Attendees/agenda:
Zoltan (Cloudera):
- Parquet tools questions
Piyush (Twitter):
- planning on encoding optimization
Uwe:
- release parquet-cpp
- license/notice questions
Wes (twosigma):
- working on arrow
- helping with the parquet-cpp release
Deepak (HP/Vertica):
- read/write parquet-cpp
- discuss statistics (PARQUET-686), timestamps/...
Ryan (Netflix):
- 1.9.0 release out.
- statistics
Julien (Dremio):
- Parquet-Arrow integration
Notes:
Parquet-tools:
- when hadoop jars are missing from the classpath => bad error message
- 1.6 used to bundle hadoop
- 1.9 requires adding hadoop classpath
- Ryan has a new CLI tool
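
For reference, a minimal sketch of launching parquet-tools 1.9 with the hadoop jars added to the classpath. The jar path is a placeholder, the helper names are hypothetical, and the main class name (org.apache.parquet.tools.Main) is an assumption to check against your build:

```python
import os
import subprocess


def hadoop_classpath():
    """Return the classpath printed by `hadoop classpath` (needs hadoop on PATH)."""
    return subprocess.check_output(["hadoop", "classpath"], text=True).strip()


def build_command(tools_jar, hadoop_cp, tool_args):
    """Build a java command line that puts parquet-tools and hadoop on the classpath.

    tools_jar is a placeholder path; the main class name is an assumption.
    """
    classpath = os.pathsep.join([tools_jar, hadoop_cp])
    return ["java", "-cp", classpath, "org.apache.parquet.tools.Main", *tool_args]


if __name__ == "__main__":
    # Example: print the schema of a file (jar and file paths are placeholders).
    cmd = build_command("parquet-tools-1.9.0.jar", hadoop_classpath(),
                        ["schema", "data.parquet"])
    subprocess.run(cmd, check=True)
```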
Parquet cpp release:
- need to put mentions in NOTICE files
- merge script came from the Spark project (Apache 2 License)
- some code came from Impala (Apache 2 License)
- Need to track the files imported from Impala
- Wes to document.
- Zoltan to look into moving copyright to NOTICE
Statistics:
- Revisit signed/unsigned stats approach
- instead add information on how the min/max was obtained (collation)
- collation should follow a standard. We’re going to implement only a
subset.
- JIRA PARQUET-686
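
To illustrate why the ordering used for min/max matters, here is a small sketch (not the parquet-mr implementation; helper names are made up) comparing the same UTF-8 values byte-wise as unsigned and as signed bytes. The two orderings disagree as soon as a byte >= 0x80 shows up:

```python
def unsigned_key(value: bytes):
    # Each byte compared as 0..255 (this is also how Python compares bytes).
    return tuple(value)


def signed_key(value: bytes):
    # Each byte reinterpreted as a signed 8-bit integer, -128..127.
    return tuple(b - 256 if b >= 128 else b for b in value)


def min_max(values, key):
    """Return (min, max) of values under the given ordering."""
    return min(values, key=key), max(values, key=key)


values = [s.encode("utf-8") for s in ["a", "z", "é"]]  # "é" is 0xC3 0xA9

print(min_max(values, unsigned_key))
print(min_max(values, signed_key))
```

With unsigned comparison the range is a..é; with signed comparison it is é..z. A reader that assumes a different ordering than the writer could wrongly skip a row group, hence the idea of recording how the min/max was obtained.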
int96:
- deprecate write of int96 (Ryan to look into it)
New Encodings/compression:
- brotli compression => 20% decrease in size, 25% increase in encoding
time. Other settings: 15%/12% (compared to gzip). Ryan to update the PR.
- need cpp integration as well (Uwe)
- PARQUET-682: specify encoding per column. Piyush to update PR
On Thu, Nov 10, 2016 at 10:00 AM, Julien Le Dem <[email protected]> wrote:
> starting now
> https://hangouts.google.com/hangouts/_/dremio.com/parquet-sync-up
>
> On Thu, Nov 10, 2016 at 8:51 AM, Julien Le Dem <[email protected]> wrote:
>
>> Reminder that the Parquet Sync up will be in 1h at 10am PT on hangout:
>> https://hangouts.google.com/hangouts/_/dremio.com/parquet-sync-up
>>
>> --
>> Julien
>>
>
>
>
> --
> Julien
>
--
Julien