I have a slight correction for the Brotli encoding numbers. The 20% size
decrease incurred a 2.5% increase in compression time (using brotli-5),
while the 15% size decrease had a 12% encoding time *decrease* (using
brotli-4). We've decided to use brotli-5 for tables that are read a lot,
and brotli-4 for most other tables.

On Thu, Nov 10, 2016 at 11:26 AM, Julien Le Dem <[email protected]> wrote:

>  Attendees/agenda:
> Zoltan (Cloudera):
>  - Parquet tools questions
> Piyush (Twitter):
>  - planning on encoding optimization
> Uwe:
>  - release parquet-cpp
>  - license/notice questions
> Wes (twosigma):
>  - working on arrow
>  - helping with the parquet-cpp release
> Deepak (HP/Vertica):
>  - read/write parquet-cpp
>  - discuss. statistics PARQUET-686. timestamps/...
> Ryan (Netflix):
>  - 1.9.0 release out.
>  - statistics
> Julien (Dremio):
>  - Parquet-Arrow integration
>
> Notes:
> Parquet-tools:
>  - when missing hadoop jars on the class path => bad error message
>    - 1.6 used to bundle hadoop
>    - 1.9 requires adding hadoop classpath
>  - Ryan has a new CLI tool
>
> Parquet cpp release:
>  - need to put mentions in NOTICE files
>    - merge script came from the Spark project (Apache 2 License)
>    - some code came from Impala (Apache 2 License)
>  - Need to track the files imported from impala
>    - Wes to document.
>    - Zoltan to look into moving copyright to NOTICE
>
> Statistics:
>  - Revisit signed/unsigned stats approach
>  - instead add information on how the min/max was obtained. (Collation)
>  - collation should follow a standard. We’re going to implement only a
> subset.
>  - JIRA PARQUET-686
>
> int96:
>  - deprecate write of int96 (Ryan to look into it)
>
> New Encodings/compression:
>  - brotli compression. => 20% decrease in size. 25% increase in encoding
> time. other settings: 15%/12% (compared to gzip). Ryan to update the PR.
>     - need cpp integration as well. Uwe
>  - PARQUET-682: specify encoding per column. Piyush to update PR
>
>
>
> On Thu, Nov 10, 2016 at 10:00 AM, Julien Le Dem <[email protected]> wrote:
>
> > starting now
> > https://hangouts.google.com/hangouts/_/dremio.com/parquet-sync-up
> >
> > On Thu, Nov 10, 2016 at 8:51 AM, Julien Le Dem <[email protected]>
> wrote:
> >
> >> Reminder that the Parquet Sync up will be in 1h at 10am PT on hangout:
> >> https://hangouts.google.com/hangouts/_/dremio.com/parquet-sync-up
> >>
> >> --
> >> Julien
> >>
> >
> >
> >
> > --
> > Julien
> >
>
>
>
> --
> Julien
>



-- 
Ryan Blue
Software Engineer
Netflix
