I have a slight correction to the Brotli numbers in the notes. The 20% size decrease came with a 2.5% increase in compression time (using brotli-5), not 25%, while the 15% size decrease came with a 12% *decrease* in compression time (using brotli-4). Both figures are relative to gzip. We've decided to use brotli-5 for tables that are read frequently, and brotli-4 for most other tables.
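As a rough illustration only, here is a small Python sketch of the trade-off and the per-table policy described above. The ratios are the corrected numbers, normalized so that gzip = 1.0. The names `CODECS`, `choose_codec`, `relative_cost` and `READ_HEAVY_THRESHOLD` are hypothetical and do not correspond to any Parquet API. The threshold value is also made up, since "read frequently" isn't defined in this thread.

```python
from typing import Tuple

# Size and compression time of each codec relative to gzip (gzip == 1.0),
# taken from the corrected numbers in this thread.
CODECS = {
    "gzip": {"size": 1.00, "time": 1.00},
    "brotli-5": {"size": 0.80, "time": 1.025},  # 20% smaller, 2.5% slower
    "brotli-4": {"size": 0.85, "time": 0.88},   # 15% smaller, 12% faster
}

# Hypothetical cutoff for a "read frequently" table; illustrative value only.
READ_HEAVY_THRESHOLD = 100  # reads per day


def choose_codec(reads_per_day: int) -> str:
    """Pick brotli-5 for read-heavy tables and brotli-4 for everything else."""
    return "brotli-5" if reads_per_day >= READ_HEAVY_THRESHOLD else "brotli-4"


def relative_cost(codec: str, gzip_bytes: float, gzip_seconds: float) -> Tuple[float, float]:
    """Estimate (bytes, seconds) for `codec` from a measured gzip size and time."""
    ratios = CODECS[codec]
    return gzip_bytes * ratios["size"], gzip_seconds * ratios["time"]
```

The reasoning behind the split is that a read-heavy table pays its extra compression time once but saves the extra 5% of storage and I/O on every read. For other tables, brotli-4's faster writes matter more.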
On Thu, Nov 10, 2016 at 11:26 AM, Julien Le Dem <[email protected]> wrote:

> Attendees/agenda:
> Zoltan (Cloudera):
> - Parquet tools questions
> Piyush (Twitter):
> - planning on encoding optimization
> Uwe:
> - release parquet-cpp
> - license/notice questions
> Wes (Two Sigma):
> - working on Arrow
> - helping with the parquet-cpp release
> Deepak (HP/Vertica):
> - read/write parquet-cpp
> - discuss statistics (PARQUET-686), timestamps/...
> Ryan (Netflix):
> - 1.9.0 release is out
> - statistics
> Julien (Dremio):
> - Parquet-Arrow integration
>
> Notes:
> Parquet-tools:
> - when the Hadoop jars are missing from the classpath => bad error message
> - 1.6 used to bundle Hadoop
> - 1.9 requires adding the Hadoop classpath
> - Ryan has a new CLI tool
>
> Parquet-cpp release:
> - need to put mentions in NOTICE files
> - merge script came from the Spark project (Apache 2 License)
> - some code came from Impala (Apache 2 License)
> - need to track the files imported from Impala
> - Wes to document
> - Zoltan to look into moving copyright to NOTICE
>
> Statistics:
> - revisit the signed/unsigned stats approach
> - instead, add information on how the min/max was obtained (collation)
> - collation should follow a standard. We're going to implement only a subset.
> - JIRA PARQUET-686
>
> int96:
> - deprecate writing int96 (Ryan to look into it)
>
> New encodings/compression:
> - brotli compression => 20% decrease in size, 25% increase in encoding time. Other settings: 15%/12% (compared to gzip). Ryan to update the PR.
> - need cpp integration as well. Uwe
> - PARQUET-682: specify encoding per column. Piyush to update PR
>
> On Thu, Nov 10, 2016 at 10:00 AM, Julien Le Dem <[email protected]> wrote:
>
> > starting now
> > https://hangouts.google.com/hangouts/_/dremio.com/parquet-sync-up
> >
> > On Thu, Nov 10, 2016 at 8:51 AM, Julien Le Dem <[email protected]> wrote:
> >
> >> Reminder that the Parquet Sync up will be in 1h at 10am PT on hangout:
> >> https://hangouts.google.com/hangouts/_/dremio.com/parquet-sync-up
> >>
> >> --
> >> Julien
> >
> > --
> > Julien
>
> --
> Julien

--
Ryan Blue
Software Engineer
Netflix
