Re: Parquet sync starting now

Lars Volker Wed, 16 Aug 2017 13:54:11 -0700

Here are the notes I took:

Pooja (CMU, Cloudera): Presenting her work on Parquet indices
Yaliang (Presto), Zoltan (Cloudera), Anna (Cloudera), Marcel, Deepak
(Vertica): Interested in Parquet index work
Ryan (Netflix): Parquet indices, compression
Junjie (Intel): Bloom filter proposal


Parquet Indices:

   - Pooja presented her work.
   - We discussed that valid_values should be kept and that distinct_values
   should be removed from the proposal.
   - It'd be interesting to see figures for larger page sizes; parquet-mr
   uses 1MB.
   - There was agreement that page indexes should eventually replace page
   statistics
   - We discussed the following next steps:
      - Prepare a PR for parquet-format, continue the discussion there
      - Link the slides to the JIRA and mail them to dev@
      - Update the title of PARQUET-922 to better reflect the ongoing work
      (Lars did this already)
      - Add the performance evaluation to the design doc

Compression:

   - Ryan built a JAR that supports zstd, lz4, and brotli, and is happy to
   share it with anyone who'd like to run their own experiments.

Bloom Filters:

   - Junjie prepared a sheet comparing the performance of bloom filters
   with dictionary compression. Folks hadn't had time to look at the results
   so we'll continue the discussion on dev@ and in the next Parquet sync.
   - We may also want to compare them to index-based page skipping.




On Wed, Aug 16, 2017 at 9:01 AM, Lars Volker <[email protected]> wrote:

> Join us here: https://meet.google.com/zyd-mwbm-zpe
>
