Here are the notes I took:
Pooja (CMU, Cloudera): Presenting her work on Parquet indices
Yaliang (Presto), Zoltan (Cloudera), Anna (Cloudera), Marcel, Deepak
(Vertica): Interested in Parquet index work
Ryan (Netflix): Parquet indices, compression
Junjie (Intel): Bloom filter proposal
Parquet Indices:
- Pooja presented her work.
- We discussed that valid_values should be kept and distinct_values should
be removed from the proposal.
- It'd be interesting to see figures for larger page sizes; parquet-mr
uses 1MB.
- There was agreement that page indexes should eventually replace page
statistics
- We discussed the following next steps:
  - Prepare a PR for parquet-format, continue the discussion there
  - Link the slides to the JIRA and mail them to dev@
  - Update the title of PARQUET-922 to better reflect the ongoing work
    (Lars did this already)
  - Add the performance evaluation to the design doc
Compression:
- Ryan built a JAR that supports zstd, lz4, and brotli, and is happy to
share it with anyone who'd like to run their own experiments.
Bloom Filters:
- Junjie prepared a sheet comparing the performance of bloom filters
with dictionary compression. Folks hadn't had time to look at the results,
so we'll continue the discussion on dev@ and in the next Parquet sync.
- We may also want to compare them to index-based page skipping.
On Wed, Aug 16, 2017 at 9:01 AM, Lars Volker <[email protected]> wrote:
> Join us here: https://meet.google.com/zyd-mwbm-zpe
>