Yeah, next steps are to look at decompression speeds and do a more thorough comparison between brotli compression levels and zstd levels. This initial set of data is just to make sure that data produced by Parquet works well with these compression codecs, because a significant number of the columns are dictionary-encoded before the generic codec is applied. Tables three and four are the cases that exercise this the most, and they do really well with zstd and brotli.
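For anyone who wants to poke at the dictionary-encoding effect without a full Parquet setup, here is a rough sketch of the idea. Everything in it is an assumption for illustration: the column data is made up, the helper names are hypothetical, the fixed-width id packing is a simplification (Parquet uses an RLE/bit-packing hybrid for dictionary ids), and zlib from the Python standard library stands in for zstd or brotli. It is not the benchmark behind the spreadsheet.

```python
import time
import zlib


def dictionary_encode(values):
    """Map each distinct value to a small integer id, in first-seen order."""
    dictionary = {}
    ids = []
    for v in values:
        # len(dictionary) is evaluated before insertion, so new values
        # receive the next unused id.
        ids.append(dictionary.setdefault(v, len(dictionary)))
    return list(dictionary), ids


def pack_ids(ids, num_entries):
    """Pack ids into fixed-width bytes (1 byte if the dictionary fits, else 4)."""
    width = 1 if num_entries <= 256 else 4
    return b"".join(i.to_bytes(width, "little") for i in ids)


def measure(payload, level=6):
    """Return (compressed size, seconds to decompress) for one payload."""
    compressed = zlib.compress(payload, level)
    start = time.perf_counter()
    restored = zlib.decompress(compressed)
    elapsed = time.perf_counter() - start
    assert restored == payload  # round-trip sanity check
    return len(compressed), elapsed


# Hypothetical low-cardinality column, made up for illustration.
column = ["US", "CA", "MX", "US", "US", "BR"] * 50_000
dictionary, ids = dictionary_encode(column)
plain = "\n".join(column).encode("utf-8")
encoded = pack_ids(ids, len(dictionary))

for name, payload in (("plain", plain), ("dictionary-encoded", encoded)):
    size, seconds = measure(payload)
    print(f"{name}: {len(payload)} -> {size} bytes, decompress {seconds:.6f}s")
```

Swapping zlib for a real zstd or brotli binding, and the toy column for actual Parquet pages, would be needed before drawing any conclusions from the numbers.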

On Thu, Sep 28, 2017 at 3:51 PM, Tim Armstrong <[email protected]> wrote:

> Thanks for all the work you've done on benchmarking here, seems like it
> could be a big improvement. I can't seem to find decompression numbers in
> your spreadsheet. I think those should be where some of these newer codecs
> really shine. E.g. zstd's own numbers look really impressive:
> http://facebook.github.io/zstd/
>

-- 
Ryan Blue
Software Engineer
Netflix
