A blog is a great idea. I am curious about how much compression costs.
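For anyone who wants to try the new codecs, they're selected with the `store.parquet.compression` option James mentions below. A minimal sketch, assuming a writable `dfs.tmp` workspace; the table name and source path are placeholders for illustration:

```sql
-- Select the Parquet codec for this session.
-- (ALTER SYSTEM instead would make it the default for all sessions.)
ALTER SESSION SET `store.parquet.compression` = 'zstd';

-- CTAS writes Parquet by default, so the copy below is re-encoded
-- with the codec chosen above. Paths here are hypothetical.
CREATE TABLE dfs.tmp.`events_zstd` AS
SELECT * FROM dfs.`/data/events.parquet`;
```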
On Wed, Sep 29, 2021 at 5:37 AM luoc <l...@apache.org> wrote:
>
> James, you are doing fine.
> Is it possible to post a new blog on the website for this?
>
>
> On 29 Sep 2021, at 20:27, James Turton <dz...@apache.org> wrote:
> >
> > Hi all
> >
> > We've got support for reading and writing with additional Parquet
> > compression codecs in master now. Here are the footprints of a
> > 25M-record dataset compressed by Drill with different codecs.
> >
> > | Codec  | Size on disk (MB) |
> > | ------ | ----------------- |
> > | brotli | 87                |
> > | gzip   | 80                |
> > | lz4    | 100.6             |
> > | lzo    | 100.8             |
> > | snappy | 192               |
> > | zstd   | 85                |
> > | none   | 2152              |
> >
> > I haven't measured the (de)compression speed differences myself, but
> > there are many such benchmarks around on the web, and the differences
> > can be big *if* you've got a workload that is CPU bound by
> > (de)compression. Beyond that there are the usual considerations:
> > better utilisation of the OS page cache with the higher compression
> > ratio codecs, less I/O when data must come from disk, etc. Zstd is
> > probably the one I'll be putting into `store.parquet.compression`
> > myself at this point.
> >
> > Happy Drilling!
> > James