Hi all,

Master now supports reading and writing Parquet with additional
compression codecs. Here are the on-disk footprints of a 25M-record
dataset written by Drill with each codec.

| Codec | Size on disk (MB) |
| ------ | ----------------- |
| brotli | 87 |
| gzip | 80 |
| lz4 | 100.6 |
| lzo | 100.8 |
| snappy | 192 |
| zstd | 85 |
| none | 2152 |

I haven't measured (de)compression speed myself, but there are many such
benchmarks on the web, and the differences can be big *if* your workload
is CPU-bound on (de)compression. Beyond that there are the usual
considerations: codecs with higher compression ratios make better use of
the OS page cache, need less I/O when data must come from disk, etc.
Zstd is probably the one I'll be putting into
`store.parquet.compression` myself at this point.
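
For anyone who wants to try it, here is a minimal sketch of switching
codecs for a CTAS. It assumes a build that includes the new codecs and
that the option accepts the codec names as they appear in the table
above; the source path and target table name are hypothetical
placeholders.

```sql
-- Assumes a Drill build that includes the new codecs.
-- Use ALTER SYSTEM instead of ALTER SESSION to change the default
-- for all sessions rather than just the current one.
ALTER SESSION SET `store.format` = 'parquet';
ALTER SESSION SET `store.parquet.compression` = 'zstd';

-- Hypothetical source path and target table name; substitute your own
-- writable workspace and data location.
CREATE TABLE dfs.tmp.`events_zstd` AS
SELECT * FROM dfs.`/data/events`;
```

Repeating the CTAS with a different codec value and comparing the
output directories is enough to get a size comparison like the one
above for your own data.
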
Happy Drilling!
James