Hi all

We've now got support in master for reading and writing Parquet with additional compression codecs.  Here are the on-disk footprints of a 25M-record dataset written by Drill with each codec.

| Codec  | Size on disk (MB) |
| ------ | ----------------- |
| brotli |   87              |
| gzip   |   80              |
| lz4    |  100.6            |
| lzo    |  100.8            |
| snappy |  192              |
| zstd   |   85              |
| none   | 2152              |
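
To put those footprints in relative terms, here is a small sketch that turns the table into compression ratios against the uncompressed (`none`) size. The numbers are copied straight from the table; the helper name `compression_ratios` is only an illustration and not anything that ships with Drill.

```python
# Sizes on disk as listed in the table above (same unit for every row).
sizes = {
    "brotli": 87,
    "gzip": 80,
    "lz4": 100.6,
    "lzo": 100.8,
    "snappy": 192,
    "zstd": 85,
    "none": 2152,
}


def compression_ratios(sizes, baseline="none"):
    """Return {codec: baseline_size / codec_size}, highest ratio first."""
    base = sizes[baseline]
    ratios = {
        codec: base / size
        for codec, size in sizes.items()
        if codec != baseline
    }
    # Sort so the codec with the smallest footprint comes first.
    return dict(sorted(ratios.items(), key=lambda kv: kv[1], reverse=True))


ratios = compression_ratios(sizes)
for codec, ratio in ratios.items():
    print(f"{codec:<7} {ratio:5.1f}x")
```

On this dataset gzip, zstd and brotli all land at roughly 25x to 27x, lz4 and lzo at about 21x, and snappy at about 11x. These ratios describe this one dataset only and will differ for other data.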

I haven't measured (de)compression speed differences myself, but there are many such benchmarks on the web, and the differences can be big *if* your workload is CPU-bound on (de)compression.  Beyond that there are the usual considerations: the higher-ratio codecs make better use of the OS page cache, need less I/O when data must come from disk, and so on.  Zstd is probably the one I'll be putting into `store.parquet.compression` myself at this point.

Happy Drilling!
James
