A blog is a great idea.

I am curious about how much the compression costs in CPU time.


On Wed, Sep 29, 2021 at 5:37 AM luoc <l...@apache.org> wrote:

>
> James, you are doing fine.
> Is it possible to post a new blog on the website for this?
>
> > On Sep 29, 2021, at 20:27, James Turton <dz...@apache.org> wrote:
> >
> > Hi all
> >
> > We now have support in master for reading and writing Parquet with
> additional compression codecs.  Here are the on-disk footprints of a
> 25M-record dataset compressed by Drill with each codec.
> >
> > | Codec  | Size on disk (MB) |
> > | ------ | ----------------- |
> > | brotli |   87              |
> > | gzip   |   80              |
> > | lz4    |  100.6            |
> > | lzo    |  100.8            |
> > | snappy |  192              |
> > | zstd   |   85              |
> > | none   | 2152              |
> >
> > I haven't measured the (de)compression speed differences myself, but
> there are many such benchmarks around on the web, and the differences
> can be big *if* you've got a workload that is CPU-bound on
> (de)compression.  Beyond that there are the usual considerations like
> better utilisation of the OS page cache by the higher-compression-ratio
> codecs, less I/O when data must come from disk, etc.  Zstd is probably the
> one I'll be putting into `store.parquet.compression` myself at this point.
> >
> > Happy Drilling!
> > James
>
>
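
For anyone wanting to try the codec James mentions, here is a minimal
sketch, assuming Drill's standard `ALTER SESSION` option syntax and the
default writable `dfs.tmp` workspace; the source table name is hypothetical.

```sql
-- Use zstd for Parquet files written in this session
-- (ALTER SYSTEM would change the default cluster-wide).
ALTER SESSION SET `store.parquet.compression` = 'zstd';

-- Rewrite a hypothetical dataset so it is stored with the new codec.
CREATE TABLE dfs.tmp.`lineitem_zstd` AS
SELECT * FROM dfs.tmp.`lineitem`;
```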
