In my view, a dictionary of 1024 bytes is not going to be nearly enough.
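
To make the numbers discussed below concrete, here is a rough sketch of how
such a ratio could be reproduced outside of Ignite with the zstd-jni bindings.
The class and method names are as I remember them from zstd-jni, and the
samples/entries inputs are placeholders for serialized cache entries; this is
not Ignite code:

    import com.github.luben.zstd.Zstd;
    import com.github.luben.zstd.ZstdDictCompress;
    import com.github.luben.zstd.ZstdDictTrainer;
    import java.util.List;

    public final class DictRatioCheck {
        // Train a 1024-byte dictionary from sample entries, then compress
        // each entry with it and report compressed/raw size.
        static double ratioWithDict(List<byte[]> samples, List<byte[]> entries) {
            ZstdDictTrainer trainer = new ZstdDictTrainer(16 * 1024 * 1024, 1024);
            for (byte[] s : samples)
                trainer.addSample(s);

            byte[] dict = trainer.trainSamples();          // the 1024-byte dictionary
            ZstdDictCompress cdict = new ZstdDictCompress(dict, 3);

            long raw = 0, packed = 0;
            for (byte[] e : entries) {
                raw += e.length;
                packed += Zstd.compress(e, cdict).length;  // per-entry compression
            }
            return (double) packed / raw;                  // ~0.4 on Ilya's data set
        }
    }

The 16 MB sample buffer and compression level 3 are arbitrary choices for the
sketch, not anything Ignite uses.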

On Tue, Sep 4, 2018 at 8:06 AM, Ilya Kasnacheev <ilya.kasnach...@gmail.com>
wrote:

> Hello!
>
> In the case of Apache Ignite, most of the savings come from the BinaryObject
> format, which encodes types and fields as byte sequences. Any enum/string
> flags will also end up in the dictionary. Then, as the compressor processes
> records, it fills up its individual dictionary.
>
> But, in one cache, most if not all entries have an identical BinaryObject
> layout, so a tiny dictionary covers that case. Compression algorithms are
> not very keen on large dictionaries, preferring to work with local
> regularities in the byte stream.
>
> E.g. if we have large entries in the cache with low BinaryObject overhead,
> they're served just fine by "generic" compression.
>
> All of the above is my speculation, actually. I just observe that on a
> large data set the compression ratio is around 0.4 (2.5x) with a dictionary
> of 1024 bytes. The rest is a black box.
>
> Regards,
> --
> Ilya Kasnacheev
>
>
> On Tue, Sep 4, 2018 at 5:16 PM, Dmitriy Setrakyan <dsetrak...@apache.org>:
>
> > On Tue, Sep 4, 2018 at 2:55 AM, Ilya Kasnacheev <ilya.kasnach...@gmail.com>
> > wrote:
> >
> > > Hello!
> > >
> > > Each node has a local dictionary (per node currently, per cache
> > > planned). The dictionary is never shared between nodes. As data patterns
> > > shift, dictionary rotation is also planned.
> > >
> > > With Zstd, the best dictionary size seems to be 1024 bytes. I imagine it
> > > is enough to store common BinaryObject boilerplate, and everything else
> > > is compressed on the fly. The source sample is 16k records.
> > >
> > >
> > Thanks, Ilya, understood. I think per-cache is a better idea. However, I
> > have a question about dictionary size. Ignite stores TBs of data. How do
> > you plan to fit the dictionary in 1K bytes?
> >
> > D.
> >
>
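
Regarding the per-cache dictionaries and rotation mentioned above: a purely
hypothetical sketch of how that could look is a small registry that keeps a
version number with every dictionary, so entries written before a rotation
remain decompressible. The version would have to be stored next to each
compressed entry; none of this is actual Ignite code, just an illustration:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    final class DictionaryRegistry {
        static final class VersionedDict {
            final int version;
            final byte[] dict;
            VersionedDict(int version, byte[] dict) { this.version = version; this.dict = dict; }
        }

        // Latest dictionary per cache, used for new writes.
        private final Map<String, VersionedDict> current = new ConcurrentHashMap<>();
        // All versions per cache, kept so older entries can still be read.
        private final Map<String, Map<Integer, byte[]>> history = new ConcurrentHashMap<>();

        // Rotation: install a freshly trained dictionary as the new current version.
        void rotate(String cacheName, byte[] newDict) {
            VersionedDict prev = current.get(cacheName);
            int next = prev == null ? 1 : prev.version + 1;
            history.computeIfAbsent(cacheName, k -> new ConcurrentHashMap<>()).put(next, newDict);
            current.put(cacheName, new VersionedDict(next, newDict));
        }

        VersionedDict forCompress(String cacheName) { return current.get(cacheName); }

        byte[] forDecompress(String cacheName, int version) {
            return history.getOrDefault(cacheName, Map.of()).get(version);
        }
    }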
