Igniters,

This is to inform you that we had another conversation about compression
design, where Ivan's proposal about per-partition compression was
discussed. Let me share our current vision.

1) The per-partition approach will give a higher compression ratio than the
per-page approach. This is beyond doubt.
2) But it is much more difficult to implement than per-page, dictionary
management in particular. Moreover, write operations might suffer.

As a result, we think that the per-page approach is the way to go as a first
iteration. In the future we may consider per-partition compression as a kind
of "one-time" batch operation which could be performed on rarely updated and
historical data, i.e. a tiered approach. Mature commercial vendors work in
exactly the same way. Sergey Puchnin and I are trying to better understand
how exactly compression could be implemented.

What we understand now:
1) This will be dictionary-based compression (e.g. LZV)
2) A page will be compressed in batch mode, i.e. not on every change, but
when a certain threshold is reached (e.g. the page's free space drops below
20%)
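
To make the threshold idea concrete, here is a minimal Java sketch. Everything in it is an assumption made for illustration: the class name BatchCompressedPage, the 4096-byte page size and the split into a plain and a packed region are not Ignite code, and java.util.zip.Deflater merely stands in for whichever dictionary-based codec is eventually chosen. Entries are compressed one by one here only to keep the sketch short. It does not handle page overflow or entries that grow when compressed.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.Deflater;

/** Hypothetical page model for illustration; not an Ignite class. */
public class BatchCompressedPage {
    static final int PAGE_SIZE = 4096;          // assumed page size
    static final double MIN_FREE_RATIO = 0.20;  // the "20%" example threshold

    private final List<byte[]> plain = new ArrayList<>();   // uncompressed region
    private final List<byte[]> packed = new ArrayList<>();  // compressed region
    private int usedBytes;
    private int compressionPasses;

    /** Appends an entry; returns true if this write triggered a compression pass. */
    public boolean add(byte[] entry) {
        plain.add(entry);
        usedBytes += entry.length;
        // Compress in batch: only when free space drops below the threshold.
        if (freeBytes() < PAGE_SIZE * MIN_FREE_RATIO) {
            compressPlainRegion();
            return true;
        }
        return false;
    }

    public int freeBytes() {
        return PAGE_SIZE - usedBytes;
    }

    public int compressionPasses() {
        return compressionPasses;
    }

    /** Moves every plain entry into the packed region, adjusting the used size. */
    private void compressPlainRegion() {
        for (byte[] entry : plain) {
            byte[] compressed = deflate(entry);
            packed.add(compressed);
            usedBytes += compressed.length - entry.length;
        }
        plain.clear();
        compressionPasses++;
    }

    /** Compresses a byte array with the JDK's Deflater (a stand-in codec). */
    static byte[] deflate(byte[] in) {
        Deflater deflater = new Deflater();
        deflater.setInput(in);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }
}
```

With 200-byte entries, the first 16 writes leave at least 20% of the page free and cost nothing extra; only the 17th write pays for a compression pass.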

What we do not understand yet:
1) Granularity of compression algorithm.
1.1) It could be per-entry - i.e. we compress the whole entry content, but
respect boundaries between entries. E.g.: before - [ENTRY_1][ENTRY_2],
after - [COMPRESSED_ENTRY_1][COMPRESSED_ENTRY_2] (as opposed to [COMPRESSED
ENTRY_1 and ENTRY_2]).
1.2) Or it could be per-field - i.e. we compress fields, but respect binary
object layout.

The first approach is simple, straightforward, and will give an acceptable
compression ratio, but we will have to decompress the whole binary object on
every field access, which may ruin our SQL performance. The second approach
is more complex and we are not sure about its compression ratio, but as the
BinaryObject structure is preserved, we will still have fast constant-time
per-field access.
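
The difference between the two granularities can be sketched in Java as follows. This is only an illustration under assumptions: GranularitySketch and all of its methods are hypothetical names, the entry is modelled as a plain concatenation of field bytes with an offset table rather than the real BinaryObject layout, and java.util.zip.Deflater/Inflater stand in for the actual codec.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical comparison of per-entry and per-field compression. */
public class GranularitySketch {
    /** Offsets of each field in the uncompressed entry (length = fields + 1). */
    public static int[] offsets(byte[][] fields) {
        int[] offs = new int[fields.length + 1];
        for (int i = 0; i < fields.length; i++)
            offs[i + 1] = offs[i] + fields[i].length;
        return offs;
    }

    /** 1.1: concatenate all fields and compress the entry as one blob. */
    public static byte[] packEntry(byte[][] fields) {
        ByteArrayOutputStream entry = new ByteArrayOutputStream();
        for (byte[] f : fields)
            entry.write(f, 0, f.length);
        return deflate(entry.toByteArray());
    }

    /** 1.2: compress every field on its own, keeping the field layout. */
    public static byte[][] packFields(byte[][] fields) {
        byte[][] packed = new byte[fields.length][];
        for (int i = 0; i < fields.length; i++)
            packed[i] = deflate(fields[i]);
        return packed;
    }

    /** 1.1 read path: the whole entry is inflated to reach a single field. */
    public static byte[] readFieldPerEntry(byte[] packedEntry, int[] offs, int idx) {
        byte[] entry = inflate(packedEntry);
        return Arrays.copyOfRange(entry, offs[idx], offs[idx + 1]);
    }

    /** 1.2 read path: only the requested field is inflated. */
    public static byte[] readFieldPerField(byte[][] packedFields, int idx) {
        return inflate(packedFields[idx]);
    }

    static byte[] deflate(byte[] in) {
        Deflater deflater = new Deflater();
        deflater.setInput(in);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!deflater.finished())
            out.write(buf, 0, deflater.deflate(buf));
        deflater.end();
        return out.toByteArray();
    }

    static byte[] inflate(byte[] in) {
        Inflater inflater = new Inflater();
        inflater.setInput(in);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary()))
                    throw new IllegalStateException("truncated or unsupported input");
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException(e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }
}
```

Both read paths return the same bytes. The per-entry path does work proportional to the whole entry on every field read, while the per-field path touches one field only, at the price of compressing many small inputs independently.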

Please share your thoughts.

Vladimir.


On Sat, Aug 12, 2017 at 4:08 AM, Dmitriy Setrakyan <dsetrak...@apache.org>
wrote:

> I still don't understand per-partition compression, which is only local.
> How do entries get updated in the middle of a partition file?
>
> D.
>
> On Fri, Aug 11, 2017 at 3:44 AM, Yakov Zhdanov <yzhda...@apache.org>
> wrote:
>
> > Ivan, your points definitely make sense, however, I see at least one issue
> > compared to per-page approach. With per-page we can split page to
> > compressed and uncompressed regions and change the dictionary if more
> > efficient compression is possible. With bigger areas such as partition or
> > entire cache dynamic dictionary change may be very complex.
> >
> > --Yakov
> >
>
