Some more comments:

1. I see advantages in both approaches; any chance we can support both of them?
2. We have to make sure that it is possible to enable different types of compression at the per-cache level.
3. It would also be nice to show the compression ratio, i.e. how much data space was saved, as part of monitoring (see the sketch below).
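To make items 2 and 3 a bit more concrete, here is a minimal Java sketch. None of these names (CompressionType, cacheCompression, onCompressed, ...) are real Ignite APIs; they are only an assumption of what a per-cache compression setting and a compression-ratio metric could look like.

    // Hypothetical sketch only: none of these types exist in Ignite.
    // It illustrates (a) choosing a compression type per cache and
    // (b) exposing a compression ratio / space-saved figure for monitoring.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.atomic.LongAdder;

    public class CompressionMonitoringSketch {
        /** Hypothetical per-cache compression choice. */
        enum CompressionType { NONE, PER_PAGE_DICTIONARY, PER_PARTITION_BATCH }

        /** Hypothetical registry: cache name -> compression type. */
        static final Map<String, CompressionType> cacheCompression = new ConcurrentHashMap<>();

        /** Simple counters a page/partition compressor could update. */
        static final LongAdder rawBytes = new LongAdder();
        static final LongAdder compressedBytes = new LongAdder();

        /** Called by the (hypothetical) compressor after each compressed batch. */
        static void onCompressed(long rawLen, long compressedLen) {
            rawBytes.add(rawLen);
            compressedBytes.add(compressedLen);
        }

        /** Compression ratio, e.g. 2.0 means data shrank to half its size. */
        static double compressionRatio() {
            long c = compressedBytes.sum();
            return c == 0 ? 1.0 : (double) rawBytes.sum() / c;
        }

        /** Space saved, as a fraction of the original size. */
        static double spaceSaved() {
            long r = rawBytes.sum();
            return r == 0 ? 0.0 : 1.0 - (double) compressedBytes.sum() / r;
        }

        public static void main(String[] args) {
            cacheCompression.put("ordersCache", CompressionType.PER_PAGE_DICTIONARY);
            cacheCompression.put("historicalCache", CompressionType.PER_PARTITION_BATCH);

            onCompressed(4096, 1536);   // pretend one 4 KB page compressed to 1.5 KB
            onCompressed(4096, 2048);

            System.out.printf("ratio=%.2f, saved=%.0f%%%n", compressionRatio(), spaceSaved() * 100);
        }
    }

For the sample numbers above this would print roughly "ratio=2.29, saved=56%"; the point is only that both a per-cache setting and a ratio metric are cheap to expose once the compressor reports raw vs. compressed byte counts.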
D.

On Wed, Aug 23, 2017 at 6:12 AM, Vladimir Ozerov <[email protected]> wrote:

> Igniters,
>
> This is to inform you that we had another conversation about compression
> design, where Ivan's proposal about per-partition compression was
> discussed. Let me share our current vision.
>
> 1) The per-partition approach will have a higher compression rate than the
> per-page approach. This is beyond doubt.
> 2) But it is much more difficult to implement compared to per-page, in
> particular the dictionary management. Moreover, write operations might
> suffer.
>
> As a result, we think that the per-page approach is the way to go as a
> first iteration. In the future we may consider per-partition compression
> as a kind of "one-time" batch operation which could be performed on rarely
> updated and historical data. A tiered approach. Mature commercial vendors
> work in exactly the same way. Sergey Puchnin and I are trying to better
> understand how exactly compression could be implemented.
>
> What we understand now:
> 1) This will be dictionary-based compression (e.g. LZV).
> 2) Pages will be compressed in batch mode, i.e. not on every change, but
> when a certain threshold is reached (e.g. the page's free space drops
> below 20%).
>
> What we do not understand yet:
> 1) Granularity of the compression algorithm.
> 1.1) It could be per-entry, i.e. we compress the whole entry content but
> respect the boundaries between entries. E.g.: before - [ENTRY_1][ENTRY_2],
> after - [COMPRESSED_ENTRY_1][COMPRESSED_ENTRY_2] (as opposed to
> [COMPRESSED ENTRY_1 and ENTRY_2]).
> 1.2) Or it could be per-field, i.e. we compress individual fields but
> respect the binary object layout. The first approach is simple and
> straightforward and will give an acceptable compression rate, but we would
> have to decompress the whole binary object on every field access, which
> may ruin our SQL performance. The second approach is more complex and we
> are not sure about its compression rate, but since the BinaryObject
> structure is preserved, we still get fast, constant-time per-field access.
>
> Please share your thoughts.
>
> Vladimir.
>
>
> On Sat, Aug 12, 2017 at 4:08 AM, Dmitriy Setrakyan <[email protected]>
> wrote:
>
> > I still don't understand per-partition compression, which is only local.
> > How do entries get updated in the middle of a partition file?
> >
> > D.
> >
> > On Fri, Aug 11, 2017 at 3:44 AM, Yakov Zhdanov <[email protected]>
> > wrote:
> >
> > > Ivan, your points definitely make sense; however, I see at least one
> > > issue compared to the per-page approach. With per-page we can split a
> > > page into compressed and uncompressed regions and change the
> > > dictionary if more efficient compression is possible. With bigger
> > > areas such as a partition or an entire cache, a dynamic dictionary
> > > change may be very complex.
> > >
> > > --Yakov
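A small, self-contained illustration of the 1.1 vs 1.2 trade-off Vladimir describes above. This is not Ignite code: java.util.zip.Deflater merely stands in for whatever dictionary codec (the "LZV" mentioned above) is eventually chosen, and the "entry" is just two concatenated UTF-8 fields. The point is only that per-entry compression forces a full decompression on any field read, while per-field compression keeps the layout addressable.

    // Illustrative sketch only, not Ignite code. Deflater is a placeholder
    // for the real dictionary-based codec.

    import java.io.ByteArrayOutputStream;
    import java.nio.charset.StandardCharsets;
    import java.util.zip.DataFormatException;
    import java.util.zip.Deflater;
    import java.util.zip.Inflater;

    public class CompressionGranularitySketch {
        public static void main(String[] args) throws DataFormatException {
            byte[] name = "John Johnson".getBytes(StandardCharsets.UTF_8);
            byte[] city = "Saint Petersburg".getBytes(StandardCharsets.UTF_8);

            // 1.1) Per-entry: the whole entry payload is compressed as one blob.
            //      Reading any single field requires inflating the entire entry.
            byte[] entry = concat(name, city);
            byte[] perEntry = deflate(entry);
            byte[] wholeEntry = inflate(perEntry);   // full decompression on every field access
            String nameViaPerEntry =
                new String(wholeEntry, 0, name.length, StandardCharsets.UTF_8);

            // 1.2) Per-field: each field value is compressed separately, so the
            //      binary-object layout (field offsets) stays intact and a single
            //      field can be decompressed without touching the others.
            byte[] nameCompressed = deflate(name);
            byte[] cityCompressed = deflate(city);
            String cityViaPerField =
                new String(inflate(cityCompressed), StandardCharsets.UTF_8);

            System.out.println(nameViaPerEntry + " / " + cityViaPerField);
        }

        static byte[] concat(byte[] a, byte[] b) {
            byte[] r = new byte[a.length + b.length];
            System.arraycopy(a, 0, r, 0, a.length);
            System.arraycopy(b, 0, r, a.length, b.length);
            return r;
        }

        static byte[] deflate(byte[] in) {
            Deflater d = new Deflater();
            d.setInput(in);
            d.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[256];
            while (!d.finished())
                out.write(buf, 0, d.deflate(buf));
            d.end();
            return out.toByteArray();
        }

        static byte[] inflate(byte[] in) throws DataFormatException {
            Inflater i = new Inflater();
            i.setInput(in);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[256];
            while (!i.finished())
                out.write(buf, 0, i.inflate(buf));
            i.end();
            return out.toByteArray();
        }
    }

(On payloads this small Deflater will not actually shrink anything; the sketch is about access patterns, not compression ratios.)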
