Igniters,

We had another conversation about compression design, where Ivan's proposal for per-partition compression was discussed. Let me share our current vision.
1) The per-partition approach will give a higher compression rate than the per-page approach. This is beyond doubt.

2) But it is much more difficult to implement compared to per-page - in particular, dictionary management. Moreover, write operations might suffer.

As a result, we think the per-page approach is the way to go for the first iteration. In the future we may consider per-partition compression as a kind of "one-time" batch operation performed on rarely updated and historical data - a tiered approach. Mature commercial vendors work in exactly the same way.

Sergey Puchnin and I are trying to better understand how exactly compression could be implemented.

What we understand now:
1) This will be dictionary-based compression (e.g. LZW).
2) A page will be compressed in batch mode, i.e. not on every change, but when a certain threshold is reached (e.g. the page's free space drops below 20%).

What we do not understand yet:
1) The granularity of the compression algorithm.
1.1) It could be per-entry, i.e. we compress the whole entry content but respect the boundaries between entries. E.g.: before - [ENTRY_1][ENTRY_2], after - [COMPRESSED_ENTRY_1][COMPRESSED_ENTRY_2] (as opposed to [COMPRESSED ENTRY_1 and ENTRY_2]).
1.2) Or it could be per-field, i.e. we compress individual fields but respect the binary object layout.

The first approach is simple and straightforward and will give an acceptable compression rate, but we would have to decompress the whole binary object on every field access, which may ruin our SQL performance. The second approach is more complex, and we are not sure about its compression rate, but since the BinaryObject structure is preserved, we would still have fast constant-time per-field access.

Please share your thoughts.

Vladimir.

On Sat, Aug 12, 2017 at 4:08 AM, Dmitriy Setrakyan <dsetrak...@apache.org> wrote:

> I still don't understand per-partition compression, which is only local.
> How do entries get updated in the middle of a partition file?
>
> D.
> > On Fri, Aug 11, 2017 at 3:44 AM, Yakov Zhdanov <yzhda...@apache.org> wrote:
> >
> > Ivan, your points definitely make sense; however, I see at least one
> > issue compared to the per-page approach. With per-page we can split a
> > page into compressed and uncompressed regions and change the dictionary
> > if more efficient compression is possible. With bigger areas such as a
> > partition or an entire cache, dynamic dictionary change may be very
> > complex.
> >
> > --Yakov
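[Editor's note: the per-entry option (1.1) and the batch-mode trigger discussed above can be sketched as follows. This is a minimal illustration, not Ignite code: all class and method names are hypothetical, and JDK DEFLATE (LZ77-based) stands in for whatever dictionary-based codec is eventually chosen.]

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/**
 * Sketch of per-entry page compression: each entry is compressed
 * independently, so entry boundaries survive and a single entry can be
 * decompressed without touching its neighbours.
 */
public class PerEntryCompression {
    /** Batch-mode trigger: compress only once the page's free space is low. */
    static boolean shouldCompress(int pageSize, int freeBytes) {
        return freeBytes < pageSize * 0.20; // e.g. free space drops below 20%
    }

    /** [ENTRY_1] -> [COMPRESSED_ENTRY_1]; neighbouring entries are untouched. */
    static byte[] compressEntry(byte[] entry) {
        Deflater def = new Deflater(Deflater.BEST_SPEED);
        def.setInput(entry);
        def.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[512];
        while (!def.finished())
            out.write(buf, 0, def.deflate(buf));
        def.end();
        return out.toByteArray();
    }

    /** Inflate a single entry, e.g. on field access in the SQL engine. */
    static byte[] decompressEntry(byte[] compressed) throws DataFormatException {
        Inflater inf = new Inflater();
        inf.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[512];
        while (!inf.finished())
            out.write(buf, 0, inf.inflate(buf));
        inf.end();
        return out.toByteArray();
    }
}
```

Note that this sketch makes the per-entry trade-off visible: field access always pays a full-entry decompression, which is exactly the SQL-performance concern raised above.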
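[Editor's note: the split-page idea from the quoted message - new writes land in an uncompressed region, and the dictionary can be changed when the page is rewritten - can be sketched like this. Again a hypothetical illustration with DEFLATE standing in for the real codec; none of these names are Ignite APIs.]

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/**
 * Sketch of a page split into a compressed "cold" region and an
 * uncompressed "hot" tail. New writes go to the tail; once the tail
 * outgrows a threshold the whole page is re-compressed. That rewrite is
 * local to one page, which is why swapping in a more efficient
 * dictionary is cheap here, unlike for a whole partition or cache.
 */
public class SplitPage {
    static final int PAGE_SIZE = 4096;
    static final double TAIL_MAX = 0.20; // re-compress when tail > 20% of page

    private byte[] cold = new byte[0]; // compressed region (empty at first)
    private final ByteArrayOutputStream tail = new ByteArrayOutputStream(); // raw region

    /** Writes stay cheap: they only append to the uncompressed tail. */
    void write(byte[] entry) throws DataFormatException {
        tail.write(entry, 0, entry.length);
        if (tail.size() > PAGE_SIZE * TAIL_MAX)
            recompress();
    }

    int tailSize() {
        return tail.size();
    }

    /** Full page contents: inflated cold region followed by the raw tail. */
    byte[] readAll() throws DataFormatException {
        ByteArrayOutputStream all = new ByteArrayOutputStream();
        byte[] old = cold.length == 0 ? new byte[0] : inflate(cold);
        all.write(old, 0, old.length);
        byte[] hot = tail.toByteArray();
        all.write(hot, 0, hot.length);
        return all.toByteArray();
    }

    /** A new, better dictionary could be chosen at this point. */
    private void recompress() throws DataFormatException {
        cold = deflate(readAll());
        tail.reset();
    }

    private static byte[] deflate(byte[] data) {
        Deflater def = new Deflater();
        def.setInput(data);
        def.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[512];
        while (!def.finished())
            out.write(buf, 0, def.deflate(buf));
        def.end();
        return out.toByteArray();
    }

    private static byte[] inflate(byte[] data) throws DataFormatException {
        Inflater inf = new Inflater();
        inf.setInput(data);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[512];
        while (!inf.finished())
            out.write(buf, 0, inf.inflate(buf));
        inf.end();
        return out.toByteArray();
    }
}
```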