Since PDS strongly depends on the memory page size, I'd like to compress the serialized data inside a page, excluding the page header.
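Roughly what I mean, as a minimal sketch. The fixed PAGE_HEADER_SIZE and the JDK Deflater below are placeholders for the real page header layout and for whatever dictionary-based codec we end up choosing; the class and method names are illustrative only:

import java.util.Arrays;
import java.util.zip.Deflater;

/** Sketch: deflate a page's payload while copying the page header as-is. */
public class PagePayloadCompressor {
    /** Placeholder header size; the real page header layout differs. */
    private static final int PAGE_HEADER_SIZE = 40;

    /**
     * Returns the page with its payload compressed, or the original page
     * if compression does not actually save space.
     */
    public static byte[] compressPayload(byte[] page) {
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        deflater.setInput(page, PAGE_HEADER_SIZE, page.length - PAGE_HEADER_SIZE);
        deflater.finish();

        byte[] buf = new byte[page.length];
        // The header stays uncompressed so page metadata remains readable
        // without inflating the payload first.
        System.arraycopy(page, 0, buf, 0, PAGE_HEADER_SIZE);

        int n = deflater.deflate(buf, PAGE_HEADER_SIZE, buf.length - PAGE_HEADER_SIZE);
        boolean saved = deflater.finished() && n < page.length - PAGE_HEADER_SIZE;
        deflater.end();

        return saved ? Arrays.copyOf(buf, PAGE_HEADER_SIZE + n) : page;
    }
}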
On Mon, Mar 26, 2018 at 7:49 PM, Vladimir Ozerov <voze...@gridgain.com> wrote:
> Alex,
>
> In fact there are many approaches to this. Some vendors decided to stick
> to the page: the page is filled with data and then compressed when a
> certain threshold is reached (e.g. the page is full, or is filled up to
> X%). Another approach is to store data in memory in *larger blocks* than
> on disk, and when it comes time to flush, try to compress the block. If
> the final size is lower than the disk block size, compression is
> considered successful and the data is saved in compressed form.
> Otherwise the data is saved as is.
>
> Both approaches may work, but IMO compression within a single block is
> better and simpler to implement.
>
> On Mon, Mar 26, 2018 at 6:53 PM, Alexey Goncharuk <
> alexey.goncha...@gmail.com> wrote:
>
>> Guys,
>>
>> How does this fit the PageMemory concept? Currently it assumes that the
>> size of a page in memory and the size of a page on disk are the same,
>> so only per-entry compression within a page makes sense.
>>
>> If you compress a whole page, how do you calculate the page offset in
>> the target data file?
>>
>> --AG
>>
>> 2018-03-26 17:39 GMT+03:00 Vladimir Ozerov <voze...@gridgain.com>:
>>
>> > Gents,
>> >
>> > If I understood the idea correctly, the proposal is to compress pages
>> > on eviction and decompress them on read from disk. Is that correct?
>> >
>> > On Mon, Mar 26, 2018 at 5:13 PM, Anton Vinogradov <a...@apache.org> wrote:
>> >
>> > > +1 to Taras's vision.
>> > >
>> > > Compression on eviction is a good way to store more data.
>> > > Pages in memory are always hot in a real system, so in-memory
>> > > compression will definitely slow the system down, I think.
>> > >
>> > > Anyway, we can split the issue into "on-eviction compression" and
>> > > "in-memory compression".
>> > >
>> > > 2018-03-06 12:14 GMT+03:00 Taras Ledkov <tled...@gridgain.com>:
>> > >
>> > > > Hi,
>> > > >
>> > > > I guess page-level compression makes sense on page load /
>> > > > eviction. In this case we can decrease I/O operations and reach a
>> > > > performance boost.
>> > > > What is the goal of in-memory compression? Holding about 2-5x more
>> > > > data in memory at the cost of a performance drop?
>> > > >
>> > > > Also, please clarify how compression/decompression works for hot
>> > > > and cold pages.
>> > > > Is this right for your approach:
>> > > > 1. Hot pages are always kept decompressed in memory because many
>> > > > read/write operations touch them.
>> > > > 2. So we can compress only cold pages.
>> > > >
>> > > > If so, the approach is suitable when the hot data size << the
>> > > > available RAM size.
>> > > >
>> > > > Thoughts?
>> > > >
>> > > > On 05.03.2018 20:18, Vyacheslav Daradur wrote:
>> > > >
>> > > >> Hi Igniters!
>> > > >>
>> > > >> I'd like to take the next step in our data compression
>> > > >> discussion [1].
>> > > >>
>> > > >> Most Igniters vote for per-data-page compression.
>> > > >>
>> > > >> I'd like to accumulate the main theses to start implementation:
>> > > >> - a page will be compressed with a dictionary-based approach
>> > > >> (e.g. LZV)
>> > > >> - a page will be compressed in batch mode (not on every change)
>> > > >> - page compression should be initiated by an event, for example,
>> > > >> a page's free space dropping below 20%
>> > > >> - the compression process will run under the page write lock
>> > > >>
>> > > >> Vladimir Ozerov wrote:
>> > > >>
>> > > >>> What we do not understand yet:
>> > > >>> 1) Granularity of the compression algorithm.
>> > > >>> 1.1) It could be per-entry, i.e. we compress the whole entry
>> > > >>> content but respect the boundaries between entries. E.g.:
>> > > >>> before - [ENTRY_1][ENTRY_2], after -
>> > > >>> [COMPRESSED_ENTRY_1][COMPRESSED_ENTRY_2] (as opposed to
>> > > >>> [COMPRESSED ENTRY_1 and ENTRY_2]).
>> > > >>> 1.2) Or it could be per-field, i.e. we compress individual
>> > > >>> fields but respect the binary object layout. The first approach
>> > > >>> is simple, straightforward, and will give an acceptable
>> > > >>> compression rate, but we would have to decompress the whole
>> > > >>> binary object on every field access, which may ruin our SQL
>> > > >>> performance. The second approach is more complex and we are not
>> > > >>> sure about its compression rate, but as the BinaryObject
>> > > >>> structure is preserved, we will still have fast constant-time
>> > > >>> per-field access.
>> > > >>
>> > > >> I think there are advantages to both approaches, and we will be
>> > > >> able to compare different approaches and algorithms after a
>> > > >> prototype implementation.
>> > > >>
>> > > >> Main approach in brief:
>> > > >> 1) When a page's free space drops below 20%, a compression event
>> > > >> is triggered
>> > > >> 2) The page is locked with the write lock
>> > > >> 3) The page is passed to the page compressor implementation
>> > > >> 4) The page is replaced by the compressed page
>> > > >>
>> > > >> Reading a whole object or a field:
>> > > >> 1) If the page is marked as compressed, it is handled by the
>> > > >> page compressor implementation; otherwise it is handled as
>> > > >> usual.
>> > > >>
>> > > >> Thoughts?
>> > > >>
>> > > >> Should we create a new IEP and register tickets to start the
>> > > >> implementation? This would allow us to track the feature's
>> > > >> progress and related tasks.
>> > > >>
>> > > >> [1] http://apache-ignite-developers.2346864.n4.nabble.com/Data-compression-in-Ignite-tc20679.html
>> > > >
>> > > > --
>> > > > Taras Ledkov
>> > > > Mail-To: tled...@gridgain.com
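To make the quoted flow concrete, here is a minimal sketch of the compress-on-threshold steps and the read path. All names here (Page, PageCompressor, onThreshold) are illustrative, not actual Ignite internals, and a plain JDK ReentrantReadWriteLock stands in for the real page lock:

import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Sketch of the proposed compress-on-threshold flow; names are illustrative. */
public class CompressOnThresholdSketch {
    /** Pluggable codec, e.g. a dictionary-based one. */
    interface PageCompressor {
        byte[] compress(byte[] page);
        byte[] decompress(byte[] page);
    }

    static class Page {
        final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        byte[] data;
        boolean compressed; // flag checked on every read
        int freeSpace;      // unused bytes remaining in the page

        Page(byte[] data, int freeSpace) {
            this.data = data;
            this.freeSpace = freeSpace;
        }
    }

    /** Step 1: the event fires when free space drops below 20% of the page size. */
    static boolean thresholdReached(Page p, int pageSize) {
        return p.freeSpace < pageSize * 0.2;
    }

    /** Steps 2-4: lock, compress, replace. */
    static void onThreshold(Page p, PageCompressor c) {
        p.lock.writeLock().lock(); // step 2: page write lock
        try {
            byte[] packed = c.compress(p.data);  // step 3: run the compressor
            if (packed.length < p.data.length) { // step 4: replace only if smaller
                p.data = packed;
                p.compressed = true;
            }
        }
        finally {
            p.lock.writeLock().unlock();
        }
    }

    /** Read path: decompress transparently if the page is marked compressed. */
    static byte[] read(Page p, PageCompressor c) {
        p.lock.readLock().lock();
        try {
            return p.compressed ? c.decompress(p.data) : p.data;
        }
        finally {
            p.lock.readLock().unlock();
        }
    }
}

Replacing the page only when the compressed form is actually smaller mirrors the "otherwise the data is saved as is" fallback Vladimir described above.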
--
Best Regards,
Vyacheslav D.