Since PDS strongly depends on the memory page size, I'd like to compress the serialized data inside a page, excluding the page header.
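Roughly what I mean, as a minimal sketch. The fixed PAGE_HEADER_SIZE and the JDK Deflater below are placeholders for the real page header layout and for whatever dictionary-based codec we end up choosing; the class and method names are illustrative only:

import java.util.Arrays;
import java.util.zip.Deflater;

/** Sketch: deflate a page's payload while copying the page header as-is. */
public class PagePayloadCompressor {
    /** Placeholder header size; the real page header layout differs. */
    private static final int PAGE_HEADER_SIZE = 40;

    /**
     * Returns the page with its payload compressed, or the original page
     * if compression does not actually save space.
     */
    public static byte[] compressPayload(byte[] page) {
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        deflater.setInput(page, PAGE_HEADER_SIZE, page.length - PAGE_HEADER_SIZE);
        deflater.finish();

        byte[] buf = new byte[page.length];
        // The header stays uncompressed so page metadata remains readable
        // without inflating the payload first.
        System.arraycopy(page, 0, buf, 0, PAGE_HEADER_SIZE);

        int n = deflater.deflate(buf, PAGE_HEADER_SIZE, buf.length - PAGE_HEADER_SIZE);
        boolean saved = deflater.finished() && n < page.length - PAGE_HEADER_SIZE;
        deflater.end();

        return saved ? Arrays.copyOf(buf, PAGE_HEADER_SIZE + n) : page;
    }
}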
On Mon, Mar 26, 2018 at 7:49 PM, Vladimir Ozerov <voze...@gridgain.com> wrote:
> Alex,
>
> In fact there are many approaches to this. Some vendors decided to stick
> to the page: the page is filled with data and then compressed when a
> certain threshold is reached (e.g. the page is full, or is filled up to
> X%). Another approach is to store data in memory in *larger blocks* than
> on disk, and when it comes time to flush, try to compress the block. If
> the final size is lower than the disk block size, compression is
> considered successful and the data is saved in compressed form.
> Otherwise the data is saved as is.
>
> Both approaches may work, but IMO compression within a single block is
> better and simpler to implement.
>
> On Mon, Mar 26, 2018 at 6:53 PM, Alexey Goncharuk <
> alexey.goncha...@gmail.com> wrote:
>
>> Guys,
>>
>> How does this fit the PageMemory concept? Currently it assumes that the
>> size of a page in memory and the size of a page on disk are the same,
>> so only per-entry compression within a page makes sense.
>>
>> If you compress a whole page, how do you calculate the page offset in
>> the target data file?
>>
>> --AG
>>
>> 2018-03-26 17:39 GMT+03:00 Vladimir Ozerov <voze...@gridgain.com>:
>>
>> > Gents,
>> >
>> > If I understood the idea correctly, the proposal is to compress pages
>> > on eviction and decompress them on read from disk. Is that correct?
>> >
>> > On Mon, Mar 26, 2018 at 5:13 PM, Anton Vinogradov <a...@apache.org> wrote:
>> >
>> > > +1 to Taras's vision.
>> > >
>> > > Compression on eviction is a good way to store more data.
>> > > Pages in memory are always hot in a real system, so in-memory
>> > > compression will definitely slow the system down, I think.
>> > >
>> > > Anyway, we can split the issue into "on-eviction compression" and
>> > > "in-memory compression".
>> > >
>> > > 2018-03-06 12:14 GMT+03:00 Taras Ledkov <tled...@gridgain.com>:
>> > >
>> > > > Hi,
>> > > >
>> > > > I guess page-level compression makes sense on page load /
>> > > > eviction. In this case we can decrease I/O operations and reach a
>> > > > performance boost.
>> > > > What is the goal of in-memory compression? Holding about 2-5x more
>> > > > data in memory at the cost of a performance drop?
>> > > >
>> > > > Also, please clarify how compression/decompression works for hot
>> > > > and cold pages.
>> > > > Is this right for your approach:
>> > > > 1. Hot pages are always kept decompressed in memory because many
>> > > > read/write operations touch them.
>> > > > 2. So we can compress only cold pages.
>> > > >
>> > > > If so, the approach is suitable when the hot data size << the
>> > > > available RAM size.
>> > > >
>> > > > Thoughts?
>> > > >
>> > > > On 05.03.2018 20:18, Vyacheslav Daradur wrote:
>> > > >
>> > > >> Hi Igniters!
>> > > >>
>> > > >> I'd like to take the next step in our data compression
>> > > >> discussion [1].
>> > > >>
>> > > >> Most Igniters vote for per-data-page compression.
>> > > >>
>> > > >> I'd like to accumulate the main theses to start implementation:
>> > > >> - a page will be compressed with a dictionary-based approach
>> > > >> (e.g. LZV)
>> > > >> - a page will be compressed in batch mode (not on every change)
>> > > >> - page compression should be initiated by an event, for example,
>> > > >> a page's free space dropping below 20%
>> > > >> - the compression process will run under the page write lock
>> > > >>
>> > > >> Vladimir Ozerov wrote:
>> > > >>
>> > > >>> What we do not understand yet:
>> > > >>> 1) Granularity of the compression algorithm.
>> > > >>> 1.1) It could be per-entry, i.e. we compress the whole entry
>> > > >>> content but respect the boundaries between entries. E.g.:
>> > > >>> before - [ENTRY_1][ENTRY_2], after -
>> > > >>> [COMPRESSED_ENTRY_1][COMPRESSED_ENTRY_2] (as opposed to
>> > > >>> [COMPRESSED ENTRY_1 and ENTRY_2]).
>> > > >>> 1.2) Or it could be per-field, i.e. we compress individual
>> > > >>> fields but respect the binary object layout. The first approach
>> > > >>> is simple, straightforward, and will give an acceptable
>> > > >>> compression rate, but we would have to decompress the whole
>> > > >>> binary object on every field access, which may ruin our SQL
>> > > >>> performance. The second approach is more complex and we are not
>> > > >>> sure about its compression rate, but as the BinaryObject
>> > > >>> structure is preserved, we will still have fast constant-time
>> > > >>> per-field access.
>> > > >>
>> > > >> I think there are advantages to both approaches, and we will be
>> > > >> able to compare different approaches and algorithms after a
>> > > >> prototype implementation.
>> > > >>
>> > > >> Main approach in brief:
>> > > >> 1) When a page's free space drops below 20%, a compression event
>> > > >> is triggered
>> > > >> 2) The page is locked with the write lock
>> > > >> 3) The page is passed to the page compressor implementation
>> > > >> 4) The page is replaced by the compressed page
>> > > >>
>> > > >> Reading a whole object or a field:
>> > > >> 1) If the page is marked as compressed, it is handled by the
>> > > >> page compressor implementation; otherwise it is handled as
>> > > >> usual.
>> > > >>
>> > > >> Thoughts?
>> > > >>
>> > > >> Should we create a new IEP and register tickets to start the
>> > > >> implementation? This would allow us to track the feature's
>> > > >> progress and related tasks.
>> > > >>
>> > > >> [1] http://apache-ignite-developers.2346864.n4.nabble.com/Data-compression-in-Ignite-tc20679.html
>> > > >
>> > > > --
>> > > > Taras Ledkov
>> > > > Mail-To: tled...@gridgain.com
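To make the quoted flow concrete, here is a minimal sketch of the compress-on-threshold steps and the read path. All names here (Page, PageCompressor, onThreshold) are illustrative, not actual Ignite internals, and a plain JDK ReentrantReadWriteLock stands in for the real page lock:

import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Sketch of the proposed compress-on-threshold flow; names are illustrative. */
public class CompressOnThresholdSketch {
    /** Pluggable codec, e.g. a dictionary-based one. */
    interface PageCompressor {
        byte[] compress(byte[] page);
        byte[] decompress(byte[] page);
    }

    static class Page {
        final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        byte[] data;
        boolean compressed; // flag checked on every read
        int freeSpace;      // unused bytes remaining in the page

        Page(byte[] data, int freeSpace) {
            this.data = data;
            this.freeSpace = freeSpace;
        }
    }

    /** Step 1: the event fires when free space drops below 20% of the page size. */
    static boolean thresholdReached(Page p, int pageSize) {
        return p.freeSpace < pageSize * 0.2;
    }

    /** Steps 2-4: lock, compress, replace. */
    static void onThreshold(Page p, PageCompressor c) {
        p.lock.writeLock().lock(); // step 2: page write lock
        try {
            byte[] packed = c.compress(p.data);  // step 3: run the compressor
            if (packed.length < p.data.length) { // step 4: replace only if smaller
                p.data = packed;
                p.compressed = true;
            }
        }
        finally {
            p.lock.writeLock().unlock();
        }
    }

    /** Read path: decompress transparently if the page is marked compressed. */
    static byte[] read(Page p, PageCompressor c) {
        p.lock.readLock().lock();
        try {
            return p.compressed ? c.decompress(p.data) : p.data;
        }
        finally {
            p.lock.readLock().unlock();
        }
    }
}

Replacing the page only when the compressed form is actually smaller mirrors the "otherwise the data is saved as is" fallback Vladimir described above.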
--
Best Regards,
Vyacheslav D.