Hi Anton,

Do you have suggestions for this approach?
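To make the question more concrete, here is how I currently picture the
compress-on-threshold flow from the "main approach in brief" below (free
space under 20%, write lock, compress, replace). This is only a minimal
sketch: every name in it is hypothetical, it is not the real PageMemory
API, and java.util.zip.Deflater merely stands in for whatever
dictionary-based codec we end up choosing:

import java.util.Arrays;
import java.util.zip.Deflater;

/** Illustrative only: compress a page once its free space drops below a threshold. */
public class PageCompressionSketch {
    /** Trigger from the thread: compress when free space is below 20% of the page. */
    private static final double FREE_SPACE_THRESHOLD = 0.20;

    /** Hypothetical page holder; the real page lives in off-heap PageMemory. */
    static class Page {
        final byte[] data;   // raw page bytes
        final int freeBytes; // free space currently left in the page
        volatile boolean compressed;

        Page(byte[] data, int freeBytes) {
            this.data = data;
            this.freeBytes = freeBytes;
        }
    }

    /** Steps 1-4 from the proposal: check the threshold, lock, compress, replace. */
    static byte[] maybeCompress(Page page) {
        if ((double) page.freeBytes / page.data.length >= FREE_SPACE_THRESHOLD)
            return page.data; // Enough free space left; keep the page as-is.

        synchronized (page) { // Stand-in for the page write lock.
            Deflater deflater = new Deflater(Deflater.BEST_SPEED);
            deflater.setInput(page.data);
            deflater.finish();

            byte[] buf = new byte[page.data.length];
            int len = deflater.deflate(buf);
            boolean done = deflater.finished();
            deflater.end();

            if (!done || len >= page.data.length)
                return page.data; // Page did not shrink; keep the original.

            page.compressed = true; // Mark it so the read path knows to inflate.
            return Arrays.copyOf(buf, len); // Replace the page with the compressed form.
        }
    }
}

Alexey's offset question below still applies: once pages shrink,
pageIdx * pageSize no longer addresses them on disk, and the sketch
deliberately ignores that part.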
Sincerely,
Dmitriy Pavlov

Mon, 26 Mar 2018 at 19:46, Anton Vinogradov <a...@apache.org>:

> Can we use another approach to store compressed pages?
>
> 2018-03-26 19:06 GMT+03:00 Dmitry Pavlov <dpavlov....@gmail.com>:
>
> > +1 to Alexey's concern. No reason to compress if we use the previous
> > offset as pageIdx*pageSize.
> >
> > Mon, 26 Mar 2018 at 18:59, Alexey Goncharuk <alexey.goncha...@gmail.com>:
> >
> > > Guys,
> > >
> > > How does this fit the PageMemory concept? Currently it assumes that
> > > the size of the page in memory and the size of the page on disk are
> > > the same, so only per-entry-level compression within a page makes
> > > sense.
> > >
> > > If you compress a whole page, how do you calculate the page offset
> > > in the target data file?
> > >
> > > --AG
> > >
> > > 2018-03-26 17:39 GMT+03:00 Vladimir Ozerov <voze...@gridgain.com>:
> > >
> > > > Gents,
> > > >
> > > > If I understood the idea correctly, the proposal is to compress
> > > > pages on eviction and decompress them on read from disk. Is that
> > > > correct?
> > > >
> > > > On Mon, Mar 26, 2018 at 5:13 PM, Anton Vinogradov <a...@apache.org> wrote:
> > > >
> > > > > +1 to Taras's vision.
> > > > >
> > > > > Compression on eviction is a good way to store more.
> > > > > Pages in memory are always hot in a real system, so compression
> > > > > in memory will definitely slow down the system, I think.
> > > > >
> > > > > Anyway, we can split the issue into "on-eviction compression"
> > > > > and "in-memory compression".
> > > > >
> > > > > 2018-03-06 12:14 GMT+03:00 Taras Ledkov <tled...@gridgain.com>:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I guess page-level compression makes sense on page loading /
> > > > > > eviction. In this case we can decrease I/O operations and
> > > > > > reach a performance boost.
> > > > > > What is the goal of in-memory compression? Holding about 2-5x
> > > > > > more data in memory with a performance drop?
> > > > > >
> > > > > > Also, please clarify the case with compression/decompression
> > > > > > for hot and cold pages.
> > > > > > Is this right for your approach:
> > > > > > 1. Hot pages are always kept decompressed in memory because
> > > > > > many read/write operations touch them.
> > > > > > 2. So we can compress only cold pages.
> > > > > >
> > > > > > So this approach is suitable when the hot data size << the
> > > > > > available RAM size.
> > > > > >
> > > > > > Thoughts?
> > > > > >
> > > > > > On 05.03.2018 20:18, Vyacheslav Daradur wrote:
> > > > > >
> > > > > >> Hi Igniters!
> > > > > >>
> > > > > >> I'd like to take the next step in our data compression
> > > > > >> discussion [1].
> > > > > >>
> > > > > >> Most Igniters vote for per-data-page compression.
> > > > > >>
> > > > > >> I'd like to accumulate the main theses to start implementation:
> > > > > >> - a page will be compressed with a dictionary-based approach
> > > > > >> (e.g. LZV)
> > > > > >> - a page will be compressed in batch mode (not on every change)
> > > > > >> - page compression should be initiated by an event, for
> > > > > >> example, a page's free space dropping below 20%
> > > > > >> - the compression process will run under the page write lock
> > > > > >>
> > > > > >> Vladimir Ozerov has written:
> > > > > >>
> > > > > >>> What we do not understand yet:
> > > > > >>> 1) Granularity of the compression algorithm.
> > > > > >>> 1.1) It could be per-entry - i.e. we compress the whole
> > > > > >>> entry content, but respect boundaries between entries.
> > > > > >>> E.g.: before - [ENTRY_1][ENTRY_2], after -
> > > > > >>> [COMPRESSED_ENTRY_1][COMPRESSED_ENTRY_2] (as opposed to
> > > > > >>> [COMPRESSED ENTRY_1 and ENTRY_2]).
> > > > > >>> 1.2) Or it could be per-field - i.e. we compress fields, but
> > > > > >>> respect the binary object layout. The first approach is
> > > > > >>> simple, straightforward, and will give an acceptable
> > > > > >>> compression rate, but we will have to decompress the whole
> > > > > >>> binary object on every field access, which may ruin our SQL
> > > > > >>> performance. The second approach is more complex, and we are
> > > > > >>> not sure about its compression rate, but as the BinaryObject
> > > > > >>> structure is preserved, we will still have fast
> > > > > >>> constant-time per-field access.
> > > > > >>
> > > > > >> I think there are advantages in both approaches, and we will
> > > > > >> be able to compare different approaches and algorithms after
> > > > > >> a prototype implementation.
> > > > > >>
> > > > > >> The main approach in brief:
> > > > > >> 1) When a page's free space drops below 20%, a compression
> > > > > >> event will be triggered
> > > > > >> 2) The page will be locked with a write lock
> > > > > >> 3) The page will be passed to the page compressor
> > > > > >> implementation
> > > > > >> 4) The page will be replaced by the compressed page
> > > > > >>
> > > > > >> Reading a whole object or a field:
> > > > > >> 1) If the page is marked as compressed, it will be handled by
> > > > > >> the page compressor implementation; otherwise, it will be
> > > > > >> handled as usual.
> > > > > >>
> > > > > >> Thoughts?
> > > > > >>
> > > > > >> Should we create a new IEP and register tickets to start the
> > > > > >> implementation? This would allow us to track the feature's
> > > > > >> progress and related tasks.
> > > > > >>
> > > > > >> [1] http://apache-ignite-developers.2346864.n4.nabble.com/Data-compression-in-Ignite-tc20679.html
> > > > > >
> > > > > > --
> > > > > > Taras Ledkov
> > > > > > Mail-To: tled...@gridgain.com
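P.S. For the read path ("reading a whole object or a field" above) I
picture something along these lines; again purely illustrative, with
java.util.zip.Inflater standing in for the real codec and all names
hypothetical:

import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

/** Illustrative only: read path that inflates a page only if it is marked compressed. */
public class PageReadSketch {
    /**
     * @param stored     page bytes as currently stored (possibly compressed)
     * @param compressed flag set by the compression step
     * @param pageSize   original, uncompressed page size in bytes
     */
    static byte[] readPage(byte[] stored, boolean compressed, int pageSize)
        throws DataFormatException {
        if (!compressed)
            return stored; // Usual path: the page was never compressed.

        // Compressed path: restore the full page before any per-field access.
        Inflater inflater = new Inflater();
        inflater.setInput(stored);

        byte[] page = new byte[pageSize];
        int len = inflater.inflate(page);
        inflater.end();

        if (len != pageSize)
            throw new DataFormatException("Unexpected inflated size: " + len);

        return page;
    }
}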