Nikita,

That was my intention: "we may need to provide a better facility to inject user's logic here..."
Andrey,

About compression, once again: DB2 is a row-based DB and they can compress :)

On Wed, Jul 27, 2016 at 12:56 PM, Nikita Ivanov <nivano...@gmail.com> wrote:

> Very good points indeed. I get the compression in Ignite question quite
> often, and the HANA reference is a typical lead-in.
>
> My personal opinion is still that in Ignite *specifically* compression is
> best left to the end user. But we may need to provide a better facility to
> inject user's logic here...
>
> --
> Nikita Ivanov
>
> On Tue, Jul 26, 2016 at 9:53 PM, Andrey Kornev <andrewkor...@hotmail.com> wrote:
>
> > Dictionary compression requires some knowledge about the data being
> > compressed. For example, for numeric types the range of values must be
> > known so that the dictionary can be generated. For strings, the number
> > of unique values in the column is the key piece of input into dictionary
> > generation.
> >
> > SAP HANA is a column-based database system: it stores the fields of the
> > data tuple individually, using the best compression for the given data
> > type and the particular set of values. HANA has been specifically built
> > as a general-purpose database, rather than as an afterthought layer on
> > top of an already existing distributed cache.
> >
> > On the other hand, Ignite is a distributed cache implementation (a
> > pretty good one!) that in general requires no schema and stores its data
> > in a row-based fashion. Its current design doesn't lend itself readily
> > to the kind of optimizations HANA provides out of the box.
> >
> > For the curious types among us, the implementation details of HANA are
> > well documented in "In-Memory Data Management" by Hasso Plattner and
> > Alexander Zeier.
> > Cheers
> > Andrey
> >
> > _____________________________
> > From: Alexey Kuznetsov <akuznet...@gridgain.com>
> > Sent: Tuesday, July 26, 2016 5:36 AM
> > Subject: Re: Data compression in Ignite 2.0
> > To: <dev@ignite.apache.org>
> >
> > Sergey Kozlov wrote:
> > >> For approach 1: Putting a large object into a partitioned cache will
> > >> force an update of the dictionary held in a replicated cache. It may
> > >> be a time-expensive operation.
> >
> > The dictionary will be built only once. And we could control what should
> > be put into the dictionary; for example, we could check the min and max
> > size and decide whether or not to put a value into the dictionary.
> >
> > >> Approaches 2 and 3 make sense for rare cases, as Sergi commented.
> >
> > But it is better to at least have the possibility to plug in user code
> > for compression than not to have it at all.
> >
> > >> Also I see a danger of OOM if we've got a high compression level and
> > >> try to restore the original value in memory.
> >
> > We could easily get OOM with many other operations right now, without
> > compression, so I don't think it is an issue; we could add a NOTE about
> > such a possibility to the documentation.
> >
> > Andrey Kornev wrote:
> > >> ... in general I think compression is a great idea. The cleanest way
> > >> to achieve that would be to just make it possible to chain the
> > >> marshallers...
> >
> > I think it is a good idea as well. And it looks like it could be used
> > for compression with some sort of ZIP algorithm, but how do we deal with
> > compression by dictionary substitution? We need to build the dictionary
> > first. Any ideas?
> >
> > Nikita Ivanov wrote:
> > >> SAP HANA does the compression by 1) compressing SQL parameters before
> > >> execution...
> >
> > Looks interesting, but my initial point was about compression of cache
> > data, not SQL queries. My idea was to make compression transparent to
> > the SQL engine when it looks up data.
> >
> > But the idea of compressing SQL query results looks very interesting,
> > because it is a known fact that the SQL engine can consume quite a lot
> > of heap for storing result sets. I think this should be discussed in a
> > separate thread.
> >
> > Just for your information: in my first message I mentioned that DB2 has
> > compression by dictionary, and according to them it is possible to
> > compress typical data by 50-80%. I have some experience with DB2 and can
> > confirm this.
> >
> > --
> > Alexey Kuznetsov

--
Alexey Kuznetsov
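The dictionary substitution discussed in this thread can be sketched in a few lines of Java. Everything below (`DictEncoder`, `encode`, `decode`) is a hypothetical illustration, not an Ignite or HANA API: each distinct string in a low-cardinality column is assigned a small integer code once, and the cache would then store the codes instead of the full strings.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Minimal dictionary (domain) encoding for a low-cardinality string column. Hypothetical sketch. */
class DictEncoder {
    private final Map<String, Integer> codes = new HashMap<>();
    private final List<String> values = new ArrayList<>();

    /** Returns the integer code for a value, assigning a new code on first sight. */
    int encode(String value) {
        Integer code = codes.get(value);
        if (code == null) {
            code = values.size();
            codes.put(value, code);
            values.add(value);
        }
        return code;
    }

    /** Maps a code back to the original value. */
    String decode(int code) {
        return values.get(code);
    }

    /** Number of distinct values seen so far. */
    int dictionarySize() {
        return values.size();
    }
}
```

This also makes Andrey's point concrete: the technique only pays off when the number of unique values is known to be small, which is exactly the kind of schema knowledge a schemaless row store does not have up front.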
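The "chain the marshallers" idea could look roughly like the sketch below. `SimpleMarshaller`, `StringMarshaller`, and `CompressingMarshaller` are simplified hypothetical stand-ins, not Ignite's actual `org.apache.ignite.marshaller.Marshaller` API: the wrapper deflates whatever the inner marshaller produces, which covers ZIP-style compression but not dictionary substitution (the dictionary would still have to be built and shared separately, as the thread notes).

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Simplified stand-in for a marshaller contract: object <-> byte[]. */
interface SimpleMarshaller {
    byte[] marshal(Object obj);
    Object unmarshal(byte[] bytes);
}

/** Trivial inner marshaller for the example: strings via UTF-8. */
class StringMarshaller implements SimpleMarshaller {
    public byte[] marshal(Object obj) { return obj.toString().getBytes(StandardCharsets.UTF_8); }
    public Object unmarshal(byte[] bytes) { return new String(bytes, StandardCharsets.UTF_8); }
}

/** Chained marshaller: delegates first, then deflates the delegate's output. */
class CompressingMarshaller implements SimpleMarshaller {
    private final SimpleMarshaller delegate;

    CompressingMarshaller(SimpleMarshaller delegate) { this.delegate = delegate; }

    public byte[] marshal(Object obj) {
        byte[] raw = delegate.marshal(obj);
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[512];
        while (!deflater.finished())
            out.write(buf, 0, deflater.deflate(buf));
        deflater.end();
        return out.toByteArray();
    }

    public Object unmarshal(byte[] bytes) {
        try {
            Inflater inflater = new Inflater();
            inflater.setInput(bytes);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[512];
            while (!inflater.finished())
                out.write(buf, 0, inflater.inflate(buf));
            inflater.end();
            return delegate.unmarshal(out.toByteArray());
        } catch (DataFormatException e) {
            throw new IllegalStateException("Corrupted compressed value", e);
        }
    }
}
```

A chained design like this is transparent to the caller, which is what would keep compression invisible to the SQL engine; the open question from the thread, how to hook user-provided logic (or a shared dictionary) into the chain, is exactly the "better facility" Nikita mentions.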