Hello! That would be nice! My preferred compression method is zstd (it also has dictionary generation built in).

Regards,
-- 
Ilya Kasnacheev


Mon, May 25, 2020 at 13:25, Hostettler, Steve <steve.hostett...@wolterskluwer.com>:

> I like the idea, especially because it would also apply across the board.
> So you propose to build the binary object and apply dictionary-based
> compression on top.
>
> I could quickly generate a bunch of binary objects from the tests and
> apply Java compress/deflate with a dictionary based on the BinaryUtils
> elements, to compare with the null compaction and the varint.
>
>
> -----Original Message-----
> From: Ilya Kasnacheev <ilya.kasnach...@gmail.com>
> Sent: Monday, May 25, 2020 12:05 PM
> To: dev <dev@ignite.apache.org>
> Subject: Re: IGNITE-6499 Compact NULL fields
>
> Caution, this email may be from a sender outside Wolters Kluwer. Verify
> the sender and know the content is safe.
>
> Hello!
>
> My take is the following: if conserving memory is needed at all, then we
> had better invest in compression (such as dictionary-based row compression)
> rather than implementing varint, compact nulls, etc.
>
> Dictionary-based compression can easily tackle varints and null patterns
> while also compressing strings and repeated values, and even things we
> would never think of on our own.
>
> It also keeps the complexity of our own code low, has no compatibility
> issues (people do indeed store binary objects in 3rd-party storage) and
> has a low incidence of bugs.
>
> Regards,
> --
> Ilya Kasnacheev
>
>
> Mon, May 25, 2020 at 12:51, Hostettler, Steve <steve.hostett...@wolterskluwer.com>:
>
> > I went for a simpler approach (only with a null mask), and yes, the gain
> > is high for smaller objects but low otherwise. I gain between 5% and 20%
> > on my objects. But to me it is the stepping stone to easily implementing
> > other optimisations like varint and schemaless without using raw. I am
> > trying to fix the latest unit tests to give you a better idea. If it is
> > not worth it then let's not do it, but it is worth a try I think.
> >
> >
> > -----Original Message-----
> > From: Ilya Kasnacheev <ilya.kasnach...@gmail.com>
> > Sent: Monday, May 25, 2020 11:48 AM
> > To: dev <dev@ignite.apache.org>
> > Subject: Re: IGNITE-6499 Compact NULL fields
> >
> > Hello!
> >
> > I can't help but wonder how large a benefit it will give. I have
> > checked the ticket description; it looks like the proposed scheme is
> > elaborate and the benefit for non-extreme binary objects rather tiny.
> >
> > WDYT?
> >
> > Regards,
> > --
> > Ilya Kasnacheev
> >
> >
> > Mon, May 18, 2020 at 22:54, steve.hostett...@gmail.com <steve.hostett...@gmail.com>:
> >
> > > Hello igniters,
> > >
> > > while I would like to help on Calcite, because the H2 optimiser (or
> > > the lack thereof) is really killing us, I think that it would be
> > > wiser to start by contributing to something easier.
> > >
> > > Therefore I will tackle another problem that we have, which is
> > > memory consumption.
> > > I stumbled upon this IEP
> > >
> > > https://cwiki.apache.org/confluence/display/IGNITE/IEP-2%3A+Binary+object+format+improvements
> > >
> > > that is about optimising the binary marshaller.
> > >
> > > The low-hanging fruit seemed to be the null compaction, so I decided
> > > to start with it. Though I am sure I do see some hidden complexity.
> > >
> > > Here are a couple of questions:
> > > - Can I assign myself IGNITE-6499 and attach a patch?
> > > - Who can I contact to help with the review?
> > >   On the following page
> > >   https://cwiki.apache.org/confluence/display/IGNITE/How+to+Contribute
> > >   there is no one assigned for marshalling.
> > >
> > > On the details:
> > > The compression is disabled by default as it is not compatible with
> > > objects previously marshalled.
> > >
> > > My approach was to go a bit beyond the JIRA. Not only do I remove the
> > > indexes of null fields in the footer, I also remove the 0x65 null
> > > markers in the objects. I did not remove them from the collections
> > > and arrays because they are using absolute positioning.
> > >
> > > I gain between 5% and 20% depending on my test cases. Obviously, the
> > > smaller the object and the higher the number of nulls, the higher
> > > the compression rate.
> > >
> > > Based on that I can quite easily add varint compression, which is
> > > IGNITE-6418 and should significantly increase the compression rate
> > > for objects with a lot of integers and longs that only hold small
> > > numbers.
> > >
> > > The next step is to add a JMH micro-benchmark to check the impact in
> > > terms of performance.
> > >
> > >
> > > Example on a simple object w/ null compaction
> > >
> > > Length=55 FooterPosition=50
> > > 0x67 // ValueType
> > > 0x01 // FormatVersion
> > > 0x2b 0x00 // Flags userType=true hasSchema=true offset=1 compactFooter=true
> > > 0x78 0x66 0xbe 0x44 // TypeId
> > > 0xf9 0xcd 0x07 0x57 // Hashcode
> > > 0x37 0x00 0x00 0x00 // Length
> > > 0x3d 0xa8 0x15 0xe4 // SchemaId
> > > 0x32 0x00 0x00 0x00 // Footer position = 50
> > > 0x03 0x01 0x00 0x00 0x00 0x03 0x01 0x00 0x00 0x00 0x09 0x03 0x00 0x00 0x00
> > > 0x61 0x62 0x63 0x09 0x03 0x00 0x00 0x00 0x61 0x62 0x63
> > > Footer length=5
> > > 0x18 0x1d 0x22 0x2a 0x47
> > >
> > > and w/o null compaction
> > >
> > > Length=60 FooterPosition=53
> > > 0x67 // ValueType
> > > 0x01 // FormatVersion
> > > 0x2b 0x00 // Flags userType=true hasSchema=true offset=1 compactFooter=true
> > > 0x78 0x66 0xbe 0x44 // TypeId
> > > 0xa4 0x43 0x0e 0xf5 // Hashcode
> > > 0x3c 0x00 0x00 0x00 // Length
> > > 0x3d 0xa8 0x15 0xe4 // SchemaId
> > > 0x35 0x00 0x00 0x00 // Footer position = 53
> > > 0x03 0x01 0x00 0x00 0x00 0x03 0x01 0x00 0x00 0x00 0x09 0x03 0x00 0x00 0x00
> > > 0x61 0x62 0x63 0x65 0x65 0x65 0x09 0x03 0x00 0x00 0x00 0x61 0x62 0x63
> > > Footer length=7
> > > 0x18 0x1d 0x22 0x2a 0x2b 0x2c 0x2d
> > >
> > >
> > > --
> > > Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
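
The two dumps at the end of the thread can be checked mechanically. Below is a minimal Java sketch, not Ignite code, that reads the fixed 24-byte header exactly as the dump annotates it (value type, format version, flags, type id, hash code, length, schema id, footer position, all little-endian) and derives the footer length. The class name BinaryHeaderSketch, the Header holder and the hex() helper are hypothetical names introduced only for this illustration; the byte values are copied from the dumps.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class BinaryHeaderSketch {
    // Byte dumps copied from the thread: with and without null compaction.
    static final String COMPACTED =
        "67 01 2b 00 78 66 be 44 f9 cd 07 57 37 00 00 00 3d a8 15 e4 32 00 00 00 "
      + "03 01 00 00 00 03 01 00 00 00 09 03 00 00 00 61 62 63 "
      + "09 03 00 00 00 61 62 63 "
      + "18 1d 22 2a 47";
    static final String PLAIN =
        "67 01 2b 00 78 66 be 44 a4 43 0e f5 3c 00 00 00 3d a8 15 e4 35 00 00 00 "
      + "03 01 00 00 00 03 01 00 00 00 09 03 00 00 00 61 62 63 65 65 65 "
      + "09 03 00 00 00 61 62 63 "
      + "18 1d 22 2a 2b 2c 2d";

    /** Hypothetical holder for the header fields annotated in the dump. */
    static final class Header {
        int valueType, formatVersion, flags, typeId, hash, length, schemaId, footerPosition;

        /** Footer bytes are whatever lies between the footer position and the end. */
        int footerLength() {
            return length - footerPosition;
        }
    }

    /** Turns a space-separated hex dump into bytes. */
    static byte[] hex(String dump) {
        String[] tokens = dump.trim().split("\\s+");
        byte[] out = new byte[tokens.length];
        for (int i = 0; i < tokens.length; i++)
            out[i] = (byte) Integer.parseInt(tokens[i], 16);
        return out;
    }

    /** Reads the header fields in the order and widths shown in the dump. */
    static Header parse(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        Header h = new Header();
        h.valueType = buf.get() & 0xFF;       // 1 byte
        h.formatVersion = buf.get() & 0xFF;   // 1 byte
        h.flags = buf.getShort() & 0xFFFF;    // 2 bytes
        h.typeId = buf.getInt();
        h.hash = buf.getInt();
        h.length = buf.getInt();
        h.schemaId = buf.getInt();
        h.footerPosition = buf.getInt();
        return h;
    }

    public static void main(String[] args) {
        for (String dump : new String[] {COMPACTED, PLAIN}) {
            byte[] bytes = hex(dump);
            Header h = parse(bytes);
            System.out.printf("bytes=%d length=%d footerPosition=%d footerLength=%d flags=0x%02x%n",
                bytes.length, h.length, h.footerPosition, h.footerLength(), h.flags);
        }
    }
}
```

Both dumps are self-consistent under this reading: the declared length matches the number of bytes listed, and the difference between the two objects is the three 0x65 bytes in the body plus two footer entries.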
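
The comparison proposed near the top of the thread (JDK deflate with a preset dictionary, measured against null compaction and varint) could start from something like the sketch below. It is only an illustration under stated assumptions: it uses java.util.zip.Deflater and Inflater with setDictionary, not zstd, and the class name DictionaryDeflateSketch, the sample record and the sample dictionary are hypothetical placeholders. A real test would feed marshalled binary objects and a dictionary built from representative ones.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class DictionaryDeflateSketch {
    /** Deflates input; a non-null dictionary is installed as a preset dictionary. */
    static byte[] compress(byte[] input, byte[] dictionary) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        if (dictionary != null)
            deflater.setDictionary(dictionary); // must happen before the first deflate() call
        deflater.setInput(input);
        deflater.finish();
        byte[] buf = new byte[Math.max(64, input.length * 2)];
        int n = 0;
        while (!deflater.finished()) {
            if (n == buf.length)
                buf = Arrays.copyOf(buf, buf.length * 2); // grow when the output buffer is full
            n += deflater.deflate(buf, n, buf.length - n);
        }
        deflater.end();
        return Arrays.copyOf(buf, n);
    }

    /** Inflates data produced by compress(); the same dictionary must be supplied. */
    static byte[] decompress(byte[] compressed, byte[] dictionary, int originalLength) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        byte[] out = new byte[originalLength];
        int n = 0;
        try {
            while (n < out.length && !inflater.finished()) {
                int read = inflater.inflate(out, n, out.length - n);
                if (read == 0) {
                    if (inflater.needsDictionary() && dictionary != null)
                        inflater.setDictionary(dictionary); // stream was written with a preset dictionary
                    else
                        break; // truncated input or missing dictionary
                }
                n += read;
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt deflate stream", e);
        } finally {
            inflater.end();
        }
        return Arrays.copyOf(out, n);
    }

    public static void main(String[] args) {
        // Hypothetical stand-ins for marshalled objects; real input would be binary object bytes.
        byte[] dictionary = "name=abc;city=abc;count=1;status=ACTIVE;".getBytes(StandardCharsets.UTF_8);
        byte[] record = "name=xyz;city=abc;count=7;status=ACTIVE;".getBytes(StandardCharsets.UTF_8);

        byte[] plain = compress(record, null);
        byte[] withDict = compress(record, dictionary);
        System.out.println("raw=" + record.length
            + " deflate=" + plain.length
            + " deflate+dictionary=" + withDict.length);
        System.out.println("round trip ok: "
            + Arrays.equals(record, decompress(withDict, dictionary, record.length)));
    }
}
```

Note that the reader needs the exact dictionary used by the writer, so the dictionary becomes part of the storage format in the same way a schema does. For objects as small as the 55 to 60 byte examples in this thread, the fixed zlib framing overhead may outweigh the gain, which is one of the things such a comparison would have to measure.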