Hello!

That would be nice! My preferred compression method is zstd (it also has
dictionary generation built in).
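
To make the idea concrete, here is a minimal sketch of dictionary-based
compression using only the JDK's java.util.zip.Deflater and Inflater with a
preset dictionary. It is a stand-in for zstd, which would need a third-party
binding, and it does not generate the dictionary: the dictionary is supplied
by hand. The class name, method names, and the sample row are hypothetical
and are not Ignite APIs.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Sketch only: names are hypothetical, not Ignite APIs. */
class DictionaryDeflateSketch {
    /** Compresses input with DEFLATE; dict may be null for no preset dictionary. */
    static byte[] compress(byte[] input, byte[] dict) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        if (dict != null)
            deflater.setDictionary(dict); // must happen before the first deflate() call
        deflater.setInput(input);
        deflater.finish();
        byte[] buf = new byte[input.length + 64];
        int len = 0;
        while (!deflater.finished()) {
            if (len == buf.length)
                buf = Arrays.copyOf(buf, buf.length * 2); // grow if output is larger than expected
            len += deflater.deflate(buf, len, buf.length - len);
        }
        deflater.end();
        return Arrays.copyOf(buf, len);
    }

    /** Decompresses data produced by compress(); the same dictionary must be supplied. */
    static byte[] decompress(byte[] compressed, byte[] dict, int originalLen) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        byte[] out = new byte[originalLen];
        int len = 0;
        try {
            while (len < originalLen && !inflater.finished()) {
                int n = inflater.inflate(out, len, originalLen - len);
                if (n == 0) {
                    if (inflater.needsDictionary() && dict != null)
                        inflater.setDictionary(dict); // stream asks for its preset dictionary
                    else
                        break; // truncated input or missing dictionary
                }
                len += n;
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("Corrupt compressed data", e);
        } finally {
            inflater.end();
        }
        return Arrays.copyOf(out, len);
    }

    public static void main(String[] args) {
        // Hypothetical dictionary and row; real ones would come from actual binary objects.
        byte[] dict = "{\"firstName\":null,\"lastName\":null,\"accountId\":"
            .getBytes(StandardCharsets.UTF_8);
        byte[] row = "{\"firstName\":null,\"lastName\":null,\"accountId\":42}"
            .getBytes(StandardCharsets.UTF_8);
        System.out.println("plain deflate:   " + compress(row, null).length + " bytes");
        System.out.println("with dictionary: " + compress(row, dict).length + " bytes");
    }
}
```

The dictionary pays off most on small payloads, where a plain compressor has
too little history to find repeats on its own.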

Regards,
-- 
Ilya Kasnacheev


Mon, 25 May 2020 at 13:25, Hostettler, Steve <
steve.hostett...@wolterskluwer.com>:

> I like the idea, especially because it also would apply across the board.
> So you propose to build the binary object and to apply dictionary based
> compression on top.
>
> I could quickly generate a bunch of binary objects from the tests and
> apply Java's built-in Deflate compression with a dictionary based on the
> BinaryUtils elements, then compare the result with null compaction and
> varint.
>
>
> -----Original Message-----
> From: Ilya Kasnacheev <ilya.kasnach...@gmail.com>
> Sent: Monday, May 25, 2020 12:05 PM
> To: dev <dev@ignite.apache.org>
> Subject: Re: IGNITE-6499 Compact NULL fields
>
> Caution, this email may be from a sender outside Wolters Kluwer. Verify
> the sender and know the content is safe.
>
> Hello!
>
> My take is the following: if conserving memory is needed at all, then we
> had better invest in compression (such as dictionary-based row compression)
> rather than implementing varint, compact nulls, etc.
>
> Dictionary-based compression can easily handle varints and null patterns
> while also compressing strings, repeated values and even things we would
> never think of on our own.
>
> It also keeps our own code simple, causes no compatibility issues
> (people do store binary objects in 3rd-party storage) and has a low
> incidence of bugs.
>
> Regards,
> --
> Ilya Kasnacheev
>
>
> Mon, 25 May 2020 at 12:51, Hostettler, Steve <
> steve.hostett...@wolterskluwer.com>:
>
> > I went for a simpler approach (only a null mask), and yes, the gain
> > is high for smaller objects but low otherwise. I gain between 5% and 20%
> > on my objects. But to me it is the stepping stone to easily implementing
> > other optimisations like varint and schemaless without using raw. I am
> > trying to fix the latest unit tests to give you a better idea. If it is
> > not worth it then let's not do it, but I think it is worth a try.
> >
> >
> > -----Original Message-----
> > From: Ilya Kasnacheev <ilya.kasnach...@gmail.com>
> > Sent: Monday, May 25, 2020 11:48 AM
> > To: dev <dev@ignite.apache.org>
> > Subject: Re: IGNITE-6499 Compact NULL fields
> >
> > Hello!
> >
> > I can't help but wonder how large a benefit it will give. I have
> > checked the ticket description; it looks like the proposed scheme is
> > elaborate and the benefit for non-extreme binary objects is rather tiny.
> >
> > WDYT?
> >
> > Regards,
> > --
> > Ilya Kasnacheev
> >
> >
> > Mon, 18 May 2020 at 22:54, steve.hostett...@gmail.com <
> > steve.hostett...@gmail.com>:
> >
> > > Hello igniters,
> > >
> > > While I would like to help with the Calcite work, because the H2
> > > optimiser (or the lack thereof) is really killing us, I think it
> > > would be wiser to start by contributing something easier.
> > >
> > > Therefore I will tackle another problem that we have which is the
> > > memory consumption. I stumbled upon this IEP
> > >
> > > https://cwiki.apache.org/confluence/display/IGNITE/IEP-2%3A+Binary+object+format+improvements
> > >
> > > that is about optimising the binary marshaller.
> > >
> > > The low hanging fruit seemed to be the null compaction so I decided
> > > to start with it. Though I am sure I do see some hidden complexity.
> > >
> > > Here are a couple of questions:
> > > - Can I assign myself IGNITE-6499 and attach a patch?
> > > - Who can I contact to help with the review? On the following page
> > > https://cwiki.apache.org/confluence/display/IGNITE/How+to+Contribute
> > > there is no one assigned for marshalling.
> > >
> > > On the details:
> > > The compression is disabled by default as it is not compatible with
> > > objects previously marshalled.
> > >
> > > My approach was to go a bit beyond the JIRA. Not only do I remove the
> > > indexes to null fields in the footer, I also remove the 0x65 bytes in
> > > the objects. I did not remove them from the collections and arrays
> > > because they are using absolute positioning.
> > >
> > > I gain between 5% and 20% depending on my test cases. Obviously the
> > > smaller the object and the higher the number of nulls, the higher
> > > the compression rate.
> > >
> > > Based on that I can quite easily add varint compression, which is
> > > IGNITE-6418 and should significantly increase the compression rate
> > > for objects with many ints and longs that hold only small numbers.
> > >
> > > The next step is to add a JMH micro-benchmark to check the impact
> > > on performance.
> > >
> > >
> > > Example on a simple object w/ null compaction
> > >
> > > Length=55 FooterPosition=50
> > > 0x67 // ValueType
> > > 0x01 // FormatVersion
> > > 0x2b 0x00 // Flags userType=true hasSchema=true offset=1 compactFooter=true
> > > 0x78 0x66 0xbe 0x44 // TypeId
> > > 0xf9 0xcd 0x07 0x57 // Hashcode
> > > 0x37 0x00 0x00 0x00 // Length
> > > 0x3d 0xa8 0x15 0xe4 // SchemaId
> > > 0x32 0x00 0x00 0x00 // Footer position = 50
> > > 0x03 0x01 0x00 0x00 0x00 0x03 0x01 0x00 0x00 0x00
> > > 0x09 0x03 0x00 0x00 0x00 0x61 0x62 0x63
> > > 0x09 0x03 0x00 0x00 0x00 0x61 0x62 0x63
> > > Footer length=5
> > > 0x18 0x1d 0x22 0x2a 0x47
> > >
> > > and w/o null compaction
> > > Length=60 FooterPosition=53
> > > 0x67 // ValueType
> > > 0x01 // FormatVersion
> > > 0x2b 0x00 // Flags userType=true hasSchema=true offset=1 compactFooter=true
> > > 0x78 0x66 0xbe 0x44 // TypeId
> > > 0xa4 0x43 0x0e 0xf5 // Hashcode
> > > 0x3c 0x00 0x00 0x00 // Length
> > > 0x3d 0xa8 0x15 0xe4 // SchemaId
> > > 0x35 0x00 0x00 0x00 // Footer position = 53
> > > 0x03 0x01 0x00 0x00 0x00 0x03 0x01 0x00 0x00 0x00
> > > 0x09 0x03 0x00 0x00 0x00 0x61 0x62 0x63
> > > 0x65 0x65 0x65
> > > 0x09 0x03 0x00 0x00 0x00 0x61 0x62 0x63
> > > Footer length=7
> > > 0x18 0x1d 0x22 0x2a 0x2b 0x2c 0x2d
> > >
> > >
> > >
> > >
> > > --
> > > Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
> > >
> >
>
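
The two dumps in the original message can be checked with a little
arithmetic. The sketch below recomputes the footer bytes and total lengths
from the field sizes visible in the dumps: two 5-byte ints, an 8-byte
string, three nulls, and another 8-byte string, after a 24-byte header. It
assumes that the trailing 0x47 in the compacted footer is a presence bitmask
(bits 0, 1, 2 and 6 set, one per non-null field). That reading is inferred
from the dump and is not stated in the thread. All names are hypothetical.

```java
import java.util.Arrays;

/** Sketch only: recomputes the footers shown in the dumps; not Ignite code. */
class NullCompactionSketch {
    static final int HEADER_LEN = 24;     // first field offset in both dumps is 0x18
    static final int NULL_MARKER_LEN = 1; // a null field is the single byte 0x65

    /**
     * fieldLens[i] is the encoded size of field i, or -1 for a null field.
     * Returns the footer entries (one-byte offsets, as in the dumps).
     * Assumes at most 8 fields so the assumed presence mask fits in one byte.
     */
    static int[] footer(int[] fieldLens, boolean compactNulls) {
        int[] footer = new int[fieldLens.length + 1];
        int n = 0;
        int pos = HEADER_LEN;
        int mask = 0;
        for (int i = 0; i < fieldLens.length; i++) {
            boolean isNull = fieldLens[i] < 0;
            if (isNull && compactNulls)
                continue; // neither a 0x65 byte nor a footer entry is written
            footer[n++] = pos;
            mask |= 1 << i; // only emitted in compact mode, so it marks non-null fields
            pos += isNull ? NULL_MARKER_LEN : fieldLens[i];
        }
        if (compactNulls)
            footer[n++] = mask; // assumed meaning of the trailing 0x47
        return Arrays.copyOf(footer, n);
    }

    /** Header + field bytes + footer bytes. */
    static int totalLength(int[] fieldLens, boolean compactNulls) {
        int body = 0;
        for (int len : fieldLens)
            body += len >= 0 ? len : (compactNulls ? 0 : NULL_MARKER_LEN);
        return HEADER_LEN + body + footer(fieldLens, compactNulls).length;
    }

    public static void main(String[] args) {
        int[] fields = {5, 5, 8, -1, -1, -1, 8};
        System.out.println(Arrays.toString(footer(fields, true)) + " " + totalLength(fields, true));
        System.out.println(Arrays.toString(footer(fields, false)) + " " + totalLength(fields, false));
    }
}
```

For this object the saving is 5 bytes out of 60: three 0x65 markers and
three footer entries removed, one mask byte added.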

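
The varint idea mentioned in the thread can be sketched in the same spirit.
In the dumps each int field is stored as a 0x03 byte followed by four value
bytes (0x01 0x00 0x00 0x00 for the value 1). A variable-length encoding
would store small values in fewer bytes. The code below uses a common
scheme, 7 data bits per byte with the high bit as a continuation flag; the
thread does not say which encoding IGNITE-6418 would use, so treat the
scheme and all names as assumptions.

```java
import java.io.ByteArrayOutputStream;

/** Sketch only: a generic unsigned 7-bit varint, not the IGNITE-6418 format. */
class VarIntSketch {
    /** Encodes value as 1 to 5 bytes, least significant 7-bit group first. */
    static byte[] encode(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(5);
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80); // more groups follow
            value >>>= 7;                     // unsigned shift, so negatives terminate
        }
        out.write(value); // last group, continuation bit clear
        return out.toByteArray();
    }

    /** Decodes a value produced by encode(). */
    static int decode(byte[] bytes) {
        int result = 0;
        int shift = 0;
        for (byte b : bytes) {
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0)
                return result;
            shift += 7;
        }
        throw new IllegalArgumentException("Truncated varint");
    }

    public static void main(String[] args) {
        for (int v : new int[] {1, 300, 1_000_000, -1})
            System.out.println(v + " takes " + encode(v).length + " byte(s)");
    }
}
```

Note that with this unsigned scheme a negative int always takes five bytes,
one more than the fixed-width form, so the gain depends on the values being
small and non-negative.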