Re: [IMPORTANT] Future of Binary Objects

Andrey Mashenkov Wed, 21 Nov 2018 08:49:50 -0800

Hi,

Vladimir,  Ilya,


What about variable-length fields? Where do you suggest storing their
offsets: in a footer or in a header?

For large objects, a header allows us to retrieve a field faster and to
detect nulls immediately, but we have to reserve space for the offsets of
all var-len fields and update the header after serialization.
A footer, however, looks more compact (we can omit nulls) and allows us to
use a streaming approach during serialization.
Have I missed something?
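
To make the trade-off concrete, here is a minimal Java sketch of both layouts. It is only an illustration under assumptions of mine: string-only var-len fields, 4-byte offsets, and a footer entry of (2-byte field index, 4-byte offset). The class and method names (OffsetLayouts, writeWithHeader, writeWithFooter, etc.) are hypothetical and not part of Ignite.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch, not Ignite code: two ways to store var-len field offsets.
class OffsetLayouts {
    /** Header layout: [4-byte offset per field][values]. Offset -1 marks null. */
    static byte[] writeWithHeader(String[] fields) {
        ByteBuffer buf = ByteBuffer.allocate(1024); // assumed large enough for the sketch
        buf.position(fields.length * Integer.BYTES); // reserve a slot for every field, nulls included
        for (int i = 0; i < fields.length; i++) {
            int slot = i * Integer.BYTES;
            if (fields[i] == null) {
                buf.putInt(slot, -1);
                continue;
            }
            buf.putInt(slot, buf.position()); // back-patch the header slot
            byte[] bytes = fields[i].getBytes(StandardCharsets.UTF_8);
            buf.putInt(bytes.length).put(bytes);
        }
        return Arrays.copyOf(buf.array(), buf.position());
    }

    /** Footer layout: [values][(2-byte index, 4-byte offset) per non-null field][4-byte entry count]. */
    static byte[] writeWithFooter(String[] fields) {
        ByteBuffer buf = ByteBuffer.allocate(1024);
        int[] offsets = new int[fields.length];
        int nonNull = 0;
        for (int i = 0; i < fields.length; i++) {
            offsets[i] = -1;
            if (fields[i] == null)
                continue;
            offsets[i] = buf.position(); // values are streamed out first
            byte[] bytes = fields[i].getBytes(StandardCharsets.UTF_8);
            buf.putInt(bytes.length).put(bytes);
            nonNull++;
        }
        for (int i = 0; i < fields.length; i++)
            if (offsets[i] >= 0)
                buf.putShort((short) i).putInt(offsets[i]); // nulls get no entry
        buf.putInt(nonNull);
        return Arrays.copyOf(buf.array(), buf.position());
    }

    static String readWithHeader(byte[] obj, int idx) {
        ByteBuffer buf = ByteBuffer.wrap(obj);
        int off = buf.getInt(idx * Integer.BYTES); // one direct lookup
        return off < 0 ? null : readString(buf, off);
    }

    static String readWithFooter(byte[] obj, int idx) {
        ByteBuffer buf = ByteBuffer.wrap(obj);
        int cnt = buf.getInt(obj.length - Integer.BYTES);
        int entrySize = Short.BYTES + Integer.BYTES;
        int footerStart = obj.length - Integer.BYTES - cnt * entrySize;
        for (int e = 0; e < cnt; e++) { // linear scan; a missing entry means null
            int pos = footerStart + e * entrySize;
            if (buf.getShort(pos) == idx)
                return readString(buf, buf.getInt(pos + Short.BYTES));
        }
        return null;
    }

    private static String readString(ByteBuffer buf, int off) {
        int len = buf.getInt(off);
        return new String(buf.array(), off + Integer.BYTES, len, StandardCharsets.UTF_8);
    }
}
```

In this sketch the footer form gets smaller as the share of nulls grows, while a lookup scans the footer instead of doing one indexed read.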


On Wed, Nov 21, 2018 at 7:18 PM Ilya Kasnacheev <[email protected]>
wrote:

> Hello!
>
> I would like to propose the following changes:
>
> - Let's allow multiple BinaryTypes per Class. Make typeId = cksum(list of
> class types + fields) as opposed to cksum(class name) as we have it
> currently. Note that we only have to compute that once per class loaded in
> the JVM.
> - BinaryType has a list of fixed-length fields (numbers, datetimes, flags)
> and a list of variable-length fields. We can put all fixed-length fields at
> the start of BinaryObject so that we can access them by offset as per typeId.
> - Likewise we don't need to encode the field id in BinaryObject anymore,
> which saves 4 bytes per field. We already know their order from BinaryType.
> - This means when you ALTER TABLE we add a BinaryType to the existing Class
> (or pseudo-Class type name) and we can use it for new data, and eventually
> update existing data to have this field.
> - On top of BinaryTypes we can have checks that run them against the SQL
> table column list to see if there are any mismatches.
>
> To illustrate, previously we had it like:
> [ Type id | String field id | String field value | Long field id |
>   Long field value | Datetime field id | Datetime field value ]
> But now it will be:
> [ Type id | Long field value | Datetime field value | String field value ]
>             ^------------------^---- can be accessed by offset
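
A rough Java sketch of the layout quoted above, under assumptions of mine: the checksum is CRC32 over the class name plus "name:type" descriptors, the datetime is stored as epoch millis in a long, and the string carries a 4-byte length prefix. SchemaLayoutSketch and its methods are hypothetical names, not Ignite API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical sketch: [ Type id | Long value | Datetime value | String value ]
class SchemaLayoutSketch {
    // Offsets of the fixed-length part are known from the type alone.
    static final int LONG_OFF = Integer.BYTES;          // right after the 4-byte type id
    static final int DATE_OFF = LONG_OFF + Long.BYTES;  // datetime assumed to be epoch millis
    static final int STR_OFF = DATE_OFF + Long.BYTES;   // var-len part starts here

    /** Assumed checksum: CRC32 over the class name and each field descriptor. */
    static int typeId(String className, String... fieldDescriptors) {
        CRC32 crc = new CRC32();
        crc.update(className.getBytes(StandardCharsets.UTF_8));
        for (String f : fieldDescriptors) {
            crc.update(0); // separator byte between descriptors
            crc.update(f.getBytes(StandardCharsets.UTF_8));
        }
        return (int) crc.getValue();
    }

    /** No field ids are written; field order comes from the type. */
    static byte[] write(int typeId, long id, long createdMillis, String name) {
        byte[] s = name.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(STR_OFF + Integer.BYTES + s.length);
        buf.putInt(typeId).putLong(id).putLong(createdMillis).putInt(s.length).put(s);
        return buf.array();
    }

    static long readId(byte[] obj) {
        return ByteBuffer.wrap(obj).getLong(LONG_OFF); // direct access by offset
    }

    static long readCreated(byte[] obj) {
        return ByteBuffer.wrap(obj).getLong(DATE_OFF);
    }

    static String readName(byte[] obj) {
        int len = ByteBuffer.wrap(obj).getInt(STR_OFF);
        return new String(obj, STR_OFF + Integer.BYTES, len, StandardCharsets.UTF_8);
    }
}
```

With more than one var-len field this sketch would still need an offset table for them, which is where the header-or-footer question comes in.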
>
> Regards,
> Ilya.
>
> --
> Ilya Kasnacheev
>
>
> Tue, 20 Nov 2018 at 10:05, Vladimir Ozerov <[email protected]>:
>
> > Igniters,
> >
> > It is very likely that Apache Ignite 3.0 will be released next year, so
> > we need to start thinking about major product improvements. I'd like to
> > start with binary objects.
> >
> > Currently they are one of the main limiting factors for the product. They
> > are fat - 30+ bytes of overhead on average, which means a high TCO for
> > Apache Ignite compared to other vendors. They are slow - not suitable for
> > SQL at all.
> >
> > I would like to ask all of you who have worked with binary objects to
> > share your feedback and ideas, so that we understand what they should
> > look like in AI 3.0. This is a brainstorm - let's accumulate ideas first
> > and minimize criticism. Then we will work on the ideas in separate topics.
> >
> > 1) Historical background
> >
> > BO were implemented around 2014 (Apache Ignite 1.5) when we started
> > working on .NET and CPP clients. During design we had several ideas in
> > mind:
> > - ability to read object fields in O(1) without deserialization
> > - interoperability between Java, .NET and CPP.
> >
> > Since then a number of other concepts have been mixed into the cocktail:
> > - Affinity key fields
> > - Strict typing for existing fields (aka metadata)
> > - Binary Object as storage format
> >
> > 2) My proposals
> >
> > 2.1) Introduce "Data Row Format" interface
> > Binary Objects are terrible candidates for storage. Too fat, too slow.
> > Efficient storage typically has <10 bytes overhead per row (no metadata,
> > no length, no hash code, etc), allows super-fast field access, supports
> > different string formats (ASCII, UTF-8, etc), supports different temporal
> > types (date, time, timestamp, timestamp with timezone, etc), and stores
> > these types as efficiently as possible.
> >
> > What we need is to introduce an interface which will convert a pair of
> > key-value objects into a row. This row will be used to store data and to
> > get fields from it. If you care about memory consumption, need SQL and a
> > strict schema - use one format. If you need flexibility and prefer
> > key-value access - use another format which will store binary objects
> > unchanged (current behavior).
> >
> > interface DataRowFormat {
> >     // key and value are primitives or binary objects
> >     DataRow create(Object key, Object value);
> >     DataRowMetadata metadata();
> > }
> >
> > 2.2) Remove affinity field from metadata
> > Affinity rules are governed by cache, not type. We should remove
> > "affinityFieldName" from metadata.
> >
> > 2.3) Remove restrictions on changing field type
> > I do not know why we did that in the first place. This restriction
> > prevents type evolution and confuses users.
> >
> > 2.4) Use bitmaps for "null" and default values and for fixed-length
> > fields, put fixed-length fields before variable-length.
> > Motivation: to save space.
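
One way to read the bitmap idea in 2.4, as a minimal Java sketch. The assumptions are mine: only 4-byte int fields, at most 8 fields per row, and a set bit meaning "null". NullBitmapSketch is a hypothetical name, not Ignite API.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: [1-byte null bitmap][4 bytes per non-null int field]
class NullBitmapSketch {
    static byte[] write(Integer[] fields) {
        int bitmap = 0;
        int nonNull = 0;
        for (int i = 0; i < fields.length; i++) {
            if (fields[i] == null)
                bitmap |= 1 << i; // a null costs one bit and no payload bytes
            else
                nonNull++;
        }
        ByteBuffer buf = ByteBuffer.allocate(1 + nonNull * Integer.BYTES);
        buf.put((byte) bitmap);
        for (Integer f : fields)
            if (f != null)
                buf.putInt(f);
        return buf.array();
    }

    static Integer read(byte[] row, int idx) {
        int bitmap = row[0] & 0xFF;
        if ((bitmap & (1 << idx)) != 0)
            return null; // null is detected without touching the payload
        // Skip only the non-null fields that precede idx.
        int nullsBefore = Integer.bitCount(bitmap & ((1 << idx) - 1));
        int off = 1 + (idx - nullsBefore) * Integer.BYTES;
        return ByteBuffer.wrap(row).getInt(off);
    }
}
```

Here five fields with three nulls take 9 bytes instead of 20, and a field's offset is still computed in constant time from the bitmap.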
> >
> > What else? Please share your ideas.
> >
> > Vladimir.
> >
>


-- 
Best regards,
Andrey V. Mashenkov
