Re: [IMPORTANT] Future of Binary Objects

Sergi Vladykin Thu, 22 Nov 2018 01:25:08 -0800

If we are developing a product for users, we are already guessing what is right
and what is wrong for them. So let's avoid these sophistic statements.


In the end it is always our responsibility to provide a balanced set of
trade-offs between usability, performance and safety.

Let me repeat, I'm not against any possible type conversions, but I'm
strongly against binary incompatible ones.
If we always store List.of(1) as 1 and make them binary interchangeable,
I'm OK with that.

Still, as a guide to good practice, I'd suggest looking at what Protobuf
allows and what it does not:
https://developers.google.com/protocol-buffers/docs/proto3#updating
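
To illustrate the one binary-compatible case discussed in the quoted message
below (Int widened to Long under varlen encoding), here is a minimal sketch.
It assumes a simplified base-128 varint encoder, not the actual Protobuf
library; the class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

final class VarintDemo {
    // Encode a value as a base-128 varint: 7 payload bits per byte,
    // with the high bit set on every byte except the last one.
    static byte[] encodeVarint(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
        return out.toByteArray();
    }

    // Decode a varint into a long; the reader does not need to know
    // whether the writer started from an int or a long.
    static long decodeVarint(byte[] bytes) {
        long result = 0;
        int shift = 0;
        for (byte b : bytes) {
            result |= (long) (b & 0x7F) << shift;
            shift += 7;
            if ((b & 0x80) == 0) {
                break;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int asInt = 300;
        long asLong = 300L;
        // Same non-negative value, same bytes, whatever the declared width.
        System.out.println(Arrays.equals(encodeVarint(asInt), encodeVarint(asLong))); // true
        System.out.println(encodeVarint(Integer.MAX_VALUE).length); // 5
        System.out.println(encodeVarint(Long.MAX_VALUE).length);    // 9
    }
}
```

The sketch only covers non-negative values. Once the binary representation
depends on the declared type, this interchangeability is lost.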

Sergi

Thu, 22 Nov 2018 at 11:04, Vladimir Ozerov <[email protected]>:

> Sergi,
>
> I think we should not guess for users what is right or wrong for them. It
> is up to the user to decide what is valid. For example, consider a user who
> operates on a list of Integers and, to optimize memory consumption,
> decides to save in the same field either a List<Integer> or a plain Integer
> when only a single element exists. Another example - a kind of data lake or
> data cleansing application, which may receive the same field in different
> forms. E.g. age in the form of Integer or String. Does it work for the user
> or not? We do not know. Will he need to migrate the whole data set? We do
> not know either.
>
> The only place in the product where we care is SQL. But in this case
> instead of adding checks on binary level, we should validate data on cache
> level. In fact, Ignite already works this way. E.g. nullability checks are
> performed on cache level rather than binary. All we need is to move all
> checks to cache level from binary level.
>
>
> On Thu, Nov 22, 2018 at 9:41 AM Sergi Vladykin <[email protected]>
> wrote:
>
> > It may be OK to extend compatible field types (like from Int to Long).
> >
> > In Protobuf, for example, this is allowed just because there is no
> > difference between Int and Long in the binary format: they are all equally
> > varlen encoded, and Longs will just occupy up to 9 bytes, while Ints up to 5.
> >
> > But for every other case, where the binary representation is type dependent,
> > I would be against it. This will either require migrating the whole dataset
> > to a new model (which is always risky, since you may need to roll back to a
> > previous version of your code) or it will require type checks/conversions
> > for each field access, which is a hard-to-reason-about complication and a
> > possible performance penalty.
> >
> > Sergi
> >
> >
> >
> > Thu, 22 Nov 2018 at 09:23, Vladimir Ozerov <[email protected]>:
> >
> > > Denis,
> > >
> > > Several examples:
> > > 1) DEFAULT values - in SQL you may avoid storing the default value in the
> > > table and store it in metadata instead. Not applicable to BinaryObject
> > > because the same binary object may be saved to two SQL tables with
> > > different defaults.
> > > 2) DATE and other temporal types - in SQL you want to store them in a
> > > special format to be able to extract date parts quickly (typically - 11
> > > bytes). But in Java and some other languages the best format is a plain
> > > long. This is why we use it in BinaryObject.
> > > 3) String charset - in SQL you may choose different charsets for
> > > different tables. E.g. UTF-8 for one, ASCII for another. In BinaryObject
> > > we store everything in UTF-8, and this is fine for most cases, well ...
> > > except for SQL :-)
> > >
> > > The key thing here is that you cannot define a format which will be good
> > > for both SQL and the native API. They are very different. This is why I
> > > propose to define an additional interface on the cache level defining how
> > > to store values, which will be very different from binary objects.
> > >
> > > Vladimir.
> > >
> > > On Thu, Nov 22, 2018 at 3:32 AM Denis Magda <[email protected]> wrote:
> > >
> > > > Vladimir,
> > > >
> > > > Could you educate me a little bit: why is the current format bad for
> > > > SQL, and why is another one more suitable?
> > > >
> > > > Also, if we introduce the new format, then why would we keep the binary
> > > > one? Is the new format just the next version of the binary one?
> > > >
> > > > 2.3) Remove restrictions on changing field type
> > > > > I do not know why we did that in the first place. This restriction
> > > > > prevents type evolution and confuses users.
> > > >
> > > >
> > > > That is a hot requirement shared by those who use Ignite SQL in
> > > production.
> > > > +1.
> > > >
> > > > --
> > > > Denis
> > > >
> > > > On Mon, Nov 19, 2018 at 11:05 PM Vladimir Ozerov <
> [email protected]
> > >
> > > > wrote:
> > > >
> > > > > Igniters,
> > > > >
> > > > > It is very likely that Apache Ignite 3.0 will be released next year.
> > > > > So we need to start thinking about major product improvements. I'd
> > > > > like to start with binary objects.
> > > > >
> > > > > Currently they are one of the main limiting factors for the product.
> > > > > They are fat - 30+ bytes overhead on average, hence a high TCO for
> > > > > Apache Ignite compared to other vendors. They are slow - not suitable
> > > > > for SQL at all.
> > > > >
> > > > > I would like to ask all of you who worked with binary objects to share
> > > > > your feedback and ideas, so that we understand what they should look
> > > > > like in AI 3.0. This is a brainstorm - let's accumulate ideas first and
> > > > > minimize criticism. Then we will work on ideas in separate topics.
> > > > >
> > > > > 1) Historical background
> > > > >
> > > > > BO were implemented around 2014 (Apache Ignite 1.5) when we started
> > > > > working on .NET and CPP clients. During design we had several ideas in
> > > > > mind:
> > > > > - ability to read object fields in O(1) without deserialization
> > > > > - interoperability between Java, .NET and CPP.
> > > > >
> > > > > Since then a number of other concepts were mixed into the cocktail:
> > > > > - Affinity key fields
> > > > > - Strict typing for existing fields (aka metadata)
> > > > > - Binary Object as storage format
> > > > >
> > > > > 2) My proposals
> > > > >
> > > > > 2.1) Introduce "Data Row Format" interface
> > > > > Binary Objects are terrible candidates for storage. Too fat, too slow.
> > > > > Efficient storage typically has <10 bytes overhead per row (no
> > > > > metadata, no length, no hash code, etc), allows super-fast field
> > > > > access, supports different string formats (ASCII, UTF-8, etc), supports
> > > > > different temporal types (date, time, timestamp, timestamp with
> > > > > timezone, etc), and stores these types as efficiently as possible.
> > > > >
> > > > > What we need is to introduce an interface which will convert a pair of
> > > > > key-value objects into a row. This row will be used to store data and
> > > > > to get fields from it. If you care about memory consumption and need SQL
> > > > > and a strict schema - use one format. If you need flexibility and prefer
> > > > > key-value access - use another format which will store binary objects
> > > > > unchanged (current behavior).
> > > > >
> > > > > interface DataRowFormat {
> > > > >     // key and value are primitives or binary objects
> > > > >     DataRow create(Object key, Object value);
> > > > >     DataRowMetadata metadata();
> > > > > }
> > > > >
> > > > > 2.2) Remove affinity field from metadata
> > > > > Affinity rules are governed by cache, not type. We should remove
> > > > > "affinityFieldName" from metadata.
> > > > >
> > > > > 2.3) Remove restrictions on changing field type
> > > > > I do not know why we did that in the first place. This restriction
> > > > > prevents type evolution and confuses users.
> > > > >
> > > > > 2.4) Use bitmaps for "null" and default values and for fixed-length
> > > > > fields, put fixed-length fields before variable-length.
> > > > > Motivation: to save space.
> > > > >
> > > > > What else? Please share your ideas.
> > > > >
> > > > > Vladimir.
> > > > >
> > > >
> > >
> >
>
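
Proposal 2.4 in the original message (a bitmap for "null" values, with
fixed-length fields placed before variable-length ones) could look roughly
like the sketch below. The three-field row, the class name and the exact byte
layout are assumptions made for illustration only; this is not Ignite's
actual or planned storage format.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class RowLayoutSketch {
    // Hypothetical row with three nullable fields: id (int), ts (long), name (String).
    // Layout: [1-byte null bitmap][present fixed-length fields][name length + UTF-8 bytes].
    static byte[] write(Integer id, Long ts, String name) {
        byte nulls = 0;
        if (id == null) nulls |= 1;   // bit 0 marks a null id
        if (ts == null) nulls |= 2;   // bit 1 marks a null ts
        if (name == null) nulls |= 4; // bit 2 marks a null name

        byte[] nameBytes = name == null ? new byte[0] : name.getBytes(StandardCharsets.UTF_8);
        int size = 1
            + (id == null ? 0 : Integer.BYTES)
            + (ts == null ? 0 : Long.BYTES)
            + (name == null ? 0 : Integer.BYTES + nameBytes.length);

        ByteBuffer buf = ByteBuffer.allocate(size);
        buf.put(nulls);
        if (id != null) buf.putInt(id);
        if (ts != null) buf.putLong(ts);
        if (name != null) {
            buf.putInt(nameBytes.length);
            buf.put(nameBytes);
        }
        return buf.array();
    }

    // A null field costs one bit in the bitmap and no payload bytes.
    static boolean isNull(byte[] row, int fieldIndex) {
        return (row[0] & (1 << fieldIndex)) != 0;
    }

    // Fixed-length fields come first, so the offset of ts is computed
    // from the bitmap alone, without scanning variable-length data.
    static Long readTs(byte[] row) {
        if (isNull(row, 1)) {
            return null;
        }
        int offset = 1 + (isNull(row, 0) ? 0 : Integer.BYTES);
        return ByteBuffer.wrap(row).getLong(offset);
    }
}
```

In this sketch a row holding only a timestamp takes 9 bytes (1 bitmap byte
plus 8 bytes for the long), which is in line with the "<10 bytes overhead per
row" target mentioned in 2.1.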
