Re: [IMPORTANT] Future of Binary Objects

Pavel Tupitsyn Wed, 21 Nov 2018 04:22:13 -0800

Vladimir,

IMO the issue is that we allow any type of data in the cache (put Person,
then put int to the same cache).
Are we going to address this in 3.0 and enforce key/value types according
to cache configuration?
This will provide more space for optimizations.
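
A rough sketch of what enforcing types per cache configuration could look like. TypedCache and its constructor arguments are made-up names for illustration, not an existing Ignite API; the point is only that a put of the wrong type fails fast.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical wrapper, not part of the Ignite API: it rejects any key or
// value that is not an instance of the types declared at construction time.
final class TypedCache {
    private final Class<?> keyType;
    private final Class<?> valueType;
    private final Map<Object, Object> store = new HashMap<>();

    TypedCache(Class<?> keyType, Class<?> valueType) {
        this.keyType = keyType;
        this.valueType = valueType;
    }

    void put(Object key, Object value) {
        // isInstance() is false for null, so nulls are rejected as well.
        if (!keyType.isInstance(key))
            throw new IllegalArgumentException("Key must be " + keyType.getName());
        if (!valueType.isInstance(value))
            throw new IllegalArgumentException("Value must be " + valueType.getName());
        store.put(key, value);
    }

    Object get(Object key) {
        return store.get(key);
    }
}
```

With a cache declared as new TypedCache(Integer.class, String.class), putting a String value works, while putting an int value into the same cache throws IllegalArgumentException.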


On Wed, Nov 21, 2018 at 3:14 PM Vladimir Ozerov <[email protected]>
wrote:

> Denis,
>
> In theory data conversion could be avoided in certain cases. E.g. consider
> a case of loading data through a streamer. We know the cache, we know its
> metadata and row format. So instead of doing "user object" -> "binary
> object" -> "row", we can do "user object" -> "row".
>
> On Wed, Nov 21, 2018 at 1:31 PM Denis Mekhanikov <[email protected]>
> wrote:
>
> > Vladimir,
> >
> > Thank you for the clarification. I didn't see this distinction at first.
> >
> > I meant using customizable formats for all serialization, not only for
> > storage.
> > The idea behind my proposal is to avoid data conversion when loading data
> > into Ignite.
> > It will complicate usage of thin clients though, so I'm not sure that it
> > will make users happier.
> >
> > But anyway, the same approach may be used for storage only.
> >
> > Denis
> >
> > On Wed, Nov 21, 2018 at 12:57, Vladimir Ozerov <[email protected]> wrote:
> >
> > > Denis,
> > >
> > > Could you please clarify - are you talking about storage, i.e. how objects
> > > are stored in Ignite, or about serialization as a whole? I'd like to better
> > > understand whether the use case you described is relevant to my idea of
> > > splitting binary objects from the underlying storage format.
> > > My vision was that we can use the current BinaryObject protocol (with
> > > whatever optimizations are needed) as a common format for communication
> > > between nodes and a common serialization protocol. This is very handy
> > > because all participants (Java, C++, .NET, all sorts of thin clients) are
> > > able to work with it. So if I have a "Person" class in Java I can read it
> > > on any other platform without any additional configuration. But when it
> > > comes to *storage*, then we may introduce a pluggable row format interface
> > > which will apply any necessary transformations. So if someone wants to
> > > store objects in Avro/Protobuf, and is ready to configure and implement it
> > > (generate classes, implement field extraction logic, etc.) - then just
> > > implement that interface. The key is that this implementation will only be
> > > needed in Java, not in the dozen platforms we support.
> > >
> > > But when it comes to how to store object in a cache
> > >
> > > On Wed, Nov 21, 2018 at 11:37 AM Denis Mekhanikov <[email protected]>
> > > wrote:
> > >
> > > > People often ask about the possibility of storing their data in the same
> > > > format that they use in their applications.
> > > > If you use Avro everywhere in your application, then why not store data
> > > > in the same format in Ignite?
> > > > So, how about making an interface that would list all the operations we
> > > > need, and using this interface everywhere without relying on any specific
> > > > implementation?
> > > > *BinaryObject* looks like a suitable interface, but the only
> > > > implementation that you can get from Ignite is *BinaryObjectImpl*.
> > > > I think we should make Ignite extensible and provide the capability to
> > > > specify your own data format by implementing the corresponding interfaces.
> > > > So, if you like JSONB or Protobuf or whatever else, you could enable a
> > > > module for the corresponding format and use it for storing the data.
> > > >
> > > > Denis
> > > >
> > > > On Wed, Nov 21, 2018 at 10:10, Alexey Zinoviev <[email protected]> wrote:
> > > >
> > > > > I like @Vyacheslav Daradur's approach.
> > > > >
> > > > > Maybe somebody could have a look at UnsafeRow in Spark:
> > > > >
> > > > > https://github.com/apache/spark/blob/master/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java
> > > > > UnsafeRow is a concrete InternalRow that represents a mutable internal
> > > > > raw-memory (and hence unsafe) binary row format.
> > > > >
> > > > > P.S. If somebody is interested in this approach, I could share more
> > > > > information.
> > > > >
> > > > > On Tue, Nov 20, 2018 at 11:33, Sergi Vladykin <[email protected]> wrote:
> > > > >
> > > > > > I really like the Protobuf format. It is probably not what we need for
> > > > > > O(1) field access, but for compact data representation we can derive
> > > > > > a lot from it.
> > > > > >
> > > > > > Also IMO, restricting field type changes is an absolutely sane idea.
> > > > > > The correct way to evolve a schema in the common case is to add new
> > > > > > fields and gradually deprecate the old ones. If you can skip
> > > > > > default/null fields in the binary format, this approach will not
> > > > > > introduce any noticeable performance/size overhead.
> > > > > >
> > > > > > Sergi
> > > > > >
> > > > > > On Tue, Nov 20, 2018 at 11:12, Vyacheslav Daradur <[email protected]> wrote:
> > > > > >
> > > > > > > I think one possible way to reduce overhead and TCO is the SQL
> > > > > > > Schema approach.
> > > > > > >
> > > > > > > That assumes that metadata will be stored separately from serialized
> > > > > > > data to reduce size.
> > > > > > > In this case, most of the advantages of Binary Objects, like O(1)
> > > > > > > access and access without deserialization, may still be achieved.
> > > > > > > On Tue, Nov 20, 2018 at 10:56 AM Vladimir Ozerov <[email protected]>
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > Hi Alexey,
> > > > > > > >
> > > > > > > > Binary Objects only.
> > > > > > > >
> > > > > > > > On Tue, Nov 20, 2018 at 10:50 AM Alexey Zinoviev <[email protected]>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Are we discussing only Core features here, or the roadmap for all
> > > > > > > > > components?
> > > > > > > > >
> > > > > > > > > On Tue, Nov 20, 2018 at 10:05, Vladimir Ozerov <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > > Igniters,
> > > > > > > > > >
> > > > > > > > > > It is very likely that Apache Ignite 3.0 will be released next year.
> > > > > > > > > > So we need to start thinking about major product improvements. I'd
> > > > > > > > > > like to start with binary objects.
> > > > > > > > > >
> > > > > > > > > > Currently they are one of the main limiting factors for the product.
> > > > > > > > > > They are fat - 30+ bytes overhead on average, which means a high TCO
> > > > > > > > > > for Apache Ignite compared to other vendors. They are slow - not
> > > > > > > > > > suitable for SQL at all.
> > > > > > > > > >
> > > > > > > > > > I would like to ask all of you who have worked with binary objects to
> > > > > > > > > > share your feedback and ideas, so that we understand how they should
> > > > > > > > > > look in AI 3.0. This is a brainstorm - let's accumulate ideas first
> > > > > > > > > > and minimize criticism. Then we will work on ideas in separate topics.
> > > > > > > > > >
> > > > > > > > > > 1) Historical background
> > > > > > > > > >
> > > > > > > > > > BO were implemented around 2014 (Apache Ignite 1.5) when we started
> > > > > > > > > > working on .NET and CPP clients. During design we had several ideas
> > > > > > > > > > in mind:
> > > > > > > > > > - ability to read object fields in O(1) without deserialization
> > > > > > > > > > - interoperability between Java, .NET and CPP.
> > > > > > > > > >
> > > > > > > > > > Since then a number of other concepts were mixed into the cocktail:
> > > > > > > > > > - Affinity key fields
> > > > > > > > > > - Strict typing for existing fields (aka metadata)
> > > > > > > > > > - Binary Object as storage format
> > > > > > > > > >
> > > > > > > > > > 2) My proposals
> > > > > > > > > >
> > > > > > > > > > 2.1) Introduce "Data Row Format" interface
> > > > > > > > > > Binary Objects are terrible candidates for storage. Too fat, too slow.
> > > > > > > > > > Efficient storage typically has <10 bytes overhead per row (no
> > > > > > > > > > metadata, no length, no hash code, etc), allows super-fast field
> > > > > > > > > > access, supports different string formats (ASCII, UTF-8, etc),
> > > > > > > > > > supports different temporal types (date, time, timestamp, timestamp
> > > > > > > > > > with timezone, etc), and stores these types as efficiently as
> > > > > > > > > > possible.
> > > > > > > > > >
> > > > > > > > > > What we need is to introduce an interface which will convert a pair
> > > > > > > > > > of key-value objects into a row. This row will be used to store data
> > > > > > > > > > and to get fields from it. If you care about memory consumption and
> > > > > > > > > > need SQL and a strict schema - use one format. If you need
> > > > > > > > > > flexibility and prefer key-value access - use another format which
> > > > > > > > > > will store binary objects unchanged (current behavior).
> > > > > > > > > >
> > > > > > > > > > interface DataRowFormat {
> > > > > > > > > >     // key and value: primitives or binary objects
> > > > > > > > > >     DataRow create(Object key, Object value);
> > > > > > > > > >     DataRowMetadata metadata();
> > > > > > > > > > }
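
To make the proposed interface a bit more concrete, here is a minimal sketch of one possible strict-schema implementation. The shapes of DataRow and DataRowMetadata (the field(int) and fieldNames() methods) and the IntLongRowFormat class are assumptions made for illustration; the proposal only names these types and does not define them.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.List;

// Assumed shape: the thread names DataRow but does not define its methods.
interface DataRow {
    Object field(int index);
}

// Assumed shape: the thread names DataRowMetadata but does not define it.
interface DataRowMetadata {
    List<String> fieldNames();
}

// The interface as proposed in the thread.
interface DataRowFormat {
    DataRow create(Object key, Object value); // primitives or binary objects
    DataRowMetadata metadata();
}

// Hypothetical strict-schema format for an (Integer key, Long value) cache:
// 12 bytes per row, no per-row metadata, fields read at fixed offsets.
final class IntLongRowFormat implements DataRowFormat {
    @Override public DataRow create(Object key, Object value) {
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES + Long.BYTES);
        buf.putInt(0, (Integer) key);
        buf.putLong(Integer.BYTES, (Long) value);

        // Field access is O(1): each field lives at a known offset.
        return index -> {
            switch (index) {
                case 0: return buf.getInt(0);
                case 1: return buf.getLong(Integer.BYTES);
                default: throw new IndexOutOfBoundsException("No field " + index);
            }
        };
    }

    @Override public DataRowMetadata metadata() {
        // Metadata is kept once per format, not once per row.
        return () -> Arrays.asList("key", "value");
    }
}
```

A format that stores binary objects unchanged (the current behavior) would implement the same interface and simply wrap the incoming objects.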
> > > > > > > > > >
> > > > > > > > > > 2.2) Remove affinity field from metadata
> > > > > > > > > > Affinity rules are governed by cache, not type. We should remove
> > > > > > > > > > "affinityFieldName" from metadata.
> > > > > > > > > >
> > > > > > > > > > 2.3) Remove restrictions on changing field type
> > > > > > > > > > I do not know why we did that in the first place. This restriction
> > > > > > > > > > prevents type evolution and confuses users.
> > > > > > > > > >
> > > > > > > > > > 2.4) Use bitmaps for "null" and default values and for fixed-length
> > > > > > > > > > fields, put fixed-length fields before variable-length.
> > > > > > > > > > Motivation: to save space.
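
The space saving from a null bitmap with fixed-length fields first can be sketched as follows. The schema (Integer age, Long salary, String name), the BitmapRowCodec class and the exact byte layout are hypothetical and chosen only for illustration; default values are not covered, only nulls.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical row layout for a fixed schema (Integer age, Long salary, String name):
//   byte 0      : null bitmap, bit i is set when field i is null
//   next bytes  : non-null fixed-length fields, in schema order
//   rest of row : the single variable-length field (UTF-8), if non-null
final class BitmapRowCodec {
    static byte[] encode(Integer age, Long salary, String name) {
        byte[] nameBytes = name == null ? new byte[0] : name.getBytes(StandardCharsets.UTF_8);
        int bitmap = (age == null ? 1 : 0) | (salary == null ? 2 : 0) | (name == null ? 4 : 0);
        int size = 1
            + (age == null ? 0 : Integer.BYTES)
            + (salary == null ? 0 : Long.BYTES)
            + nameBytes.length;

        ByteBuffer buf = ByteBuffer.allocate(size);
        buf.put((byte) bitmap);
        if (age != null) buf.putInt(age);       // null fields take no space at all
        if (salary != null) buf.putLong(salary);
        buf.put(nameBytes);
        return buf.array();
    }

    static Long salary(byte[] row) {
        int bitmap = row[0];
        if ((bitmap & 2) != 0) return null;
        // Offset depends only on the bitmap, so no scan of the row is needed.
        int offset = 1 + ((bitmap & 1) == 0 ? Integer.BYTES : 0);
        return ByteBuffer.wrap(row).getLong(offset);
    }

    static String name(byte[] row) {
        int bitmap = row[0];
        if ((bitmap & 4) != 0) return null;
        int offset = 1
            + ((bitmap & 1) == 0 ? Integer.BYTES : 0)
            + ((bitmap & 2) == 0 ? Long.BYTES : 0);
        // The variable-length field is last, so its length is implied by the row end.
        return new String(row, offset, row.length - offset, StandardCharsets.UTF_8);
    }
}
```

In this layout a row with only age set takes 5 bytes (1 bitmap byte plus a 4-byte int), and a row with more than one variable-length field would additionally need lengths or offsets for all but the last of them.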
> > > > > > > > > >
> > > > > > > > > > What else? Please share your ideas.
> > > > > > > > > >
> > > > > > > > > > Vladimir.
> > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Best Regards, Vyacheslav D.
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
