Re: Deserialization of BinaryObject's byte arrays

Vladimir Ozerov Tue, 07 Nov 2017 21:27:56 -0800

Andrey,

Special collection types support was our mistake when we designed binary
protocol in the first place.


As far as BinaryObject idea - makes sense. But I would avoid any kind of
implicit conversions. Special method for byte array should be enough.

вт, 7 нояб. 2017 г. в 22:01, Andrey Kornev <[email protected]>:

> Vladimir,
>
> I appreciate your taking the time to reply. Thank you for sharing your
> thoughts!
>
> I have to admit I'm still not convinced there is anything inherently wrong
> about having the Binary protocol support ByteBuffers as first class
> citizen. The Binary marshaller already supports specific collection/map
> types (such as linked list, array list, hash set etc.) which all, one
> could argue, are just "thin wrappers" around "fundamental" sequence/map
> abstractions. I don't see why ByteBuffer can't be treated the same way.
>
> Also, your approach, given a relatively simple problem we're trying to
> solve here, feels very much over-engineered. As an Ignite user, I'd like to
> have a *simple* way of deserializing primitive arrays efficiently. Having
> to implement a bunch of interfaces to do just that doesn't fit my (Ignite
> user's) definition of "simple". Contrary to your statement below, the APIs
> you define (or not define) do have a lot to do with user experience.
>
> In any case, here's a new proposal. And this time, it doesn't require
> modification of the Binary protocol.
>
> Essentially what we need is just a way to tell the Binary marshaller
> how array fields are to be deserialized. It can be done globally (for
> example, by introducing a new config property on BinaryConfiguration). Or,
> it can be done on per binary object instance basis. For example, we can add
> a new method to the BinaryObject class (e.g. BinaryObject.withArrayAsBuffer())
> to make subsequent calls to BinaryObject.field() return an instance of
> XxxBuffer class rather than a raw array (if the underlying field is of
> course an array).
>
> (Alternatively, we could enhance the BinaryObject class by adding:
>
> Buffer fieldAsBuffer(String fieldName);
>
> method in addition to the existing:
>
> T field(String fieldName);
>
> This would make it possible to read any field (including scalar
> primitive types and String's) as an instance of XxxBuffer that wraps the
> field's raw bytes in the BinaryObject's backing array.)
>
> Actually I personally like this alternative better -- it's more consistent
> and symmetric, and opens up opportunities for further optimization. For
> example, in some cases (such as equality checks) I don't need an
> actual instance of a String -- its character (or even byte) arrays are
> sufficient.
>
> The good thing is that neither approach requires the field to have
> been originally written (serialized) as XxxBuffer. It could've been just a
> regular array. The actual type (e.g. byte[] or ByteBuffer) only matters
> during deserialization.
>
> Regards
> Andrey
>
> ------------------------------
> *From:* Vladimir Ozerov <[email protected]>
> *Sent:* Tuesday, November 7, 2017 12:25 AM
> *To:* [email protected]
>
> *Subject:* Re: Deserialization of BinaryObject's byte arrays
> Regarding "sefl-imposed rules":
> 1) Binary protocol is essentially ignite's type system. It is used in many
> places: Java, .NET, CPP, SQL (native, JDBC, ODBC), thin client. Of those 5
> parties only Java has "ByteBuffer" concept. How other platforms should work
> with this type?
> 2) We compare objects using binary representation
> (BinaryArrayIdentityResolver#equals0). So if one user write data as byte
> array, and another write the same data as ByteBuffer, resulting objects
> will not be equal
> These are clear examples on why type system should be as small as possible
> - we simply cannot afford extending type system for every minor use case.
> Binary protocol is internal thing and has nothing to do with user
> experience.
>
> What is worse, ByteBuffer is terrible abstraction as recognized by many
> millions of Java users - it is stateful. One cannot read data from it
> without changing it's state. And Ignite gives no guarantees that particular
> object will be serialized only once per certain operation. One example is
> BinaryMetadataCollector - Ignite could go through object twice per
> serialization cycle to collect metadata. How are we going to maintain
> idempotence in general case?
>
> What could improve UX is public interfaces. Raw example of what we can do:
> 1) Define generic wrappers over our streams that will give users ability to
> read/write byte arrays as they want. E.g.:
> interface BinaryArrayWriter {
>     write(byte[] src, int offset, int len);
>     write(ByteBuffer src, ...);
> }
>
> interface BinaryArrayReader {
>     int readLength();
>     byte[] read();
>     void read(byte[] dest, int off, int len);
>     void read(ByteBuffer dest);
> }
>
> 2) Then add new methods to reader/writer/builder. E.g.:
> BinaryWriter.writeByteArray(String name, BiConsumer<Object,
> BinaryArrayWriter> consumer); // Consumes original field and array writer;
> BinaryReader.readByteArray(String name, Function<BinaryArrayReader>
> consumer);            // Takes array reader, produces original field
>
> 3) Last, introduce field annotation for user convenience:
> class MyClass {
>     @BinaryArray(writerClass = MyWriterBiConsumer.class, readerClass =
> MyReaderFunction.class)
>     private ByteBuffer buf;
> }
>
> Now all user need to do is to implement write consumer and read function.
>
> Vladimir.
>
>
> On Tue, Nov 7, 2017 at 10:16 AM, Vladimir Ozerov <[email protected]>
> wrote:
>
> > ByteBuffer is not fundamental data type. It is a thin wrapper over array.
> > Protocol should contain only fundamental types.
> >
> > вт, 7 нояб. 2017 г. в 10:05, Andrey Kornev <[email protected]>:
> >
> >> Vladimir,
> >>
> >> Could you please elaborate as to why adding ByteBuffer to the Binary
> >> protocol is “not a good idea”? What is inherently wrong about it?
> >>
> >> As for your suggestion, while I certainly see your point, it comes with
> >> the inconvenience of implementing Binarylizable for every entity type
> in an
> >> application, which makes me wonder if the self imposed rules such as
> “type
> >> system should be kept as small as possible” could be relaxed in the
> name of
> >> greater good and user friendliness. Ignite API is not easy to grok as
> it is
> >> already.
> >>
> >> Thanks
> >> Andrey
> >> _____________________________
> >> From: Vladimir Ozerov <[email protected]<mailto:[email protected]
> >>
> >> Sent: Monday, November 6, 2017 10:22 PM
> >> Subject: Re: Deserialization of BinaryObject's byte arrays
> >> To: <[email protected]<mailto:[email protected]>>
> >> Cc: Valentin Kulichenko <[email protected]<mailto:
> >> [email protected]>>
> >>
> >>
> >> Hi Andrey,
> >>
> >> While we deifnitely need to support byte buffers, adding new type to
> >> protocol is not very good idea. Binary protocol is the heart of our
> >> ssytem,
> >> which is used not oly for plain Java, but for other platforms and SQL.
> And
> >> ByteBuffer is essentially the same byte array, but with different API.
> >> What
> >> you describe is a special case of byte array read/write. For this reason
> >> our type system should be kept as small as possible and it doesn't
> require
> >> new type to be introduced. Instead, we'd better to handle everything on
> >> API
> >> level only. For example, we can add new methods to
> reader/writer/builder,
> >> which will handle this. E.g.:
> >>
> >> BinaryWriter.writeByteArray(<some producer here>);
> >> BinaryReader.readByteArray(<some consumer here>);
> >>
> >> Vladimir.
> >>
> >> On Mon, Nov 6, 2017 at 9:25 PM, Andrey Kornev <[email protected]
> <
> >> mailto:[email protected] <[email protected]>>>
> >> wrote:
> >>
> >> > Will do! Thanks!
> >> > ________________________________
> >> > From: Valentin Kulichenko <[email protected]<mailto:
> >> [email protected]>>
> >> > Sent: Monday, November 6, 2017 9:17 AM
> >> > To: [email protected]<mailto:[email protected]>
> >> > Subject: Re: Deserialization of BinaryObject's byte arrays
> >> >
> >> > This use case was discussed couple of times already, so it seems to be
> >> > pretty important for users. And I like the approach suggested by
> Andrey.
> >> >
> >> > Andrey, can you create a Jira ticket for this change? Would you like
> to
> >> > contribute it?
> >> >
> >> > -Val
> >> >
> >> > On Fri, Nov 3, 2017 at 4:18 PM, Dmitriy Setrakyan <
> >> [email protected]<mailto:[email protected]>>
> >> > wrote:
> >> >
> >> > > This makes sense to me. Sounds like a useful feature.
> >> > >
> >> > > Would be nice to hear what others in the community think?
> >> > >
> >> > > D.
> >> > >
> >> > > On Fri, Nov 3, 2017 at 1:07 PM, Andrey Kornev <
> >> [email protected]<mailto:[email protected]>>
> >> > > wrote:
> >> > >
> >> > > > Hello,
> >> > > >
> >> > > > We store lots of byte arrays (serialized graph-like data
> >> structures) as
> >> > > > fields of BinaryObject. Later on, while reading such field,
> >> > > > BinaryInputStream implementation creates an on-heap array and
> copies
> >> > the
> >> > > > bytes from the BinaryObject's internal backing array to the new
> >> array.
> >> > > >
> >> > > > While in general case it works just fine, in our case, a lot of
> CPU
> >> is
> >> > > > spent on copying of the bytes and significant amount garbage is
> >> > generated
> >> > > > as well.
> >> > > >
> >> > > > I'd like Ignite Community to consider the following proposal:
> >> introduce
> >> > > > support for serialization/deserialization of the ByteBuffer
> class. A
> >> > > > BinaryInputStream implementation would special case ByteBuffer
> >> fields
> >> > > same
> >> > > > way it currently special cases various Java types (collections,
> >> dates,
> >> > > > timestamps, enums, etc.)
> >> > > >
> >> > > > The application developer would simply call
> >> > > BinaryObjectBuilder.setField(...)
> >> > > > passing in an instance of ByteBuffer. During serialization, the
> >> > > > ByteBuffer's bytes would simply be copied to the backing array
> >> (similar
> >> > > to
> >> > > > how regular byte arrays are serialized today) and in the binary
> >> > object's
> >> > > > metadata the field marked as, say, GridBinaryMarshaller.BYTE_
> >> BUFFER.
> >> > > >
> >> > > > During deserialization of the field, instead of allocating a new
> >> byte
> >> > > > array and transferring the bytes from the BinaryObject's backing
> >> array,
> >> > > > BinaryInputStream would simply wrap the required sub-array into a
> >> > > read-only
> >> > > > instance of ByteBuffer using "public static ByteBuffer wrap(byte[]
> >> > array,
> >> > > > int offset, int length)" and return the instance to the
> application.
> >> > > >
> >> > > > This approach doesn't require any array allocation/data copying
> and
> >> is
> >> > > > faster and significantly reduces pressure on the garbage
> collector.
> >> The
> >> > > > benefits are especially great when application stores significant
> >> > amount
> >> > > of
> >> > > > its data as byte arrays.
> >> > > >
> >> > > > Clearly, this approach equally applies to arrays of other
> primitive
> >> > > types.
> >> > > >
> >> > > > A minor complication does arise when dealing with off-heap
> >> > BinaryObjects,
> >> > > > but nothing that can't solved with a little bit of Java reflection
> >> > > wizardry
> >> > > > (or by other means).
> >> > > >
> >> > > > Thanks,
> >> > > > Andrey
> >> > > >
> >> > >
> >> >
> >>
> >>
> >>
>

Re: Deserialization of BinaryObject's byte arrays

Reply via email to