Andrey, Special collection types support was our mistake when we designed binary protocol in the first place.
As far as BinaryObject idea - makes sense. But I would avoid any kind of implicit conversions. Special method for byte array should be enough. вт, 7 нояб. 2017 г. в 22:01, Andrey Kornev <[email protected]>: > Vladimir, > > I appreciate your taking the time to reply. Thank you for sharing your > thoughts! > > I have to admit I'm still not convinced there is anything inherently wrong > about having the Binary protocol support ByteBuffers as first class > citizen. The Binary marshaller already supports specific collection/map > types (such as linked list, array list, hash set etc.) which all, one > could argue, are just "thin wrappers" around "fundamental" sequence/map > abstractions. I don't see why ByteBuffer can't be treated the same way. > > Also, your approach, given a relatively simple problem we're trying to > solve here, feels very much over-engineered. As an Ignite user, I'd like to > have a *simple* way of deserializing primitive arrays efficiently. Having > to implement a bunch of interfaces to do just that doesn't fit my (Ignite > user's) definition of "simple". Contrary to your statement below, the APIs > you define (or not define) do have a lot to do with user experience. > > In any case, here's a new proposal. And this time, it doesn't require > modification of the Binary protocol. > > Essentially what we need is just a way to tell the Binary marshaller > how array fields are to be deserialized. It can be done globally (for > example, by introducing a new config property on BinaryConfiguration). Or, > it can be done on per binary object instance basis. For example, we can add > a new method to the BinaryObject class (e.g. BinaryObject.withArrayAsBuffer()) > to make subsequent calls to BinaryObject.field() return an instance of > XxxBuffer class rather than a raw array (if the underlying field is of > course an array). > > (Alternatively, we could enhance the BinaryObject class by adding: > > Buffer fieldAsBuffer(String fieldName); > > method in addition to the existing: > > T field(String fieldName); > > This would make it possible to read any field (including scalar > primitive types and String's) as an instance of XxxBuffer that wraps the > field's raw bytes in the BinaryObject's backing array.) > > Actually I personally like this alternative better -- it's more consistent > and symmetric, and opens up opportunities for further optimization. For > example, in some cases (such as equality checks) I don't need an > actual instance of a String -- its character (or even byte) arrays are > sufficient. > > The good thing is that neither approach requires the field to have > been originally written (serialized) as XxxBuffer. It could've been just a > regular array. The actual type (e.g. byte[] or ByteBuffer) only matters > during deserialization. > > Regards > Andrey > > ------------------------------ > *From:* Vladimir Ozerov <[email protected]> > *Sent:* Tuesday, November 7, 2017 12:25 AM > *To:* [email protected] > > *Subject:* Re: Deserialization of BinaryObject's byte arrays > Regarding "sefl-imposed rules": > 1) Binary protocol is essentially ignite's type system. It is used in many > places: Java, .NET, CPP, SQL (native, JDBC, ODBC), thin client. Of those 5 > parties only Java has "ByteBuffer" concept. How other platforms should work > with this type? > 2) We compare objects using binary representation > (BinaryArrayIdentityResolver#equals0). So if one user write data as byte > array, and another write the same data as ByteBuffer, resulting objects > will not be equal > These are clear examples on why type system should be as small as possible > - we simply cannot afford extending type system for every minor use case. > Binary protocol is internal thing and has nothing to do with user > experience. > > What is worse, ByteBuffer is terrible abstraction as recognized by many > millions of Java users - it is stateful. One cannot read data from it > without changing it's state. And Ignite gives no guarantees that particular > object will be serialized only once per certain operation. One example is > BinaryMetadataCollector - Ignite could go through object twice per > serialization cycle to collect metadata. How are we going to maintain > idempotence in general case? > > What could improve UX is public interfaces. Raw example of what we can do: > 1) Define generic wrappers over our streams that will give users ability to > read/write byte arrays as they want. E.g.: > interface BinaryArrayWriter { > write(byte[] src, int offset, int len); > write(ByteBuffer src, ...); > } > > interface BinaryArrayReader { > int readLength(); > byte[] read(); > void read(byte[] dest, int off, int len); > void read(ByteBuffer dest); > } > > 2) Then add new methods to reader/writer/builder. E.g.: > BinaryWriter.writeByteArray(String name, BiConsumer<Object, > BinaryArrayWriter> consumer); // Consumes original field and array writer; > BinaryReader.readByteArray(String name, Function<BinaryArrayReader> > consumer); // Takes array reader, produces original field > > 3) Last, introduce field annotation for user convenience: > class MyClass { > @BinaryArray(writerClass = MyWriterBiConsumer.class, readerClass = > MyReaderFunction.class) > private ByteBuffer buf; > } > > Now all user need to do is to implement write consumer and read function. > > Vladimir. > > > On Tue, Nov 7, 2017 at 10:16 AM, Vladimir Ozerov <[email protected]> > wrote: > > > ByteBuffer is not fundamental data type. It is a thin wrapper over array. > > Protocol should contain only fundamental types. > > > > вт, 7 нояб. 2017 г. в 10:05, Andrey Kornev <[email protected]>: > > > >> Vladimir, > >> > >> Could you please elaborate as to why adding ByteBuffer to the Binary > >> protocol is “not a good idea”? What is inherently wrong about it? > >> > >> As for your suggestion, while I certainly see your point, it comes with > >> the inconvenience of implementing Binarylizable for every entity type > in an > >> application, which makes me wonder if the self imposed rules such as > “type > >> system should be kept as small as possible” could be relaxed in the > name of > >> greater good and user friendliness. Ignite API is not easy to grok as > it is > >> already. > >> > >> Thanks > >> Andrey > >> _____________________________ > >> From: Vladimir Ozerov <[email protected]<mailto:[email protected] > >> > >> Sent: Monday, November 6, 2017 10:22 PM > >> Subject: Re: Deserialization of BinaryObject's byte arrays > >> To: <[email protected]<mailto:[email protected]>> > >> Cc: Valentin Kulichenko <[email protected]<mailto: > >> [email protected]>> > >> > >> > >> Hi Andrey, > >> > >> While we deifnitely need to support byte buffers, adding new type to > >> protocol is not very good idea. Binary protocol is the heart of our > >> ssytem, > >> which is used not oly for plain Java, but for other platforms and SQL. > And > >> ByteBuffer is essentially the same byte array, but with different API. > >> What > >> you describe is a special case of byte array read/write. For this reason > >> our type system should be kept as small as possible and it doesn't > require > >> new type to be introduced. Instead, we'd better to handle everything on > >> API > >> level only. For example, we can add new methods to > reader/writer/builder, > >> which will handle this. E.g.: > >> > >> BinaryWriter.writeByteArray(<some producer here>); > >> BinaryReader.readByteArray(<some consumer here>); > >> > >> Vladimir. > >> > >> On Mon, Nov 6, 2017 at 9:25 PM, Andrey Kornev <[email protected] > < > >> mailto:[email protected] <[email protected]>>> > >> wrote: > >> > >> > Will do! Thanks! > >> > ________________________________ > >> > From: Valentin Kulichenko <[email protected]<mailto: > >> [email protected]>> > >> > Sent: Monday, November 6, 2017 9:17 AM > >> > To: [email protected]<mailto:[email protected]> > >> > Subject: Re: Deserialization of BinaryObject's byte arrays > >> > > >> > This use case was discussed couple of times already, so it seems to be > >> > pretty important for users. And I like the approach suggested by > Andrey. > >> > > >> > Andrey, can you create a Jira ticket for this change? Would you like > to > >> > contribute it? > >> > > >> > -Val > >> > > >> > On Fri, Nov 3, 2017 at 4:18 PM, Dmitriy Setrakyan < > >> [email protected]<mailto:[email protected]>> > >> > wrote: > >> > > >> > > This makes sense to me. Sounds like a useful feature. > >> > > > >> > > Would be nice to hear what others in the community think? > >> > > > >> > > D. > >> > > > >> > > On Fri, Nov 3, 2017 at 1:07 PM, Andrey Kornev < > >> [email protected]<mailto:[email protected]>> > >> > > wrote: > >> > > > >> > > > Hello, > >> > > > > >> > > > We store lots of byte arrays (serialized graph-like data > >> structures) as > >> > > > fields of BinaryObject. Later on, while reading such field, > >> > > > BinaryInputStream implementation creates an on-heap array and > copies > >> > the > >> > > > bytes from the BinaryObject's internal backing array to the new > >> array. > >> > > > > >> > > > While in general case it works just fine, in our case, a lot of > CPU > >> is > >> > > > spent on copying of the bytes and significant amount garbage is > >> > generated > >> > > > as well. > >> > > > > >> > > > I'd like Ignite Community to consider the following proposal: > >> introduce > >> > > > support for serialization/deserialization of the ByteBuffer > class. A > >> > > > BinaryInputStream implementation would special case ByteBuffer > >> fields > >> > > same > >> > > > way it currently special cases various Java types (collections, > >> dates, > >> > > > timestamps, enums, etc.) > >> > > > > >> > > > The application developer would simply call > >> > > BinaryObjectBuilder.setField(...) > >> > > > passing in an instance of ByteBuffer. During serialization, the > >> > > > ByteBuffer's bytes would simply be copied to the backing array > >> (similar > >> > > to > >> > > > how regular byte arrays are serialized today) and in the binary > >> > object's > >> > > > metadata the field marked as, say, GridBinaryMarshaller.BYTE_ > >> BUFFER. > >> > > > > >> > > > During deserialization of the field, instead of allocating a new > >> byte > >> > > > array and transferring the bytes from the BinaryObject's backing > >> array, > >> > > > BinaryInputStream would simply wrap the required sub-array into a > >> > > read-only > >> > > > instance of ByteBuffer using "public static ByteBuffer wrap(byte[] > >> > array, > >> > > > int offset, int length)" and return the instance to the > application. > >> > > > > >> > > > This approach doesn't require any array allocation/data copying > and > >> is > >> > > > faster and significantly reduces pressure on the garbage > collector. > >> The > >> > > > benefits are especially great when application stores significant > >> > amount > >> > > of > >> > > > its data as byte arrays. > >> > > > > >> > > > Clearly, this approach equally applies to arrays of other > primitive > >> > > types. > >> > > > > >> > > > A minor complication does arise when dealing with off-heap > >> > BinaryObjects, > >> > > > but nothing that can't solved with a little bit of Java reflection > >> > > wizardry > >> > > > (or by other means). > >> > > > > >> > > > Thanks, > >> > > > Andrey > >> > > > > >> > > > >> > > >> > >> > >> >
