Vladimir,

> When we finish varlen optimization for string lengths, I am afraid we could
> end up with very messy protocol, should we mix encoded length and encoding.

I agree, we shouldn't mix them.

> I deemed it's unusual to make two different type markers (flags) for
> single datatype.

I can't see the source right now. Theoretically, you can combine
GridBinaryMarshaller.STRING with BinaryWriteMode.

I agree with Vladimir: adding a new type is the clearest approach to me.

> Encoding must be set on per field basis. This will give us as most flexible
> solution at the cost of 1-byte overhead.

> Vova, I agree that the encoding should be set on per-field basis, but at
> the table level, not at a cell level.

Dmitriy, Vladimir,

Let's use both approaches :-)

We can add a parameter to CacheConfiguration. If the parameter specifies
cache-level encoding, the marshaller will use the encoding configured for
the cache; otherwise the marshaller will use per-field encoding.

Of course, only if it doesn't complicate the solution.

2017-07-25 20:44 GMT+03:00 Dmitriy Setrakyan <dsetrak...@apache.org>:

> On Tue, Jul 25, 2017 at 12:36 PM, Vladimir Ozerov <voze...@gridgain.com>
> wrote:
>
> > Vyacheslav,
> >
> > When we finish varlen optimization for string lengths, I am afraid we could
> > end up with very messy protocol, should we mix encoded length and encoding.
> >
> > Dima,
> >
> > Encoding must be set on per field basis. This will give us as most flexible
> > solution at the cost of 1-byte overhead.
>
> Vova, I agree that the encoding should be set on per-field basis, but at
> the table level, not at a cell level. I cannot foresee a situation where we
> would have different encodings in the same column. If that ever happens,
> then user can provide already encoded values.
>
> > Tue, 25 Jul 2017 at 20:23, Dmitriy Setrakyan <dsetrak...@apache.org>:
> >
> > > I don't understand why this encoding is done on per-object and not on
> > > per-cache level. Shouldn't the column-to-encoding mapping be defined at
> > > cache level configuration?
> > >
> > > On Tue, Jul 25, 2017 at 12:13 PM, Vladimir Ozerov <voze...@gridgain.com>
> > > wrote:
> > >
> > > > Andrey,
> > > >
> > > > You cannot have optional part in the middle as it will break compatibility
> > > > in dangerous way, probably leading to node crash. Also having INT (4 bytes)
> > > > looks too much for me.
> > > >
> > > > Instead, I would add new type "encoded string":
> > > > 1 byte - type
> > > > 1 byte - encoding code, map frequently used encodings to some byte value;
> > > > also have a special value, meaning that encoding will be written as string
> > > > afterwards, this way we will support any encoding out of the box
> > > > [optional] encoding name
> > > > 4 bytes - string length
> > > > Finally - string bytes
> > > >
> > > > Vladimir.
> > > >
> > > > Tue, 25 Jul 2017 at 18:24, Andrey Kuznetsov <stku...@gmail.com>:
> > > >
> > > > > I apologize for damaged formatting. Below is my message as it should be.
> > > > >
> > > > > Hi Igniters,
> > > > >
> > > > > I'd like to discuss future changes related to
> > > > > https://issues.apache.org/jira/browse/IGNITE-5655
> > > > >
> > > > > Is it really good idea to introduce new flag (ENCODED_STRING) for existing
> > > > > String datatype? It's possible to use existing STRING flag at negligible
> > > > > performance cost.
> > > > >
> > > > > Currently, utf-8-encoded string looks like
> > > > >
> > > > > byteFlag nonNegativeIntStrLen bytes
> > > > >
> > > > > This format can be backward compatibly extended to
> > > > >
> > > > > byteFlag [negativeIntCharsetCode] nonNegativeIntStrLen bytes
> > > > >
> > > > > Next, I suggest to add new BinaryConfiguration property for encoding to use
> > > > > instead of using global property. It seems to be more convenient for user.
> > > > >
> > > > > I'll appreciate your feedback.
> > > > >
> > > > > --
> > > > > Best regards,
> > > > > Andrey Kuznetsov.

--
Best Regards,
Vyacheslav D.
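
P.S. To make the "encoded string" layout from Vladimir's quoted message easier to discuss, here is a minimal sketch of how it might be written and read. It is only an illustration, not the Ignite implementation: the class name, the type byte value, the encoding codes, the little-endian byte order, and the way the optional encoding name is stored (4-byte length followed by ASCII bytes) are all my assumptions.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Illustrative sketch of the proposed "encoded string" layout; all constants are hypothetical. */
class EncodedStringSketch {
    // Hypothetical type marker; not a real GridBinaryMarshaller constant.
    static final byte TYPE_ENCODED_STRING = 0x7E;

    // Hypothetical one-byte codes for frequently used encodings.
    static final byte ENC_UTF_8 = 0;
    static final byte ENC_ISO_8859_1 = 1;

    // Special value: the encoding name follows as a string.
    static final byte ENC_BY_NAME = (byte) 0xFF;

    static byte codeFor(Charset cs) {
        if (cs.equals(StandardCharsets.UTF_8)) return ENC_UTF_8;
        if (cs.equals(StandardCharsets.ISO_8859_1)) return ENC_ISO_8859_1;
        return ENC_BY_NAME;
    }

    /** Layout: type (1) | encoding code (1) | [name length (4) + name] | string length (4) | string bytes. */
    static byte[] write(String value, Charset cs) {
        byte code = codeFor(cs);
        boolean byName = code == ENC_BY_NAME;
        byte[] payload = value.getBytes(cs);
        byte[] name = byName ? cs.name().getBytes(StandardCharsets.US_ASCII) : new byte[0];

        int size = 2 + (byName ? 4 + name.length : 0) + 4 + payload.length;
        // Byte order is an assumption made for this sketch.
        ByteBuffer buf = ByteBuffer.allocate(size).order(ByteOrder.LITTLE_ENDIAN);

        buf.put(TYPE_ENCODED_STRING).put(code);
        if (byName) buf.putInt(name.length).put(name);
        buf.putInt(payload.length).put(payload);
        return buf.array();
    }

    static String read(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        if (buf.get() != TYPE_ENCODED_STRING)
            throw new IllegalArgumentException("Not an encoded string");

        byte code = buf.get();
        Charset cs;
        if (code == ENC_UTF_8) {
            cs = StandardCharsets.UTF_8;
        } else if (code == ENC_ISO_8859_1) {
            cs = StandardCharsets.ISO_8859_1;
        } else if (code == ENC_BY_NAME) {
            // The optional encoding name is present only for the special code.
            byte[] name = new byte[buf.getInt()];
            buf.get(name);
            cs = Charset.forName(new String(name, StandardCharsets.US_ASCII));
        } else {
            throw new IllegalArgumentException("Unknown encoding code: " + code);
        }

        byte[] payload = new byte[buf.getInt()];
        buf.get(payload);
        return new String(payload, cs);
    }

    public static void main(String[] args) {
        // A mapped encoding costs one extra byte; an unmapped one also carries its name.
        byte[] mapped = write("caf\u00e9", StandardCharsets.ISO_8859_1);
        byte[] named = write("caf\u00e9", StandardCharsets.UTF_16BE);
        System.out.println(mapped.length + " bytes, round trip ok: " + read(mapped).equals("caf\u00e9"));
        System.out.println(named.length + " bytes, round trip ok: " + read(named).equals("caf\u00e9"));
    }
}
```

Because the encoding code sits in a fixed position right after the type byte, a reader never has to guess whether an optional part is present before the length, which is the compatibility concern raised about the negative-int variant.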