I agree with Andrey; it does look a bit over-architected to me. Why would anyone try to move data from one encoding to another? Is it a real use case that needs to be handled automatically?
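To show what moving a value from one encoding to another involves, here is a minimal sketch in plain Java. It uses only `java.nio.charset` and no Ignite APIs. The class and method names (`ReencodeSketch`, `isLossless`, `reencode`) are made up for illustration and are not part of any proposal in this thread.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper, for illustration only; not an Ignite class. */
final class ReencodeSketch {
    /** Returns true if every character of s is representable in the target charset. */
    static boolean isLossless(String s, Charset target) {
        // A fresh encoder is created per call because encoders are not thread-safe.
        return target.newEncoder().canEncode(s);
    }

    /**
     * Decodes src using 'from' and encodes the result using 'to'.
     * Throws instead of silently substituting unmappable characters.
     * Note: malformed input bytes are still replaced by the String constructor.
     */
    static byte[] reencode(byte[] src, Charset from, Charset to) {
        String s = new String(src, from);
        if (!isLossless(s, to))
            throw new IllegalArgumentException("Value is not representable in " + to);
        return s.getBytes(to);
    }

    public static void main(String[] args) {
        Charset cp1251 = Charset.forName("windows-1251");
        String cyrillic = "\u041f\u0440\u0438\u0432\u0435\u0442"; // Cyrillic sample text
        byte[] utf8 = cyrillic.getBytes(StandardCharsets.UTF_8);
        byte[] cp = reencode(utf8, StandardCharsets.UTF_8, cp1251);

        // Same text, different binary state: 12 bytes in UTF-8, 6 in Cp1251.
        System.out.println(utf8.length + " -> " + cp.length);
        System.out.println(Arrays.equals(utf8, cp)); // false

        // CJK text cannot be stored in Cp1251 without loss.
        System.out.println(isLossless("\u65e5\u672c\u8a9e", cp1251)); // false
    }
}
```

The byte arrays differ even though the text is identical, which is why any comparison based on binary representation would see two different values.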
Here is what I think we should handle:

1. Ability to set cluster-wide encoding. This should be easy.
2. Ability to set per-column encoding. Such encoding should be set on a
   per-column level, perhaps at cache creation or table creation. For
   example, at cache creation time, we could let the user define all column
   names that will have non-default encodings.

Thoughts?

D.

On Wed, Sep 6, 2017 at 6:27 AM, Andrey Kuznetsov <[email protected]> wrote:

> As for option #1, it's not so bad. Currently we've implemented a global
> encoding switch, and this looks similar to a DBMS: if the server works
> with a certain encoding, then all clients should be configured to use the
> same encoding for correct string processing.
>
> Option #2 provokes a number of questions.
>
> What are the performance implications of such hidden binary reencoding?
>
> Who will check for possible data loss on transparent reencoding (when an
> object walks between caches/fields with distinct encodings)?
>
> How should we handle nested binary objects? On the one hand, they should
> be reencoded in the way described by Vladimir. On the other hand,
> BinaryObject is an independent entity that can be serialized/deserialized
> freely, moved between various data structures, etc. It will be frustrating
> for a user to find its binary state changed after storing it in a grid,
> with possible data corruption.
>
>
> As far as I can see, we are trying to couple orthogonal APIs:
> BinaryMarshaller, IgniteCache and SQL. BinaryMarshaller is
> Java-datatype-driven: it creates a 1-to-1 mapping between Java types and
> their binary representations, and now we are trying to map two binary
> types (STRING and ENCODED_STRING) to a single String class. IgniteCache is
> a much more flexible API than SQL, but it lacks an encoded string
> datatype, which exists in the SQL dialects of some RDBMSs:
> `varchar(n) character set some_charset`. It's not a popular idea, but many
> problems could be solved by adding such a type. Those IgniteCache API
> users who don't need it won't use it, but it could become a bridge between
> SQL and BinaryMarshaller encoded-string types.
>
> 2017-09-06 10:32 GMT+03:00 Vladimir Ozerov <[email protected]>:
>
> > What we tried to achieve is that several encodings could co-exist in a
> > single cluster or even a single cache. This would be great from a UX
> > perspective. However, from what Andrey wrote, I understand that this
> > would be pretty hard to achieve, as we rely heavily on similar binary
> > representation of objects being compared. That said, while this could
> > work for SQL with some adjustments, we will have severe problems with
> > BinaryObject.equals().
> >
> > Let's think about how we can resolve this. I see two options:
> > 1) Allow only a single encoding in the whole cluster. Easy to implement,
> > but very bad from a usability perspective. This would especially affect
> > clients: client nodes and, what is worse, drivers and thin clients! They
> > would all have to bother about which encoding to use. But maybe we can
> > share this information during the handshake (as every client has a
> > handshake).
> >
> > 2) Add a custom encoding flag/ID to the object header if a non-standard
> > encoding appears somewhere inside the object (even in nested objects).
> > This way, we will be able to re-create the object if needed when the
> > expected and actual encodings don't match. For example, consider we have
> > two caches/tables with different encodings (not implemented in the
> > current iteration, but we may decide to implement per-cache encodings in
> > the future, as any RDBMS supports this). And then I decide to move
> > object A from cache 1 with UTF-8 encoding to cache 2 with Cp1251
> > encoding. In this case I will detect the encoding mismatch through the
> > object header (or footer) and re-build it transparently for the user.
> >
> > The second option is preferable to me as a long-term solution, but would
> > require more effort.
> >
> > Thoughts?
> >
>
> --
> Best regards,
> Andrey Kuznetsov.
>
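To make the per-column idea from the top of this message more concrete, here is a rough sketch of what "define all column names that will have non-default encodings" could look like. It is plain Java and assumes nothing about the actual Ignite configuration API. `ColumnEncodings`, `withColumn`, `charsetFor` and `encode` are hypothetical names used only for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical settings holder; not an existing Ignite class. */
final class ColumnEncodings {
    /** Cluster-wide default, used for every column without an override. */
    private final Charset dflt;

    /** Overrides, keyed by column name, fixed at cache/table creation. */
    private final Map<String, Charset> perColumn = new HashMap<>();

    ColumnEncodings(Charset dflt) {
        this.dflt = dflt;
    }

    /** Registers a non-default encoding for one column. */
    ColumnEncodings withColumn(String column, String charsetName) {
        perColumn.put(column, Charset.forName(charsetName));
        return this;
    }

    /** Resolves the encoding for a column, falling back to the default. */
    Charset charsetFor(String column) {
        return perColumn.getOrDefault(column, dflt);
    }

    /** Encodes a value the way the given column would store it. */
    byte[] encode(String column, String value) {
        return value.getBytes(charsetFor(column));
    }

    public static void main(String[] args) {
        ColumnEncodings enc = new ColumnEncodings(StandardCharsets.UTF_8)
            .withColumn("name", "windows-1251");

        String v = "\u0418\u043c\u044f"; // three Cyrillic characters

        System.out.println(enc.encode("name", v).length);    // 3 (Cp1251 override)
        System.out.println(enc.encode("comment", v).length); // 6 (UTF-8 default)
    }
}
```

Because the encoding is tied to the column and fixed up front, a value is encoded once on write and nothing has to be re-encoded behind the user's back.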
