I think there is value in having a single string encoding.

Sarge

> On 1 Dec, 2017, at 17:35, Jacob Barrett <jbarr...@pivotal.io> wrote:
>
> On Fri, Dec 1, 2017 at 4:59 PM Dan Smith <dsm...@pivotal.io> wrote:
>
>> I think I'm kinda with Mike on this one. The existing string format does
>> seem pretty gnarly. But the complexity of implementing and testing all of
>> the backwards compatibility transcoding that would be required in order to
>> move to the new proposed format seems to be way more work with much more
>> possibility for errors. Do we really expect people to be writing new
>> clients that use DataSerializable? It hasn't happened yet, and we're
>> working on a new protocol that uses protobuf right now.
>>
>
> Consider that any new clients written would have to implement all these
> encodings. This is going to make writing new clients using the upcoming new
> protocol laborious. The new protocol does not define object encoding; it
> strictly defines message encoding. Objects sent over the protocol will have
> to be serialized in some format, like PDX or data serializer. We could
> always develop a better serialization format than what we have now. If we
> don't develop something new then we have to use the old. Wouldn't it be
> nice if the new clients didn't have to deal with legacy encodings?
>
>> If the issue is really the complexity of serialization from the C++ client,
>> maybe the C++ client could always write UTF-16 strings?
>>
>
> You can't assume that a client in one language will only be serializing
> strings for its own consumption. We have many people using strings in PDX
> to transform between C++, .NET and Java.
>
> The risk of not removing this debt is high. If I am developing a new Ruby
> client I am forced to deal with all 4 of these encodings. Am I really going
> to want to build a Ruby client for Geode? Am I going to get these encodings
> correct? I can tell you that getting them correct may be a challenge if the
> current C++ client is any indication: it has a few incorrect assumptions in
> its encoding of ASCII and modified UTF-8.
>
> I am fine with a compromise that deprecates but doesn't remove the old
> encodings for a few releases. This would give time for users to update. New
> clients written would not be able to read this old data but could read
> and write new data.
>
> -Jake
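To make the modified UTF-8 pitfall concrete, here is a minimal Python sketch of the format defined by Java's `DataOutput.writeUTF`, assuming that is the "modified UTF-8" encoding meant above. The function name `java_modified_utf8` is hypothetical, and this is not Geode's or the C++ client's code. Two details differ from standard UTF-8 and are easy for a new client to miss: U+0000 is written as the two bytes 0xC0 0x80, and characters outside the Basic Multilingual Plane are written as two 3-byte surrogate halves rather than one 4-byte sequence. The 2-byte length prefix also caps the encoded form at 65,535 bytes.

```python
import struct


def java_modified_utf8(s: str) -> bytes:
    """Hypothetical helper: encode `s` like java.io.DataOutput.writeUTF,
    i.e. a 2-byte big-endian length followed by modified UTF-8 bytes."""
    # Java encodes UTF-16 code units, so split the string into them first;
    # a supplementary character becomes a surrogate pair of two units.
    utf16 = s.encode("utf-16-be", "surrogatepass")
    units = struct.unpack(f">{len(utf16) // 2}H", utf16)

    out = bytearray()
    for u in units:
        if 0x0001 <= u <= 0x007F:
            # Plain ASCII (except NUL): one byte, same as standard UTF-8.
            out.append(u)
        elif u <= 0x07FF:
            # Two bytes. U+0000 also lands here, so NUL becomes 0xC0 0x80.
            out += bytes([0xC0 | (u >> 6), 0x80 | (u & 0x3F)])
        else:
            # Three bytes, including each half of a surrogate pair, where
            # standard UTF-8 would emit one 4-byte sequence per character.
            out += bytes([
                0xE0 | (u >> 12),
                0x80 | ((u >> 6) & 0x3F),
                0x80 | (u & 0x3F),
            ])

    if len(out) > 0xFFFF:
        # writeUTF throws UTFDataFormatException in this case.
        raise ValueError("encoded string too long for a 2-byte length prefix")
    return struct.pack(">H", len(out)) + bytes(out)
```

For example, `java_modified_utf8("\U0001F600")` produces six payload bytes where standard UTF-8 uses four. A client that simply calls its language's standard UTF-8 encoder would therefore write 4-byte sequences, which Java's `DataInput.readUTF` rejects as malformed.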