Hey David,

Apache Thrift has a "string" type in its IDL. In the generated code that type is a language-native string, but by default it is UTF-8 on the wire when using the binary, compact or JSON protocols.
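To make the "native string in code, UTF-8 on the wire" point concrete, here is a minimal sketch of how the binary protocol frames a string: a 4-byte big-endian length followed by the UTF-8 bytes. The helper names (encode_binary_string, decode_binary_string) are made up for illustration and are not part of any Thrift library; the UTF-16 size is computed only for comparison.

```python
import struct


def encode_binary_string(text: str) -> bytes:
    """Frame a native string as a 4-byte big-endian length plus UTF-8 bytes.

    Hypothetical helper for illustration; not the Thrift library API.
    """
    payload = text.encode("utf-8")
    return struct.pack("!i", len(payload)) + payload


def decode_binary_string(data: bytes) -> str:
    """Reverse of encode_binary_string: read the length, then decode UTF-8."""
    (length,) = struct.unpack("!i", data[:4])
    return data[4:4 + length].decode("utf-8")


# One accented letter (2 bytes in UTF-8) and one supplementary-plane
# character (4 bytes in UTF-8, a surrogate pair in UTF-16).
sample = "h\u00e9llo \U0001F600"
wire = encode_binary_string(sample)

# Size of the same text if it were sent as UTF-16 instead (no length prefix).
utf16_size = len(sample.encode("utf-16-le"))

print(len(wire) - 4, "UTF-8 payload bytes vs", utf16_size, "UTF-16 bytes")
```

The application code only ever sees the native string; the encoding is decided at the protocol layer, which is where a UTF-16 option would have to live.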
I think Jens is posing the question (correct me if I'm wrong, Jens): should we also support UTF-16 string encoding on the wire with the binary, compact and JSON protocols?

-Randy

On Thu, Dec 31, 2015 at 5:09 PM, David Bennett <[email protected]> wrote:

> >>>while UTF-8 is great, especially on Windows platforms UTF-16 is more
> common, because the OS uses it heavily internally. Since Win2k it also
> supports surrogates and supplementary characters. So there’s OS support for
> it. What I don’t know is, how universally is UTF-16 (or a subset of it)
> supported across other platforms? Can we assume a certain degree of support
> on all the various platforms that Thrift can run on?
>
> >>>TL;DR: Would it make sense to add UTF-16 as another string format type?
>
> In my opinion, no. This is based on a mistaken understanding or
> expectation.
>
> Thrift currently supports a string of bytes as a type, and users who wish
> to exchange character string data are expected to impose some kind of
> meaning on top of that.
>
> What Thrift needs is a genuine string data type, independent of any
> particular transport format, and which fully supports Unicode code points.
> The transport mechanism could be UTF-8, UTF-16, UTF-32 or variable length
> (zigzag) integers (currently Unicode requires about 21 bits).
>
> User libraries would of course be free to reformat those Unicode strings
> into any format comfortably supported by the platform. On Windows UTF-16 is
> preferred, but should never be viewed as something different from the
> underlying Unicode string.
>
> Regards
> David M Bennett FACS
>
> Andl - A New Database Language - andl.org
