Hey David,

Apache Thrift has a "string" type in its IDL. In the generated code that
type is a language-native string, but by default it is encoded as UTF-8 on
the wire when using the binary, compact, or JSON protocols.
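
To make that concrete, here is a rough sketch in plain Python (not the
Thrift library itself) of how I understand the binary protocol to frame a
string: a 4-byte big-endian length followed by the UTF-8 bytes. The helper
names write_binary_string and read_binary_string are made up for
illustration.

```python
import struct


def write_binary_string(text: str) -> bytes:
    """Frame a string roughly the way the binary protocol does (assumed layout)."""
    payload = text.encode("utf-8")  # native string -> UTF-8 bytes
    # Assumption: signed 32-bit big-endian byte count, then the raw bytes.
    return struct.pack("!i", len(payload)) + payload


def read_binary_string(data: bytes) -> str:
    """Reverse of write_binary_string: read the length, then decode UTF-8."""
    (length,) = struct.unpack("!i", data[:4])
    return data[4:4 + length].decode("utf-8")
```

Note that the length prefix counts bytes, not characters, so "héllo" goes
out as 6 payload bytes.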

I think Jens is posing the question (correct me if I'm wrong, Jens): should
we also support UTF-16 string encoding on the wire with the binary, compact,
and JSON protocols?
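
For a sense of what that choice would change on the wire, here is a small
Python comparison of encoded sizes. It uses only the standard codecs, and
the helper name encoded_sizes is mine, not anything in Thrift.

```python
def encoded_sizes(text: str) -> dict:
    """Return the byte length of text in UTF-8 and in UTF-16 (no BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        # "utf-16-le" avoids the 2-byte byte-order mark that "utf-16" adds.
        "utf-16": len(text.encode("utf-16-le")),
    }


for sample in ("hello", "日本語", "\U0001F600"):
    print(repr(sample), encoded_sizes(sample))
```

ASCII text doubles in size under UTF-16, CJK text shrinks, and a
supplementary character such as U+1F600 takes 4 bytes either way (a
surrogate pair in UTF-16).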

-Randy

On Thu, Dec 31, 2015 at 5:09 PM, David Bennett <[email protected]> wrote:

> >>>while UTF-8 is great, especially on Windows platforms UTF-16 is more
> common, because the OS uses it heavily internally. Since Win2k it also
> supports surrogates and supplementary characters. So there’s OS support for
> it. What I don’t know is, how universally is UTF-16 (or a subset of it)
> supported across other platforms? Can we assume a certain degree of support
> on all the various platforms that Thrift can run on?
>
> >>>TL;DR: Would it make sense to add UTF-16 as another string format type?
>
> In my opinion, no. This is based on a mistaken understanding or
> expectation.
>
> Thrift currently supports a string of bytes as a type, and users who wish
> to exchange character string data are expected to impose some kind of
> meaning on top of that.
>
> What Thrift needs is a genuine string data type, independent of any
> particular transport format, and which fully supports Unicode code points.
> The transport mechanism could be UTF-8, UTF-16, UTF-32 or variable length
> (zigzag) integers (currently Unicode requires about 21 bits).
>
> User libraries would of course be free to reformat those Unicode strings
> into any format comfortably supported by the platform. On Windows UTF-16 is
> preferred, but should never be viewed as something different from the
> underlying Unicode string.
>
> Regards
> David M Bennett FACS
>
> Andl - A New Database Language - andl.org
