Thrift strings are UTF-8 everywhere. The string type is documented here: https://thrift.apache.org/docs/types
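As a rough illustration of what "UTF-8 everywhere" means on the wire, here is a minimal sketch that assumes the binary protocol's layout for a string value (a 4-byte big-endian length followed by the UTF-8 bytes). The helper names are hypothetical and this does not use the Thrift library itself. The type ID constants are copied from the values listed in lib/py/src/Thrift.py. As far as I know only STRING is ever emitted for IDL `string` fields, so treat that as an assumption.

```python
import struct

# Type IDs as listed in Thrift's Python library (lib/py/src/Thrift.py).
# Assumption: generated code only ever uses T_STRING for `string` fields;
# T_UTF8 and T_UTF16 are defined there but have no IDL keyword.
T_STRING = 11
T_UTF8 = 16
T_UTF16 = 17


def encode_binary_protocol_string(text: str) -> bytes:
    """Hypothetical helper: length-prefixed UTF-8, as in the binary protocol."""
    payload = text.encode("utf-8")
    # "!i" packs a signed 32-bit integer in network (big-endian) byte order.
    return struct.pack("!i", len(payload)) + payload


def decode_binary_protocol_string(data: bytes) -> str:
    """Hypothetical helper: inverse of encode_binary_protocol_string."""
    (length,) = struct.unpack("!i", data[:4])
    return data[4:4 + length].decode("utf-8")


# The length prefix counts bytes, not characters: the "ç" takes two bytes.
wire = encode_binary_protocol_string("façade.jpg")
roundtrip = decode_binary_protocol_string(wire)
```

The point of the sketch is that the encoding is fixed by convention rather than selected per field, which is why the IDL has no separate utf8 type name.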
On Wed, Oct 15, 2014 at 6:53 AM, Peter Neumark <[email protected]> wrote:
> Does thrift officially say anything about the character encoding of string
> fields?
>
> On Tue, Oct 14, 2014 at 9:48 PM, Jens Geyer <[email protected]> wrote:
>
>> Hi Peter,
>>
>>> The thrift wire format has support for unicode fields:
>>
>> I just scanned the code base.
>> - In some cases the TType numbers go up to 17, including UTF8 and UTF16.
>> - Other languages define only what is actually used, up to 15.
>>
>> I don't know what the intention is/was behind these two additional values.
>> Maybe someone else can chime in here.
>>
>> Have fun,
>> JensG
>>
>> -----Original Message-----
>> From: Peter Neumark
>> Sent: Tuesday, October 14, 2014 9:36 AM
>> To: [email protected]
>> Subject: Re: unicode types
>>
>> I'd prefer not to use "string" types in my thrift files, since that doesn't
>> say anything about the encoding.
>> Instead, I'd like the following:
>>
>> struct JpegData {
>>   1: optional binary exif,
>>   2: *utf8* filename,
>>   3: i32 bytes
>> }
>>
>> The thrift wire format has support for unicode fields:
>> https://github.com/apache/thrift/blob/master/lib/py/src/Thrift.py#L39
>>
>> But the IDL doesn't let me use them directly for some reason.
>> My question is, are there plans to support the example thrift struct
>> definition above?
>>
>> Thanks,
>> Peter
>>
>> On Tue, Oct 14, 2014 at 9:13 AM, Jens Geyer <[email protected]> wrote:
>>
>>> Hi Peter,
>>>
>>> They need to be interoperable between all platforms and languages. Somewhere in
>>> the docs UTF8 is mentioned, IIRC. Is that what you are asking for?
>>>
>>> ________________________________
>>> From: Peter Neumark
>>> Sent: 13.10.2014 22:59
>>> To: [email protected]
>>> Subject: unicode types
>>>
>>> Hi all,
>>>
>>> Looking at the wire format's type IDs, it's clear that thrift supports
>>> several string encodings in its wire format, yet the IDL does not allow
>>> one to speak of string encoding (string/binary are the only type names in
>>> the IDL).
>>>
>>> Is this a design decision (where each language implementation can choose
>>> the appropriate Unicode type id for encoding strings), or is there some
>>> historical reason for not exposing string encoding options in the thrift
>>> IDL?
>>>
>>> Thanks,
>>> Peter
>>>
>>> --
>>> *Peter Neumark*
>>> DevOps guy @Prezi <http://prezi.com>
