3. Encoding

   JSON text SHALL be encoded in Unicode. The default encoding is
   UTF-8.

   Since the first two characters of a JSON text will always be ASCII
   characters [RFC0020], it is possible to determine whether an octet
   stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking
   at the pattern of nulls in the first four octets.

           00 00 00 xx  UTF-32BE
           00 xx 00 xx  UTF-16BE
           xx 00 00 00  UTF-32LE
           xx 00 xx 00  UTF-16LE
           xx xx xx xx  UTF-8
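
For reference, the null-pattern table above could be sketched in Java roughly as follows. This is only an illustration under stated assumptions: the class and method names (JsonEncodingSniffer, detect) are made up, the input is assumed to carry no byte order mark, and the UTF-32 charsets are assumed to be available in the running JDK (they are not among the charsets every Java platform is required to support).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: guesses a JSON stream's encoding from its first four octets. */
public final class JsonEncodingSniffer {
    private JsonEncodingSniffer() {}

    /**
     * Applies the null-pattern table to the leading octets.
     * Assumes no byte order mark; input shorter than four octets
     * falls back to the default encoding, UTF-8.
     */
    public static Charset detect(byte[] head) {
        if (head.length < 4) {
            return StandardCharsets.UTF_8;
        }
        boolean z0 = head[0] == 0;
        boolean z1 = head[1] == 0;
        boolean z2 = head[2] == 0;
        boolean z3 = head[3] == 0;

        if (z0 && z1 && z2 && !z3) return Charset.forName("UTF-32BE");  // 00 00 00 xx
        if (z0 && !z1 && z2 && !z3) return StandardCharsets.UTF_16BE;   // 00 xx 00 xx
        if (!z0 && z1 && z2 && z3) return Charset.forName("UTF-32LE");  // xx 00 00 00
        if (!z0 && z1 && !z2 && z3) return StandardCharsets.UTF_16LE;   // xx 00 xx 00
        return StandardCharsets.UTF_8;                                  // xx xx xx xx
    }
}
```

Anything that does not match one of the four null patterns is treated as UTF-8, which matches the last row of the table.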
:-)
I think we can safely assume it is UTF-8; otherwise we would have to do the
same shit XML parsers do, with mark() on a BufferedInputStream... Most
libraries out there can only read UTF-8, and Solr itself produces only UTF-8
JSON, right? Those tests only check responses from Solr.
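
Just to illustrate what the mark() approach would involve if we ever needed it, here is a rough sketch. The names (StreamPeek, peek) are hypothetical and nothing like this exists in the test code; it only shows the peek-then-rewind step, not the encoding decision itself.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.util.Arrays;

/** Hypothetical helper: looks at the start of a stream without consuming it. */
public final class StreamPeek {
    private StreamPeek() {}

    /** Returns up to n leading bytes, then rewinds the stream to where it was. */
    public static byte[] peek(BufferedInputStream in, int n) throws IOException {
        in.mark(n); // remember the current position; valid for the next n bytes
        try {
            byte[] buf = new byte[n];
            int total = 0;
            while (total < n) {
                int r = in.read(buf, total, n - total);
                if (r < 0) {
                    break; // stream ended before n bytes were available
                }
                total += r;
            }
            return Arrays.copyOf(buf, total);
        } finally {
            in.reset(); // rewind so the real parser still sees the first bytes
        }
    }
}
```

The caller would peek at four bytes, pick a charset, and then wrap the same stream in an InputStreamReader. Assuming UTF-8 avoids all of this.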
Uwe
-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: [email protected]
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of
> Dawid Weiss
> Sent: Thursday, July 05, 2012 5:35 PM
> To: [email protected]
> Subject: Re: Question about solr config files encoding.
>
> > But JSON is defined to be UTF-8, so we must supply the encoding
> > (IOUtils.UTF8_CHARSET).
>
> That RFC says it can be any Unicode encoding... that said, I agree with you
> that we can probably assume it's UTF-8 and not worry about anything else.
>
> Dawid
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]