On 13 Apr 2009, at 23:39, Chris Anderson wrote:

On Mon, Apr 13, 2009 at 3:34 PM, dmi <losth...@yandex.ru> wrote:
Hello All!

CouchDB is now using a modified version of mochijson2 for JSON output.
The standard behavior of this library is to accept Unicode in all forms (Unicode code points, UTF-8 bytes, \uXXXX escapes) via decode/1, but when Unicode is emitted via encode/1 to the client app, all non-ASCII characters are converted to the \uXXXX form.

This is done for maximum compatibility, but I suspect that modern software that wants to interact with CouchDB will have no problems with raw UTF-8.
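
As a quick illustration of the current round trip (a sketch only, using the stock decode/1 and encode/1; the sample value is arbitrary):

%% decode/1 happily accepts the escaped form (and raw UTF-8 input decodes
%% to the same term)...
Term = mochijson2:decode(<<"{\"name\":\"\\u0414\"}">>),
%% ...but encode/1 turns the non-ASCII character back into an escape,
%% producing iodata for {"name":"\u0414"} rather than the raw UTF-8 bytes.
mochijson2:encode(Term)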

A recent version of mochiweb (r99) introduces an optional capability for mochijson2 to emit raw UTF-8.
The proposed usage is:

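%% encoder/1 returns a one-argument fun that serializes a term to iodata;
%% with {utf8, true} non-ASCII characters are emitted as raw UTF-8 bytes
%% instead of \uXXXX escapes (json() below stands for the term to encode):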
Encoder = mochijson2:encoder([{utf8, true}]),
JSON = Encoder(json())

I have tested this patch (in reduced form) against CouchDB, and it seems to work.

I think that bringing this option to CouchDB would be a good improvement for developers of international software.


Thanks for digging in here.

To avoid incompatibility with old software, we may want to either:

- make this a request time option
- switch intelligently on some HTTP request header

Any thoughts on how best to do this? Should utf8 be the default, or \uXXXX?
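
Whichever default we pick, the switch itself would be small; a rough sketch (not existing CouchDB code; the function name and flag are just placeholders for wherever the query-parameter or header check ends up):

encode_response(EJson, RawUtf8) ->
    %% Pick the escaping behavior per request; both branches yield a
    %% one-argument fun that returns iodata.
    Encoder = case RawUtf8 of
        true  -> mochijson2:encoder([{utf8, true}]);
        false -> fun mochijson2:encode/1
    end,
    Encoder(EJson).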

Once we have these questions answered, if you put a patch in JIRA[1]
it's likely to be accepted.

[1] http://issues.apache.org/jira/browse/COUCHDB


In my experience, raw Unicode is better than \uXXXX escapes (and shorter!). The JSON spec (http://json.org) specifically says that you *must* accept any Unicode character other than the ones that must be \-escaped, and I was very surprised to find that a lot of JSON tools produce the \uXXXX versions by default.

Because it is part of the spec, I don't see any problem in just changing it.
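
To put a rough number on "shorter": a three-character Cyrillic string is 6 bytes as raw UTF-8 versus 18 bytes as \uXXXX escapes (any non-ASCII string shows a similar saving):

%% U+0414, U+043C, U+0438
byte_size(unicode:characters_to_binary([16#414, 16#43C, 16#438])).  %% 6
length("\\u0414\\u043c\\u0438").                                    %% 18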


--
Chris Anderson
http://jchrisa.net
http://couch.io
