On 13 Apr 2009, at 23:39, Chris Anderson wrote:

On Mon, Apr 13, 2009 at 3:34 PM, dmi <losth...@yandex.ru> wrote:
Hello All!

CouchDB is now using a modified version of mochijson2 for JSON output.
The standard behavior of this library is to accept Unicode in all forms (Unicode code points, UTF-8 bytes, \uXXXX escapes) via decode/1, but when Unicode is emitted via encode/1 to the client app, all non-ASCII characters are converted to the \uXXXX form.

This is done for maximum compatibility, but I suspect that modern software that wants to interact with CouchDB will have no problems with raw UTF-8.
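
As a quick illustration of the current round trip (a sketch only, using the stock decode/1 and encode/1; the sample value is arbitrary):

%% decode/1 happily accepts the escaped form (and raw UTF-8 input decodes
%% to the same term)...
Term = mochijson2:decode(<<"{\"name\":\"\\u0414\"}">>),
%% ...but encode/1 turns the non-ASCII character back into an escape,
%% producing iodata for {"name":"\u0414"} rather than the raw UTF-8 bytes.
mochijson2:encode(Term)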

A recent version of mochiweb (r99) introduces an optional capability for mochijson2 to emit raw UTF-8.
The proposed usage is:

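%% encoder/1 returns a one-argument fun that serializes a term to iodata;
%% with {utf8, true} non-ASCII characters are emitted as raw UTF-8 bytes
%% instead of \uXXXX escapes (json() below stands for the term to encode):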
Encoder = mochijson2:encoder([{utf8, true}]),
JSON = Encoder(json())

I have tested this patch (in reduced form) against CouchDB, and it seems to work.

I think that bringing this option to CouchDB would be a good improvement for developers of international software.


Thanks for digging in here.

To avoid incompatibility with old software, we may want to either:

- make this a request time option
- switch intelligently on some HTTP request header

Any thoughts on how best to do this? Should utf8 be the default, or \uXXXX?
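
Whichever default we pick, the switch itself would be small; a rough sketch (not existing CouchDB code; the function name and flag are just placeholders for wherever the query-parameter or header check ends up):

encode_response(EJson, RawUtf8) ->
    %% Pick the escaping behavior per request; both branches yield a
    %% one-argument fun that returns iodata.
    Encoder = case RawUtf8 of
        true  -> mochijson2:encoder([{utf8, true}]);
        false -> fun mochijson2:encode/1
    end,
    Encoder(EJson).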

Once we have these questions answered, if you put a patch in JIRA[1]
it's likely to be accepted.

[1] http://issues.apache.org/jira/browse/COUCHDB


In my experience, raw Unicode is better than \uXXXX escapes (and shorter!). The JSON spec (http://json.org) specifically says that you *must* accept any Unicode character other than the ones that must be \-escaped, and I was very surprised to find that a lot of JSON tools produce the \uXXXX versions by default.

Because it is part of the spec, I don't see any problem in just changing it.
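
To put a rough number on "shorter": a three-character Cyrillic string is 6 bytes as raw UTF-8 versus 18 bytes as \uXXXX escapes (any non-ASCII string shows a similar saving):

%% U+0414, U+043C, U+0438
byte_size(unicode:characters_to_binary([16#414, 16#43C, 16#438])).  %% 6
length("\\u0414\\u043c\\u0438").                                    %% 18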


--
Chris Anderson
http://jchrisa.net
http://couch.io
