On 13 Apr 2009, at 23:39, Chris Anderson wrote:
On Mon, Apr 13, 2009 at 3:34 PM, dmi <losth...@yandex.ru> wrote:
Hello All!
CouchDB currently uses a modified version of mochijson2 for JSON output.
The standard behavior of this library is to accept Unicode in all
forms (Unicode, UTF-8, \uXXXX) via decode/1,
but when Unicode is emitted to the client app via encode/1, all
non-ASCII characters are converted to \uXXXX form.
This is done for maximum compatibility, but I suspect that modern
software interacting with CouchDB will have no problem with raw
UTF-8.
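To make the difference concrete, here is a minimal sketch in Python rather than Erlang. It uses the standard json module as a stand-in for mochijson2 (an assumption for illustration only, not CouchDB code) to show that the escaped and raw forms decode to the same value:

```python
import json

doc = {"name": "Привет"}

# Default: every non-ASCII character is written as a \uXXXX escape.
escaped = json.dumps(doc)

# ensure_ascii=False keeps the characters as they are; the caller
# then encodes the resulting string as UTF-8 before sending it.
raw = json.dumps(doc, ensure_ascii=False)

print(escaped)
print(raw)

# A conforming decoder accepts both forms and yields the same value.
assert json.loads(escaped) == json.loads(raw) == doc
```

Only the bytes on the wire differ; the decoded document is identical.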
A recent version of mochiweb (r99) introduces an optional capability
for mochijson2 to emit raw UTF-8.
The proposed usage is:
Encoder = mochijson2:encoder([{utf8, true}]),
JSON = Encoder(json())
I have tested this patch (in reduced form) against CouchDB and it
seems to be working.
I think that bringing this option to CouchDB would be a good
improvement for developers of international software.
Thanks for digging in here.
To avoid incompatibility with old software, we may want to either:
- make this a request time option
- switch intelligently on some http request header
Any thoughts on how best to do this? Should the default be raw UTF-8
or \uXXXX?
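As a rough sketch of the request-time option, again in Python with the standard json module standing in for mochijson2: the `utf8` query parameter and the `encode_response` helper are hypothetical names made up for this example, not existing CouchDB options, and the escaped form stays the default so old clients see no change.

```python
import json

def encode_response(doc, query):
    """Encode doc as a JSON response body (bytes).

    `query` is a dict of already-parsed query-string parameters.
    The `utf8` parameter is hypothetical, chosen for this sketch.
    """
    raw_utf8 = query.get("utf8") == "true"
    # ensure_ascii=True produces \uXXXX escapes; False keeps raw characters.
    text = json.dumps(doc, ensure_ascii=not raw_utf8)
    return text.encode("utf-8")

# Old clients: nothing in the request, so they get the escaped form.
print(encode_response({"k": "é"}, {}))
# Clients that opt in get raw UTF-8 bytes.
print(encode_response({"k": "é"}, {"utf8": "true"}))
```

Switching on a request header instead would only change where `raw_utf8` is read from.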
Once we have these questions answered, if you put a patch in JIRA[1]
it's likely to be accepted.
[1] http://issues.apache.org/jira/browse/COUCHDB
In my experience real Unicode is better than \u escapes (and shorter!).
The JSON spec (http://json.org) says that a string may contain any
Unicode character except the quote, the backslash and control
characters, which have to be escaped, so a parser *must* accept raw
Unicode. I was very surprised to find that a lot of JSON tools produce
the \u versions by default.
Because it is part of the spec, I don't see any problem in just
changing it.
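The "shorter" part is easy to check. A small sketch, using Python's standard json module purely for illustration (it is not what CouchDB runs), that compares the encoded sizes of one Cyrillic word:

```python
import json

word = "Привет"

# Escaped form: each of the six letters becomes a six-byte \uXXXX sequence.
escaped = json.dumps(word).encode("utf-8")

# Raw form: each of these letters takes two bytes in UTF-8.
raw = json.dumps(word, ensure_ascii=False).encode("utf-8")

print(len(escaped), len(raw))
```

For text in this range the escaped form is roughly three times larger; plain ASCII text is the same size either way.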
--
Chris Anderson
http://jchrisa.net
http://couch.io