CouchDB accepts data which it cannot replicate (invalid UTF-8 json during
replication)
--------------------------------------------------------------------------------------
Key: COUCHDB-1176
URL: https://issues.apache.org/jira/browse/COUCHDB-1176
Project: CouchDB
Issue Type: Bug
Affects Versions: 1.0.2, 1.0.1
Environment: CentOS 5.5 64bit
Reporter: Jaakko Sipari
Priority: Critical
Attachments: fffe_escaped.json, fffe_utf8.json
CouchDB appears to treat some Unicode characters as illegal when parsing
escaped Unicode values (\uXXXX) during the insert or update of a document.
However, the same characters can be inserted into the database by sending them
as raw UTF-8 instead of escaping them. One example is the code point U+FFFE,
which is escaped as \uFFFE and is represented in UTF-8 by the three consecutive
bytes 0xEF 0xBF 0xBE.
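To make the two representations concrete, here is a small shell sketch (not part of the original report) that prints the UTF-8 bytes of U+FFFE and the length of its escaped form. It assumes a POSIX shell with `printf`, `od` and `tr` available; the variable names are only illustrative.

```shell
# U+FFFE as raw UTF-8 bytes; octal escapes are used because they are
# portable in POSIX printf (357 277 276 octal = EF BF BE hex).
utf8_hex=$(printf '\357\277\276' | od -An -tx1 | tr -d ' \t\n')
echo "UTF-8 bytes of U+FFFE: $utf8_hex"

# The JSON-escaped form is plain ASCII: a backslash, 'u' and four hex digits.
escaped=$(printf '%s' '\ufffe')
echo "Escaped form: $escaped (${#escaped} characters)"
```

The first form is three bytes on the wire, the second is six ASCII characters; both denote the same code point once the JSON is decoded.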
Even though such documents are inserted as raw UTF-8 without errors, CouchDB
always serves them in the escaped form. This leads to the actual problem we
currently have: if documents containing such rejected characters have been
inserted into CouchDB as raw UTF-8, any attempt to replicate the database
aborts at the first of those documents with an error like this:
{"error":"json_encode","reason":"{bad_term,{nocatch,{invalid_json,<<\"[{\\\"ok\\\":{\\\"_id\\\":\\\"192058c4f81afc66c5bf883548004331\\\",\\\"_rev\\\":\\\"1-ad1c9dcee520d12abdf948d91e31cf15\\\",\\\"abc\\\":\\\"\\\\ufffe\\\",\\\"_revisions\\\":{\\\"start\\\":1,\\\"ids\\\":[\\\"ad1c9dcee520d12abdf948d91e31cf15\\\"]}}}]\\n\">>}}}"}
Here are the steps to reproduce:
curl -X PUT http://localhost:5984/replicationtest_source
curl -X PUT http://localhost:5984/replicationtest_target
# Should fail
curl -H "Content-Type:application/json" -X POST -d @fffe_escaped.json \
  http://localhost:5984/replicationtest_source
# Should succeed
curl -H "Content-Type:application/json" -X POST -d @fffe_utf8.json \
  http://localhost:5984/replicationtest_source
# Should fail with a json_encode error caused by the previously inserted document
curl -H "Content-Type:application/json" -X POST \
  -d "{\"source\":\"http://localhost:5984/replicationtest_source\",\"target\":\"replicationtest_target\"}" \
  http://localhost:5984/_replicate
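For anyone without access to the attachments, the following sketch creates two files that should behave the same way. Their contents are an assumption: the field name "abc" is taken from the error message above, and the real attachments may differ in whitespace or additional fields.

```shell
# Assumed reconstruction of the two attachments; only the field name "abc"
# is known from the error message, the exact file contents are a guess.

# U+FFFE written as a JSON escape sequence (pure ASCII).
printf '%s\n' '{"abc":"\ufffe"}' > fffe_escaped.json

# U+FFFE written as the raw UTF-8 bytes 0xEF 0xBF 0xBE (octal escapes).
printf '{"abc":"\357\277\276"}\n' > fffe_utf8.json

# Dump both files to confirm they differ only in how U+FFFE is encoded.
od -c fffe_escaped.json
od -c fffe_utf8.json
```

With these files in the current directory, the curl commands above can be run unchanged.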
If anyone has a quick fix for this (how to accept "invalid" escaped unicode
characters at least during replication), we would be more than happy to test it.