[
https://issues.apache.org/jira/browse/COUCHDB-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jaakko Sipari updated COUCHDB-1176:
-----------------------------------
Attachment: fffe_utf8.json
            fffe_escaped.json
Here are the files to be used with the curl commands.
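
For anyone without access to the attachments, here is a minimal Python sketch of how two equivalent payloads could be generated. The exact contents of the attached files are not reproduced here; the field name "abc" is taken from the error output quoted below, and the function names are hypothetical.

```python
import json

# Assumption: a one-field document. The field name "abc" is borrowed from the
# error message in the report; the real attachments may contain other fields.
FIELD = "abc"
CODE_POINT = "\ufffe"  # U+FFFE, the example character from the report


def escaped_payload() -> bytes:
    """ASCII-only JSON: the character is written as the escape sequence \\ufffe."""
    return json.dumps({FIELD: CODE_POINT}, ensure_ascii=True).encode("ascii")


def utf8_payload() -> bytes:
    """Raw UTF-8 JSON: the character is written as the bytes EF BF BE."""
    return json.dumps({FIELD: CODE_POINT}, ensure_ascii=False).encode("utf-8")


if __name__ == "__main__":
    # File names match the attachments; the contents are only an approximation.
    with open("fffe_escaped.json", "wb") as fh:
        fh.write(escaped_payload())
    with open("fffe_utf8.json", "wb") as fh:
        fh.write(utf8_payload())
```

Both payloads decode to the same document under a standard JSON parser, so the different treatment described in the report would come from how each form is parsed, not from the data itself.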
> CouchDB accepts data which it cannot replicate (invalid UTF-8 json during
> replication)
> --------------------------------------------------------------------------------------
>
> Key: COUCHDB-1176
> URL: https://issues.apache.org/jira/browse/COUCHDB-1176
> Project: CouchDB
> Issue Type: Bug
> Affects Versions: 1.0.1, 1.0.2
> Environment: CentOS 5.5 64bit
> Reporter: Jaakko Sipari
> Priority: Critical
> Attachments: fffe_escaped.json, fffe_utf8.json
>
>
> CouchDB appears to treat some Unicode characters as illegal when parsing
> escaped Unicode values (\uXXXX) during insert or update of a document. These
> characters can, however, be inserted into the database by using the UTF-8
> encoding instead of escaping. An example is the code point U+FFFE (0xFFFE),
> which is escaped as \uFFFE and is represented in UTF-8 by the consecutive
> bytes 0xEF 0xBF 0xBE.
> Even though such documents are inserted using UTF-8 encoding without errors,
> CouchDB always serves them in the escaped form. This leads to the actual
> problem we currently have. If documents containing such rejected characters
> are inserted into CouchDB using UTF-8 encoding, an attempt to replicate the
> database aborts at the first of those documents with an error like this:
> {"error":"json_encode","reason":"{bad_term,{nocatch,{invalid_json,<<\"[{\\\"ok\\\":{\\\"_id\\\":\\\"192058c4f81afc66c5bf883548004331\\\",\\\"_rev\\\":\\\"1-ad1c9dcee520d12abdf948d91e31cf15\\\",\\\"abc\\\":\\\"\\\\ufffe\\\",\\\"_revisions\\\":{\\\"start\\\":1,\\\"ids\\\":[\\\"ad1c9dcee520d12abdf948d91e31cf15\\\"]}}}]\\n\">>}}}"}
> Here are steps to reproduce:
> curl -X PUT http://localhost:5984/replicationtest_source
> curl -X PUT http://localhost:5984/replicationtest_target
> # Should fail
> curl -H "Content-Type:application/json" -X POST -d @fffe_escaped.json http://localhost:5984/replicationtest_source
> # Should succeed
> curl -H "Content-Type:application/json" -X POST -d @fffe_utf8.json http://localhost:5984/replicationtest_source
> # Should fail with a json_encode error caused by the previously inserted document
> curl -H "Content-Type:application/json" -X POST -d "{\"source\":\"http://localhost:5984/replicationtest_source\",\"target\":\"replicationtest_target\"}" http://localhost:5984/_replicate
> If anyone has a quick fix for this (how to accept "invalid" escaped unicode
> characters at least during replication), we would be more than happy to test
> it.
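
This is not a fix, but until one exists it may help to locate the documents that would stop replication. The Python sketch below walks an already-decoded JSON document and reports where a suspect character occurs. It assumes U+FFFE is the only affected code point, because that is the only one named in the report; the set and the function name are hypothetical and can be extended.

```python
from typing import Any, Iterator

# Assumption: only U+FFFE is listed because it is the only example given in
# the report. Add further code points here if others turn out to be rejected.
SUSPECT_CODE_POINTS = frozenset({"\ufffe"})


def find_suspect_paths(value: Any, path: str = "$") -> Iterator[str]:
    """Yield a path for every string (or object key) containing a suspect character."""
    if isinstance(value, str):
        if any(ch in SUSPECT_CODE_POINTS for ch in value):
            yield path
    elif isinstance(value, dict):
        for key, item in value.items():
            # Keys are strings too, so they are checked as well as values.
            if any(ch in SUSPECT_CODE_POINTS for ch in key):
                yield f"{path}.{key} (key)"
            yield from find_suspect_paths(item, f"{path}.{key}")
    elif isinstance(value, list):
        for index, item in enumerate(value):
            yield from find_suspect_paths(item, f"{path}[{index}]")
    # Numbers, booleans and null cannot contain the character.
```

Running this over each document fetched from the source database (for example, after decoding the response body with `json.loads`) would show which fields need cleaning before replication is attempted.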