[ 
https://issues.apache.org/jira/browse/COUCHDB-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038088#comment-13038088
 ] 

Pasi Eronen commented on COUCHDB-1176:
--------------------------------------

Also tested with branches/1.0.x and branches/1.1.x (as of today), with the 
same result.

> CouchDB accepts data which it cannot replicate (invalid UTF-8 json during 
> replication)
> --------------------------------------------------------------------------------------
>
>                 Key: COUCHDB-1176
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1176
>             Project: CouchDB
>          Issue Type: Bug
>    Affects Versions: 1.0.1, 1.0.2
>         Environment: CentOS 5.5 64bit
>            Reporter: Jaakko Sipari
>            Priority: Critical
>         Attachments: fffe_escaped.json, fffe_utf8.json
>
>
> CouchDB appears to treat some Unicode characters as illegal when parsing 
> escaped Unicode values (\uXXXX) during insert or update of a document. These 
> characters can, however, be inserted into the database by using the UTF-8 
> encoding instead of escaping. An example is the Unicode value 0xFFFE, which 
> is escaped as \uFFFE and is represented in UTF-8 by the consecutive bytes 
> 0xEF, 0xBF and 0xBE.
> Even though the documents are inserted using UTF-8 encoding without errors, 
> CouchDB always serves them in the escaped form. This leads us to the actual 
> problem we currently have. If documents containing such unaccepted characters 
> are inserted into CouchDB using UTF-8 encoding, an attempt to replicate the 
> database aborts at the first of those documents with an error like this:
> {"error":"json_encode","reason":"{bad_term,{nocatch,{invalid_json,<<\"[{\\\"ok\\\":{\\\"_id\\\":\\\"192058c4f81afc66c5bf883548004331\\\",\\\"_rev\\\":\\\"1-ad1c9dcee520d12abdf948d91e31cf15\\\",\\\"abc\\\":\\\"\\\\ufffe\\\",\\\"_revisions\\\":{\\\"start\\\":1,\\\"ids\\\":[\\\"ad1c9dcee520d12abdf948d91e31cf15\\\"]}}}]\\n\">>}}}"}
> Here are the steps to reproduce:
> curl -X PUT http://localhost:5984/replicationtest_source
> curl -X PUT http://localhost:5984/replicationtest_target
> # Should fail
> curl -H "Content-Type:application/json" -X POST -d @fffe_escaped.json \
>   http://localhost:5984/replicationtest_source
> # Should succeed
> curl -H "Content-Type:application/json" -X POST -d @fffe_utf8.json \
>   http://localhost:5984/replicationtest_source
> # Should fail with a json_encode error related to the previously inserted document
> curl -H "Content-Type:application/json" -X POST -d \
>   "{\"source\":\"http://localhost:5984/replicationtest_source\",\"target\":\"replicationtest_target\"}" \
>   http://localhost:5984/_replicate
> If anyone has a quick fix for this (how to accept "invalid" escaped unicode 
> characters at least during replication), we would be more than happy to test 
> it.
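
To make the two encodings described in the report concrete, here is a minimal 
Python sketch that builds both request bodies for the same document. The field 
name "abc" and the value U+FFFE are taken from the error message above; the 
actual contents of the attached fffe_escaped.json and fffe_utf8.json are not 
shown, so the document body here is an assumption, and the helper names 
escaped_body and utf8_body are hypothetical.

```python
import json

# Document body inferred from the error message in the report; the real
# attachment contents are not shown, so treat this as an assumption.
DOC = {"abc": "\ufffe"}


def escaped_body(doc):
    """Serialize with non-ASCII characters written as \\uXXXX escapes."""
    return json.dumps(doc, ensure_ascii=True).encode("ascii")


def utf8_body(doc):
    """Serialize with non-ASCII characters written as raw UTF-8 bytes."""
    return json.dumps(doc, ensure_ascii=False).encode("utf-8")


if __name__ == "__main__":
    # Two different byte sequences on the wire...
    print(escaped_body(DOC))
    print(utf8_body(DOC))
    # ...that a JSON parser decodes to the same document.
    print(json.loads(escaped_body(DOC)) == json.loads(utf8_body(DOC)))
```

Both bodies decode to the same document, which is why accepting one form and 
rejecting the other is inconsistent: the escaped form contains the six ASCII 
characters \ufffe, while the UTF-8 form carries the three bytes 0xEF 0xBF 0xBE.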

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
