[ 
https://issues.apache.org/jira/browse/COUCHDB-345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Curt Arnold updated COUCHDB-345:
--------------------------------

    Attachment: enctest.zip

This is a JUnit 4 test case (with corresponding pom.xml) that demonstrates the 
current broken behavior (or at least the behavior as of about a week ago).

Documents that are not valid UTF-8 are accepted into the database, but cannot 
be retrieved.  I did not test whether they break queries, but have no reason to 
doubt that misencoded documents would cause unexpected behavior in the 
database.  It would seem plausible that an attacker could seriously damage a 
CouchDB application by inserting misencoded documents.  Depending on an API 
layer to not send misencoded documents would still leave the DB vulnerable to 
an intentional attack or a miscoded API layer.

The test creates http://localhost:5984/testdb and then tries to insert 5 
documents.  The first is just straight ASCII, the second inserts a document 
containing \u00C0 - \u00C6 encoded in UTF-8, and the third inserts the same 
document, but with the characters escaped instead of UTF-8 encoded.  These 
three behave as expected.
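
As a rough illustration of those three well-formed bodies, here is a minimal 
sketch in Python (not the Java of the attached test).  The field name "text" 
and the helper parse() are assumptions made for the example and are not taken 
from enctest.zip.

```python
import json

# The characters used by the test: U+00C0 through U+00C6.
chars = "".join(chr(cp) for cp in range(0xC0, 0xC7))

# Hypothetical request bodies; the field name "text" is an assumption.
ascii_body = json.dumps({"text": "plain ASCII"}).encode("ascii")
# Raw characters, each encoded as a two-byte UTF-8 sequence.
utf8_body = json.dumps({"text": chars}, ensure_ascii=False).encode("utf-8")
# Same characters written as \uXXXX escapes, so the body is pure ASCII.
escaped_body = json.dumps({"text": chars}, ensure_ascii=True).encode("ascii")


def parse(body: bytes) -> dict:
    """Decode a body strictly as UTF-8, then parse it as JSON."""
    return json.loads(body.decode("utf-8"))
```

The UTF-8 and escaped bodies differ on the wire but parse to the same 
document, which is why both can be stored and fetched without trouble.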

The next two attempt to insert the same characters, but ISO-8859-1 encoded 
instead of UTF-8 encoded (that is, the byte sequence 0xC0, 0xC1, 0xC2 ... is in 
the body).  One attempt is made with Content-Encoding=ISO-8859-1 and the other 
without.  Both PUTs return a 201 response, but an attempt to fetch either 
document results in a 500 with an encoding error stack trace.  Returning a 400 
in both cases would be appropriate, since RFC 4627 requires JSON text to be 
encoded in Unicode, with UTF-8 as the default.
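
The kind of check that would reject these two bodies can be sketched as 
follows, again in Python.  This is only an illustration of strict UTF-8 
validation, not CouchDB's actual code path; the names is_valid_utf8 and 
status_for_put and the field name "text" are hypothetical.

```python
import json

# The characters used by the test: U+00C0 through U+00C6.
chars = "".join(chr(cp) for cp in range(0xC0, 0xC7))
document = json.dumps({"text": chars}, ensure_ascii=False)

utf8_body = document.encode("utf-8")         # well-formed
latin1_body = document.encode("iso-8859-1")  # raw bytes 0xC0 .. 0xC6


def is_valid_utf8(body: bytes) -> bool:
    """Return True only if the body decodes as strict UTF-8."""
    try:
        body.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def status_for_put(body: bytes) -> int:
    """Hypothetical policy: 400 for misencoded bodies, 201 otherwise."""
    return 201 if is_valid_utf8(body) else 400
```

With this check, status_for_put(latin1_body) yields 400 rather than 201, 
because the byte 0xC0 can never appear in valid UTF-8.  A real server would 
also have to confirm that the decoded text is valid JSON; the sketch covers 
only the encoding step.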



> "High ASCII" can be inserted into db but not retrieved
> ------------------------------------------------------
>
>                 Key: COUCHDB-345
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-345
>             Project: CouchDB
>          Issue Type: Bug
>    Affects Versions: 0.9
>         Environment: OSX 10.5.6
>            Reporter: Joan Touzet
>         Attachments: badtext.tar.gz, enctest.zip
>
>
> It is possible to PUT/POST a document into CouchDB with a "high ASCII" value 
> that cannot be retrieved. This results from not escaping a non-ASCII value 
> into \u#### when PUT/POSTing the document.
> The attached sample code will recreate the problem using the hex value D8 (Ø) 
> in a possibly unsavoury test string.
> Sample output against 0.9.0 is as follows:
> ================================================
> {
>     "ok": true
> }
> {
>     "id": "fail", 
>     "ok": true, 
>     "rev": "1-76726372"
> }
> {
>     "error": "ucs", 
>     "reason": "{bad_utf8_character_code}"
> }
> ================================================
> Please note this defect turned up another problem, namely that the 
> bad_utf8_character_code exception thrown by a design document attempting to 
> map() the bad document caused Futon to fail silently in building the view, 
> with no indication (except via debug log) that there was a failure. The log 
> indicated two attempts to build the view, both failing, followed by an 
> uncaught exception error for Futon.
> Based on this, there are likely other areas in the codebase that do not 
> handle the bad_utf8_character_code exception correctly.
> My belief is that CouchDB shouldn't accept this input and should have 
> rejected the PUT/POST, or should have escaped the input itself before the 
> insertion.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
