[ https://issues.apache.org/jira/browse/COUCHDB-345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749207#action_12749207 ]

Adam Kocoloski commented on COUCHDB-345:
----------------------------------------
Hi Curt, thanks for all your work on this ticket. I did a bit of simple
benchmarking using code from the eep0018 project, which indicated that
couch_db:json_decode increased decoding time by about 12% over trunk. The
patch I submitted added about 4%. I wouldn't put much stock in the difference
between these numbers, but it does support your expectation that
xmerl_ucs:from_utf8 is fast compared to the actual JSON decoding.

I'm comfortable with either patch. I'm not so concerned about diverging from
stock MochiWeb. We already diverge in order to correctly handle escaped UTF-16
surrogate pairs, and to use the generally-agreed-upon ejson term format
instead of the {obj, Data} one in the stock distribution. We submitted a patch
for the surrogate pair issue quite a while ago, but it hasn't been applied. We
could submit a patch for this, too, as rejecting bad input is a feature that
any JSON decoder should have.
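
To make the "reject bad input" idea concrete, here is a minimal sketch in Python rather than CouchDB's Erlang. It is not either of the patches attached to this ticket, and the name decode_json_strict is hypothetical. It only illustrates the general approach of validating the body as UTF-8 before handing it to the JSON decoder.

```python
import json


def decode_json_strict(body: bytes):
    """Decode a JSON request body, rejecting bytes that are not valid UTF-8.

    Hypothetical helper for illustration only; not CouchDB or MochiWeb code.
    """
    try:
        # Strict decoding raises UnicodeDecodeError on malformed sequences,
        # so bad input is refused before any JSON parsing happens.
        text = body.decode("utf-8", errors="strict")
    except UnicodeDecodeError as exc:
        raise ValueError("request body is not valid UTF-8") from exc
    return json.loads(text)


# A lone 0xD8 byte (Latin-1 "Ø") is not a valid UTF-8 sequence.
bad_body = b'{"value": "\xd8"}'
# The same character correctly encoded as UTF-8 is the two bytes C3 98.
good_body = b'{"value": "\xc3\x98"}'
```

Calling decode_json_strict(good_body) returns the decoded document, while decode_json_strict(bad_body) raises ValueError, so the bad document never reaches storage.
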
> "High ASCII" can be inserted into db but not retrieved
> ------------------------------------------------------
>
> Key: COUCHDB-345
> URL: https://issues.apache.org/jira/browse/COUCHDB-345
> Project: CouchDB
> Issue Type: Bug
> Affects Versions: 0.9
> Environment: OSX 10.5.6
> Reporter: Joan Touzet
> Attachments: badenc1.patch, badtext.tar.gz, enctest.zip,
> reject_invalid_utf8.patch
>
>
> It is possible to PUT/POST a document into CouchDB with a "high ASCII" value
> that cannot be retrieved. This results from not escaping a non-ASCII value
> into \u#### when PUT/POSTing the document.
> The attached sample code will recreate the problem using the hex value D8 (Ø)
> in a possibly unsavoury test string.
> Sample output against 0.9.0 is as follows:
> ================================================
> {
>   "ok": true
> }
> {
>   "id": "fail",
>   "ok": true,
>   "rev": "1-76726372"
> }
> {
>   "error": "ucs",
>   "reason": "{bad_utf8_character_code}"
> }
> ================================================
> Please note this defect turned up another problem, namely that the
> bad_utf8_character_code exception thrown by a design document attempting to
> map() the bad document caused Futon to fail silently in building the view,
> with no indication (except via debug log) that there was a failure. The log
> indicated two attempts to build the view, both failing, followed by an
> uncaught exception error for Futon.
> Based on this, there are likely other areas in the codebase that do not
> handle the bad_utf8_character_code exception correctly.
> My belief is that CouchDB shouldn't accept this input and should have
> rejected the PUT/POST, or should have escaped the input itself before the
> insertion.
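
The escaping alternative described in the issue can be sketched as follows, again in Python for illustration and not as a description of what CouchDB does. It assumes the client's 0xD8 byte was meant as Latin-1, which is a guess about the input, and escape_non_ascii is a hypothetical name.

```python
import json


def escape_non_ascii(raw: bytes, assumed_encoding: str = "latin-1") -> str:
    """Return a JSON string literal with non-ASCII characters as \\uXXXX escapes.

    Hypothetical helper: the source encoding must be assumed, since a bare
    0xD8 byte does not say which encoding produced it.
    """
    text = raw.decode(assumed_encoding)
    # ensure_ascii=True (the default) writes non-ASCII characters as \uXXXX.
    return json.dumps(text, ensure_ascii=True)


escaped = escape_non_ascii(b"\xd8")
```

The need to guess the encoding is the main weakness of this option compared with rejecting the request outright.
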