what to do about invalid UTF-8 in saved documents?

Adam Kocoloski Mon, 30 Aug 2010 22:27:21 -0700

It turns out that mochijson2 will incorrectly decode an invalid UTF-8 string if 
the illegal byte sequence occurs after an escaped character (COUCHDB-875).  This 
means that in CouchDB 1.0 one can store documents which can never be 
successfully retrieved or indexed.  Moreover, once one of these documents makes 
it into the DB, a view build on that DB will never complete.


What should we do to work around this problem?  At the very least, it might 
make sense for the view indexer to skip documents which contain invalid 
UTF-8.
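
To make the "skip" idea concrete, here is a rough sketch in Python rather than 
Erlang, purely for illustration.  It is not CouchDB or mochijson2 code: the 
names is_valid_utf8 and split_indexable are hypothetical, and it assumes the 
indexer can see each document body as raw bytes before JSON decoding.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw is well-formed UTF-8 (strict decoding)."""
    try:
        raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


def split_indexable(docs):
    """Partition (doc_id, raw_body) pairs into indexable and skipped ids.

    Hypothetical helper: a document is skipped, not fatal, when its raw
    body is not valid UTF-8, so one bad document cannot stall the build.
    """
    indexable, skipped = [], []
    for doc_id, raw in docs:
        if is_valid_utf8(raw):
            indexable.append(doc_id)
        else:
            skipped.append(doc_id)
    return indexable, skipped


if __name__ == "__main__":
    docs = [
        ("good", b'{"a": "ok"}'),
        # An escaped character followed by an illegal byte (0xFF),
        # the shape of input described in this message.
        ("bad", b'{"a": "\\n\xff"}'),
    ]
    print(split_indexable(docs))
```

Checking the raw bytes before decoding sidesteps the decoder entirely, so the 
check does not depend on where in the string the illegal sequence appears.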

Adam
