It turns out that mochijson2 will incorrectly decode an invalid UTF-8 string if 
the illegal byte sequence in the string occurs after an escaped character 
(COUCHDB-875).  This means that one can store documents which will never be 
successfully retrieved or indexed in CouchDB 1.0.  Moreover, once one of these 
documents makes it into the DB, a view build on that DB will never complete.

I wonder what we should do to work around that problem.  At the very least it 
might make sense for the view indexer to skip documents which contain invalid 
UTF-8.
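
As a rough illustration of the skip-on-invalid idea, here is a minimal sketch 
in Python rather than Erlang.  It is not CouchDB's indexer code; the function 
names and the (doc_id, raw_body) input shape are assumptions made up for this 
example.  It only shows a strict UTF-8 check applied to the raw bytes before a 
document is handed to the indexer.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the raw bytes decode as strict UTF-8."""
    try:
        raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


def indexable_docs(raw_docs):
    """Yield (doc_id, body) pairs whose body is valid UTF-8.

    raw_docs is a hypothetical iterable of (doc_id, raw_body_bytes) pairs.
    Documents that fail the check are skipped instead of aborting the build.
    """
    for doc_id, body in raw_docs:
        if is_valid_utf8(body):
            yield doc_id, body
```

A real fix would presumably also log the skipped document IDs so the bad 
documents can be found and repaired.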

Adam
