It turns out that mochijson2 incorrectly decodes an invalid UTF-8 string when the illegal byte sequence occurs after an escaped character (COUCHDB-875). This means that one can store documents which can never be successfully retrieved or indexed in CouchDB 1.0. Moreover, once one of these documents makes it into the DB, a view build on that DB will never complete.
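To make the kind of input concrete, here is a minimal Python sketch of a strict UTF-8 check. It is not mochijson2 and not CouchDB code; the function name `is_valid_utf8` and the sample byte strings are made up for illustration. The "bad" sample puts a stray continuation byte right after an escaped quote, which is the shape of input described above.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True only if `raw` is well-formed UTF-8 (hypothetical helper)."""
    try:
        # errors="strict" makes the decoder raise on any illegal byte sequence
        raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


# Illustrative inputs (assumed, not taken from the bug report):
# an escaped quote followed by a lone 0x80 continuation byte is invalid UTF-8.
bad = b'"foo\\"\x80bar"'
# The same shape with a properly encoded non-ASCII character is valid.
good = '"caf\u00e9\\" ok"'.encode("utf-8")

if __name__ == "__main__":
    for label, sample in (("good", good), ("bad", bad)):
        print(label, is_valid_utf8(sample))
```

A check like this run over the whole string, regardless of where escapes occur, is the behaviour one would expect from the decoder.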
I wonder what we should do to work around this problem? At the very least, it might make sense for the view indexer to skip documents which contain invalid UTF-8.

Adam
