Thanks, Adam, for finding this one. I ran into it a couple of times and thought I was going crazy.
I think the view server should skip the invalid doc and print a warning in the log file with the doc id when it does. I believe a _bulk_docs request with a _deleted:true member still allows removal of that doc, but I haven't tried in a while.

Cheers
Jan
--

On 31 Aug 2010, at 07:25, Adam Kocoloski wrote:

> It turns out that mochijson2 will incorrectly decode an invalid UTF-8 string
> if the illegal byte sequence in the string occurs after an escaped character
> (COUCHDB-875). This means that one can store documents which will never be
> successfully retrieved or indexed in CouchDB 1.0. Moreover, once one of
> these documents makes it into the DB a view build on that DB will never
> complete.
>
> I wonder what we should do to circumvent that problem? At the very least it
> might make sense for the view indexer to skip documents which contain invalid
> UTF-8.
>
> Adam
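
To make the skip-and-warn idea concrete, here is a rough Python sketch. It is not CouchDB or mochijson2 code: the function names (`is_valid_utf8`, `indexable_docs`, `bulk_delete_body`) and the logger name are made up for illustration, and it assumes the indexer can see each doc's raw bytes before decoding. The last helper only builds the kind of body I would expect to POST to `_bulk_docs` to remove a bad doc; as said above, I have not re-tested that path.

```python
import json
import logging

# Hypothetical logger name, for illustration only.
log = logging.getLogger("view_indexer_sketch")


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the raw document bytes are strictly valid UTF-8."""
    try:
        raw.decode("utf-8", errors="strict")
        return True
    except UnicodeDecodeError:
        return False


def indexable_docs(raw_docs):
    """Yield (doc_id, doc) for valid docs; skip invalid ones with a warning.

    raw_docs is an iterable of (doc_id, raw_bytes) pairs (an assumption
    about what the indexer has available).
    """
    for doc_id, raw in raw_docs:
        if not is_valid_utf8(raw):
            # Log the doc id so the broken document can be found and removed.
            log.warning("skipping doc %s: invalid UTF-8", doc_id)
            continue
        yield doc_id, json.loads(raw.decode("utf-8"))


def bulk_delete_body(doc_id: str, rev: str) -> str:
    """Build a _bulk_docs request body that marks one doc as deleted."""
    return json.dumps({"docs": [{"_id": doc_id, "_rev": rev, "_deleted": True}]})
```

With something like this, an illegal byte following an escaped character (the COUCHDB-875 case) would cause the doc to be skipped rather than stalling the whole view build.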
