Thanks, Adam, for finding this one. I ran into it a couple of times and thought 
I was going crazy.

I think the view server should skip the invalid doc and print a warning in the 
log file with the doc id when it does.
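
A rough sketch of that skip-and-warn behaviour, in Python rather than in the 
actual view server code. The names (indexable_docs, raw_docs, the logger name) 
are made up for illustration, and it assumes the indexer sees each doc body as 
raw bytes:

```python
import logging

# Hypothetical logger name; the real indexer would use CouchDB's own log.
log = logging.getLogger("view_indexer_sketch")


def indexable_docs(raw_docs):
    """Yield (doc_id, text) for docs whose body is valid UTF-8.

    raw_docs is an iterable of (doc_id, body_bytes) pairs. Docs whose body
    fails strict UTF-8 decoding are skipped, and a warning naming the doc id
    is logged, so one bad doc cannot stall the whole view build.
    """
    for doc_id, body in raw_docs:
        try:
            # Strict decoding rejects illegal byte sequences anywhere in the
            # body, including after an escaped character.
            text = body.decode("utf-8")
        except UnicodeDecodeError as err:
            log.warning("skipping doc %s: invalid UTF-8 (%s)", doc_id, err)
            continue
        yield doc_id, text
```

The point is only that the check happens per doc and the failure is logged 
with the id, so the offending doc can be found and removed afterwards.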

I believe a _bulk_docs request with a "_deleted": true member still allows 
removal of that doc, but I haven't tried in a while.
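
For reference, a sketch of the request body such a _bulk_docs deletion would 
carry, built in Python. The helper name and the example id and rev are 
placeholders, and whether this actually removes one of the affected docs is 
untested, as said above:

```python
import json


def bulk_delete_body(doc_id, rev):
    """Build the JSON body for a POST to /<db>/_bulk_docs that deletes one doc.

    doc_id and rev are the _id and current _rev of the doc to remove; the
    "_deleted": true member marks the new revision as a deletion.
    """
    return json.dumps(
        {"docs": [{"_id": doc_id, "_rev": rev, "_deleted": True}]}
    )


# Placeholder id and rev, for illustration only.
body = bulk_delete_body("some-doc-id", "1-abc")
```

The body would be POSTed to the database's _bulk_docs URL with a 
Content-Type of application/json.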

Cheers
Jan
-- 


On 31 Aug 2010, at 07:25, Adam Kocoloski wrote:

> It turns out that mochijson2 will incorrectly decode an invalid UTF-8 string 
> if the illegal byte sequence in the string occurs after an escaped character 
> (COUCHDB-875).  This means that one can store documents which will never be 
> successfully retrieved or indexed in CouchDB 1.0.  Moreover, once one of 
> these documents makes it into the DB a view build on that DB will never 
> complete.
> 
> I wonder what we should do to circumvent that problem?  At the very least it 
> might make sense for the view indexer to skip documents which contain invalid 
> UTF-8.
> 
> Adam
> 
