Re: chunked encoding problem ? - error messages from curl as well as lucene

Damien Katz Wed, 01 Jul 2009 09:21:33 -0700

Nitin, I would try to purge the bad document using the _purge API (deleting the document can still cause problems, as we'll keep around a deletion stub with the bad id); then things should be fixed. But you'll have to know the rev id of the document to use it, which might be hard to get via HTTP.


Purge:


POST /db/_purge
{"thedocid": "therevid"}
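As a rough illustration of the purge call above, here is a minimal Python sketch. The helper names `build_purge_request` and `send` are hypothetical, and the host, port and database name are placeholders. The body mirrors the shape shown in this message; some CouchDB releases may expect a list of revisions per document id, so check the documentation for the version in use.

```python
import http.client
import json
from urllib.parse import quote


def build_purge_request(db, doc_id, rev):
    """Return (method, path, headers, body) for a _purge call.

    Hypothetical helper: the body shape {docid: rev} follows the message
    above and is an assumption for other CouchDB versions.
    """
    path = "/" + quote(db, safe="") + "/_purge"
    body = json.dumps({doc_id: rev}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return "POST", path, headers, body


def send(host, port, request):
    """Send a prepared request and return (status, response_bytes)."""
    method, path, headers, body = request
    conn = http.client.HTTPConnection(host, port)
    try:
        conn.request(method, path, body=body, headers=headers)
        resp = conn.getresponse()
        return resp.status, resp.read()
    finally:
        conn.close()


if __name__ == "__main__":
    # Placeholder values; replace with the real database, id and rev.
    req = build_purge_request("db", "thedocid", "therevid")
    print(req)
    # status, data = send("localhost", 5984, req)  # needs a running server
```

Building the request separately from sending it makes it easy to inspect the exact bytes before anything is purged.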

Unless the file somehow got corrupted, this is definitely a CouchDB bug: we shouldn't accept a string we can't later return to the caller. Can you create a bug report? Adding a failing test case would be best, but attaching the bad string will also do.


-Damien


On Jun 30, 2009, at 2:47 PM, Adam Kocoloski wrote:

Hi Nitin, the specific bug I fixed only affected Unicode characters outside the Basic Multilingual Plane. CouchDB would happily accept those characters in raw UTF-8 format, and would serve them back to the user escaped as UTF-16 surrogate pairs. However, CouchDB would not allow users to upload documents where the characters were already escaped. That's been fixed in 0.9.1.

It looks like you've got a different problem. It might be the case that we are too permissive in what we accept as raw UTF-8 in the upload. I don't know.

Best,
Adam
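
To make the two representations in the message above concrete, here is a small Python sketch showing one character outside the Basic Multilingual Plane as raw UTF-8 bytes and as a JSON string escaped with a UTF-16 surrogate pair. The function name `raw_and_escaped` and the choice of U+1D11E are illustrative assumptions; this uses Python's own `json` module and says nothing about CouchDB's internal encoder.

```python
import json


def raw_and_escaped(ch):
    """Return (raw UTF-8 bytes, ASCII-only JSON string) for one character."""
    raw = ch.encode("utf-8")
    # json.dumps escapes non-ASCII by default; characters above U+FFFF
    # become a pair of \uXXXX escapes (a UTF-16 surrogate pair).
    escaped = json.dumps(ch)
    return raw, escaped


if __name__ == "__main__":
    g_clef = "\U0001D11E"  # MUSICAL SYMBOL G CLEF, outside the BMP
    raw, escaped = raw_and_escaped(g_clef)
    print(raw.hex(), escaped)
    # Both forms decode back to the same single character.
    print(json.loads(escaped) == raw.decode("utf-8"))
```

A server that accepts the first form but rejects the second would show the behaviour described for 0.9.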

On Jun 30, 2009, at 2:18 PM, Nitin Borwankar wrote:
Hi Damien,

Thanks for that tip.

Turns out I had non-UTF-8 data

adolfo.steiger-gar%E7%E3o:

- not sure how it managed to get into the db.

This is probably confusing the chunk termination.

How did Couch let this data in ?  I uploaded via Python httplib - not
couchdb-python.  Is this a bug - the one that is fixed in 0.9.1?

Nitin
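
For reference, a short Python sketch of why the id above is not UTF-8: percent-decoding it yields the bytes 0xE7 0xE3, which do not form a valid UTF-8 sequence but do read as "çã" in Latin-1. The function name `classify` is hypothetical, the trailing colon in the message is treated as punctuation (an assumption), and the Latin-1 fallback is only a guess at the original encoding.

```python
from urllib.parse import unquote_to_bytes


def classify(percent_encoded):
    """Percent-decode a string and report how the resulting bytes decode."""
    raw = unquote_to_bytes(percent_encoded)
    try:
        return "utf-8", raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a character, so this never fails;
        # whether it is the *right* encoding is an assumption.
        return "latin-1 (guess)", raw.decode("latin-1")


if __name__ == "__main__":
    print(classify("adolfo.steiger-gar%E7%E3o"))
    print(classify("adnen.chockri"))
```

Running a check like this over ids and string values before upload would catch such data on the client side.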

37% of all statistics are made up on the spot
-------------------------------------------------------------------------------------
Nitin Borwankar
[email protected]
On Tue, Jun 30, 2009 at 8:58 AM, Damien Katz <[email protected]> wrote:
This might be the json encoding issue that Adam fixed.
The 0.9.x branch, which is soon to be 0.9.1, fixes that issue. Try building and installing from the branch and see if that fixes the problem:
svn co http://svn.apache.org/repos/asf/couchdb/branches/0.9.x/

-Damien



On Jun 30, 2009, at 12:15 AM, Nitin Borwankar wrote:
Oh, and when I use Futon and try to browse the docs around where curl gives an error, Futon just spins and doesn't render the page containing the records around the error.

Data corruption?

Nitin

37% of all statistics are made up on the spot

-------------------------------------------------------------------------------------
Nitin Borwankar
[email protected]


On Mon, Jun 29, 2009 at 9:11 PM, Nitin Borwankar <[email protected]> wrote:
Hi,
I uploaded a little over 11K docs, about 230 MB of data in total, to a 0.9 instance on Ubuntu.
Db name is 'plist'

curl http://localhost:5984/plist gives
{"db_name":"plist","doc_count":11036,"doc_del_count":0,"update_seq":11036,"purge_seq":0,
"compact_running":false,"disk_size":243325178,"instance_start_time":"1246228896723181"}
suggesting a non-corrupt db

curl http://localhost:5984/plist/_all_docs gives

{"id":"adnanmoh","key":"adnanmoh","value":{"rev":"1-663736558"}},
{"id":"adnen.chockri","key":"adnen.chockri","value":{"rev":"1-1209124545"}},
curl: (56) Received problem 2 in the chunky parser     <<--------- note curl error
{"id":"ado.adamu","key":"ado.adamu","value":{"rev":"1-4226951654"}}
suggesting a chunked data transfer error


couchdb-lucene error message in couchdb.stderr reads

[...]

[couchdb-lucene] INFO Indexing plist from scratch.
[couchdb-lucene] ERROR Error updating index.
java.io.IOException: CRLF expected at end of chunk: 83/101
    at org.apache.commons.httpclient.ChunkedInputStream.readCRLF(ChunkedInputStream.java:207)
    at org.apache.commons.httpclient.ChunkedInputStream.nextChunk(ChunkedInputStream.java:219)
    at org.apache.commons.httpclient.ChunkedInputStream.read(ChunkedInputStream.java:176)
    at org.apache.commons.httpclient.ChunkedInputStream.read(ChunkedInputStream.java:196)
    at org.apache.commons.httpclient.ChunkedInputStream.exhaustInputStream(ChunkedInputStream.java:369)
    at org.apache.commons.httpclient.ChunkedInputStream.close(ChunkedInputStream.java:346)
    at java.io.FilterInputStream.close(FilterInputStream.java:159)
    at org.apache.commons.httpclient.AutoCloseInputStream.notifyWatcher(AutoCloseInputStream.java:194)
    at org.apache.commons.httpclient.AutoCloseInputStream.close(AutoCloseInputStream.java:158)
    at com.github.rnewson.couchdb.lucene.Database.execute(Database.java:141)
    at com.github.rnewson.couchdb.lucene.Database.get(Database.java:107)
    at com.github.rnewson.couchdb.lucene.Database.getAllDocsBySeq(Database.java:82)
    at com.github.rnewson.couchdb.lucene.Index$Indexer.updateDatabase(Index.java:229)
    at com.github.rnewson.couchdb.lucene.Index$Indexer.updateIndex(Index.java:178)
    at com.github.rnewson.couchdb.lucene.Index$Indexer.run(Index.java:90)
    at java.lang.Thread.run(Thread.java:595)


suggesting a chunking problem again.
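
Both errors above are chunk-framing failures: the client read the declared number of bytes and did not find CRLF where the chunk should end. The Python sketch below shows one way that can happen, assuming (this is not confirmed anywhere in the thread) that a sender declares a chunk size that is smaller than the number of bytes it actually writes, for example by counting characters instead of bytes. `decode_chunked` and `encode_chunk` are hypothetical helpers, not CouchDB, curl or HttpClient code. If the two numbers in the Java message are the byte values found where CR and LF were expected (also an assumption), 83 and 101 would be ASCII "S" and "e".

```python
def encode_chunk(payload, declared_size):
    """Frame one chunk; declared_size may deliberately disagree with payload."""
    return b"%x\r\n" % declared_size + payload + b"\r\n"


def decode_chunked(raw):
    """Decode a chunked body, raising ValueError on a framing error."""
    body = bytearray()
    pos = 0
    while True:
        eol = raw.find(b"\r\n", pos)
        if eol == -1:
            raise ValueError("missing CRLF after chunk size")
        # The size line is hexadecimal, optionally followed by ";extensions".
        size = int(raw[pos:eol].split(b";")[0], 16)
        pos = eol + 2
        if size == 0:
            return bytes(body)  # last chunk; trailers are ignored here
        chunk = raw[pos:pos + size]
        pos += size
        if raw[pos:pos + 2] != b"\r\n":
            raise ValueError("CRLF expected at end of chunk: %r" % raw[pos:pos + 2])
        body += chunk
        pos += 2


if __name__ == "__main__":
    text = "gar\u00e7\u00e3o"          # 6 characters
    payload = text.encode("utf-8")     # 8 bytes
    good = encode_chunk(payload, len(payload)) + b"0\r\n\r\n"
    bad = encode_chunk(payload, len(text)) + b"0\r\n\r\n"
    print(decode_chunked(good))
    try:
        decode_chunked(bad)
    except ValueError as exc:
        print("framing error:", exc)
```

The same mismatch would make any strict client fail partway through the response, which fits the point at which curl and couchdb-lucene stop.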

Who is creating this problem - my data?  CouchDB chunking ?

Help?



37% of all statistics are made up on the spot


-------------------------------------------------------------------------------------
Nitin Borwankar
[email protected]
