We'd love to hear what you come up with and also to solve any
problems you might encounter on your way. Please let us know. Please
note that CouchDB at this point is not optimised. We are still in
the 'getting it right' phase before we come to the 'getting it
fast'. That said, CouchDB is plenty fast already, but there is also
the potential to greatly speed up things.
So I'm trying a smaller version of this first (9 million records), and
I've hit a snag. I have some rather simple Python code that reads from
Postgres and writes to CouchDB (it uses couchdb-python, and 'db' is
a couchdb.client.Database object):

    chunker = IteratorChunker(get_stuff())
    while not chunker.done:
        print("fetching")
        chunk = chunker.next_chunk(1000)
        if chunk:
            print("Adding %d items, starting with %s" %
                  (len(chunk), chunk[0]['_id']))
            db.update(chunk)

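IteratorChunker itself is not shown in this message. A minimal stand-in that behaves the way the loop expects (a 'done' flag, and next_chunk(n) returning up to n items) might look like the sketch below. The implementation is an assumption for illustration, not the actual class.

```python
from itertools import islice


class IteratorChunker(object):
    """Hypothetical stand-in: hands out an iterator's items in lists of up to `size`."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self.done = False

    def next_chunk(self, size):
        # Pull at most `size` items; a short (or empty) chunk means the
        # underlying iterator is exhausted.
        chunk = list(islice(self._it, size))
        if len(chunk) < size:
            self.done = True
        return chunk
```

When the input length is an exact multiple of the chunk size, the final call returns an empty list, which is why the loop checks 'if chunk:' before calling db.update().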
db.update(docs) (see
<http://code.google.com/p/couchdb-python/source/browse/trunk/couchdb/client.py>,
line 360) uses the bulk API, like:

    data = self.resource.post('_bulk_docs', content={'docs': documents})

At apparently random points throughout this process, but almost always
before 15,000 records or so, the process dies with an exception, the
tail end of which looks like:

      File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/httplib.py", line 707, in send
        self.sock.sendall(str)
      File "<string>", line 1, in sendall
    socket.error: (54, 'Connection reset by peer')

If I have Futon open while it's running, I occasionally get a JavaScript
error along the lines of "killed" at the same time (it is difficult to
reproduce).
I could have it catch the reset connection and retry, but why would
this be happening?
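
For reference, the catch-and-retry workaround could look roughly like the sketch below. The helper name update_with_retry and its parameters (attempts, delay, sleep) are made up for illustration and are not part of couchdb-python. It only hides the symptom and does not explain the reset. Also note that if the server had already stored part of a chunk before the connection dropped, resending the same documents may report conflicts for those IDs.

```python
import socket
import time


def update_with_retry(update, chunk, attempts=3, delay=1.0, sleep=time.sleep):
    """Call update(chunk), retrying when the socket layer raises an error.

    `update` is any callable taking the chunk, e.g. db.update.
    """
    for attempt in range(1, attempts + 1):
        try:
            return update(chunk)
        except socket.error:
            # Out of attempts: let the original error propagate.
            if attempt == attempts:
                raise
            # Back off a little longer after each failure.
            sleep(delay * attempt)
```

In the loop from earlier in this message, db.update(chunk) would then become update_with_retry(db.update, chunk).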