Replication w/ Large Attachments Fails
--------------------------------------

                 Key: COUCHDB-270
                 URL: https://issues.apache.org/jira/browse/COUCHDB-270
             Project: CouchDB
          Issue Type: Bug
          Components: Database Core
    Affects Versions: 0.9
         Environment: Apache CouchDB 0.9.0a748379
            Reporter: Jeff Hinrichs


Attempting to replicate a database with largish attachments (<= ~18MB of 
attachments in a doc, fewer than 200 docs) from one machine to another fails 
consistently and at the same point.

Scenario:
Both servers are running from HEAD, which I've been tracking for some time.  
This problem has been around as long as I've been using couch.

Machine A holds the original database. Machine B is the server that is doing a 
PULL replication.
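
For reference, a pull replication is started on Machine B by POSTing to its own 
/_replicate with Machine A's database as the source. A minimal Python sketch of 
that request, not the exact command I ran: Machine A's address comes from the 
Host header in the log, while Machine B's address (localhost:5984), the target 
database name, and the helper name build_pull_request are assumptions.

```python
import json
import urllib.request


def build_pull_request(local_server, source_db_url, target_db):
    """Build the POST /_replicate request that asks local_server to pull."""
    body = json.dumps({"source": source_db_url, "target": target_db})
    return urllib.request.Request(
        local_server.rstrip("/") + "/_replicate",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Machine B (assumed to be localhost) pulls from Machine A.
req = build_pull_request(
    "http://localhost:5984",                      # assumption: Machine B
    "http://192.168.2.52:5984/delasco-invoices",  # Machine A, from the log
    "delasco-invoices",                           # assumption: same db name
)
# Sending it is a single call, left commented out so nothing runs by accident:
# urllib.request.urlopen(req)
```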

During the replication, Machine A starts showing the following sporadically in 
the log:
[Fri, 27 Feb 2009 14:02:48 GMT] [debug] [<0.5902.3>] 'GET'
/delasco-invoices/INV00652429?revs=true&attachments=true&latest=true&open_revs=["425644723"]
{1,1}
Headers: [{'Host',"192.168.2.52:5984"}]

[Fri, 27 Feb 2009 14:02:48 GMT] [error] [<0.5901.3>] Uncaught error in
HTTP request: {exit,normal}

[Fri, 27 Feb 2009 14:02:48 GMT] [debug] [<0.5901.3>] Stacktrace:
[{mochiweb_request,send,2},
            {couch_httpd,send_chunk,2},
            {couch_httpd_db,db_doc_req,3},
            {couch_httpd_db,do_db_req,2},
            {couch_httpd,handle_request,3},
            {mochiweb_http,headers,5},
            {proc_lib,init_p,5}]

[Fri, 27 Feb 2009 14:02:48 GMT] [debug] [<0.5901.3>] HTTPd 500 error response:
 {"error":"error","reason":"normal"}
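
One way to track how often this error shows up in Machine A's log is to tally 
[error] entries per minute. A minimal Python sketch, assuming the timestamp 
layout shown in the log lines above; errors_per_minute is a hypothetical 
helper, not part of CouchDB.

```python
import re
from collections import Counter

# Captures the timestamp down to the minute on lines logged at [error] level.
ERROR_LINE = re.compile(
    r"^\[\w{3}, (\d{2} \w{3} \d{4} \d{2}:\d{2}):\d{2} GMT\] \[error\]"
)


def errors_per_minute(log_text):
    """Count [error] log entries, grouped by minute."""
    counts = Counter()
    for line in log_text.splitlines():
        match = ERROR_LINE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

A rising count from one minute to the next would match the behaviour described 
here.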

As the replication continues, the frequency of these "Uncaught error in 
HTTP request: {exit,normal}" errors increases until the error is repeated 
constantly.  Then Machine B stops sending requests: no more log output, no 
errors.  The last thing in Machine B's log file is:

[Fri, 27 Feb 2009 14:03:24 GMT] [info] [<0.20893.1>] retrying
couch_rep HTTP get request due to {error, req_timedout}: [104,116,
                                                                  116,112,58,
                                                                  47,47,49,
                                                                  57,50,46,
                                                                  49,54,56,
                                                                  46,50,46,
                                                                  53,50,58,
                                                                  53,57,56,
                                                                  52,47,100,
                                                                  101,108,97,
                                                                  115,99,111,
                                                                  45,105,110,
                                                                  118,111,
                                                                  105,99,101,
                                                                  115,47,73,
                                                                  78,86,48,
                                                                  48,54,53,
                                                                  50,49,51,
                                                                  56,63,114,
                                                                  101,118,
                                                                  115,61,116,
                                                                  114,117,
                                                                  101,38,97,
                                                                  116,116,97,
                                                                  99,104,109,
                                                                  101,110,
                                                                  116,115,61,
                                                                  116,114,
                                                                  117,101,38,
                                                                  108,97,116,
                                                                  101,115,
                                                                  116,61,116,
                                                                  114,117,
                                                                  101,38,111,
                                                                  112,101,
                                                                  110,95,114,
                                                                  101,118,
                                                                  115,61,91,
                                                                  34,

<<"3070455362">>,
                                                                  34,93]
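
The list above is an Erlang iolist: character codes with one embedded binary. 
A minimal Python sketch for turning such a printed term back into readable 
text, assuming it contains only integers and <<"...">> binaries; decode_iolist 
is a hypothetical helper name.

```python
import re

# Matches either an Erlang binary such as <<"abc">> or a bare integer.
TOKEN = re.compile(r'<<"([^"]*)">>|(\d+)')


def decode_iolist(term):
    """Turn a printed Erlang iolist of char codes and binaries into a str."""
    parts = []
    for binary, code in TOKEN.findall(term):
        # An integer is a character code; a binary is taken as literal text.
        parts.append(chr(int(code)) if code else binary)
    return "".join(parts)


# Head and tail of the list from the log; the middle is omitted for brevity.
print(decode_iolist('[104,116,116,112,58,47,47]'))
print(decode_iolist('[91,34,<<"3070455362">>,34,93]'))
```

Applied to the full list, this gives the URL that timed out:
http://192.168.2.52:5984/delasco-invoices/INV00652138?revs=true&attachments=true&latest=true&open_revs=["3070455362"]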

A request for status from the couchdb init.d script returns nothing and 
checking the processes returns:

(demo-couchdb)j...@mars:~/projects/venvs/demo-couchdb/src$ ps ax|grep cou
29281 pts/2    S+     0:00 grep cou
(demo-couchdb)j...@mars:~/projects/venvs/demo-couchdb/src$ ps ax|grep beam
29305 pts/2    R+     0:00 grep beam

In fact, couch has gone away completely on Machine B.  Its death is so quick 
that it can't even log why.

Attempts to incrementally replicate after the first failure die at exactly the 
same place.

I can replicate this same database on the same machine from one database to 
another without issue.  I can dump and reload the database with no problems.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
