[ 
https://issues.apache.org/jira/browse/COUCHDB-197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12667130#action_12667130
 ] 

Jason Davies commented on COUCHDB-197:
--------------------------------------

I'd be interested if anyone else can reproduce any errors with the 
"replication" test when using the couch_redirects.3.diff patch above.  I can 
reproduce strange problems on a Debian box, which look like a possible bug in 
inets' http:request().  The test runs fine on my Mac OS X and Ubuntu boxes 
using the same versions of Erlang.  The Debian box is actually a VPS running 
with smp=8 and an average load of 7, so maybe the problem only occurs under 
high load.  I've tried disabling smp but the problem still occurs.  Here's a 
sample of the kind of error I'm getting:

{"error":"{badmatch,\n    {error,\n        {bad_return_value,\n            
{error,\n                {invalid_version,\n                    
\"\\\"_revs\\\":[\\\"1947022206\\\"]}}\\r\\n1\\r\\n]\\r\\n0\\r\\n\\r\\nHTTP/1.1\"}}}}}","reason":"[{couch_rep,do_http_request,4},\n
 {couch_rep,open_doc_revs,4},\n {couch_rep,'-enum_docs_parallel/3-fun-1-',3},\n 
{couch_rep,'-spawn_worker/3-fun-0-',3}]"}

Further notes:

1. It looks like this is happening in inets' http client when it is attempting 
to parse the HTTP version line, which is why I'm thinking there is some weird 
concurrency issue here.
2. The other thing is that this problem *only occurs* with the patch applied, 
which means there may be something wrong with http:request's handling of 
301/302 redirects in a pipelined situation.
3. When I comment out the {max_pipeline_length, 101} option, the errors no 
longer occur, indicating there may be a problem with pipelining.

> Replication renders CouchDB unresponsive.
> -----------------------------------------
>
>                 Key: COUCHDB-197
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-197
>             Project: CouchDB
>          Issue Type: Bug
>          Components: Database Core
>    Affects Versions: 0.9
>            Reporter: Maximillian Dornseif
>            Priority: Blocker
>             Fix For: 0.9
>
>         Attachments: couch_redirects.2.diff, couch_redirects.3.diff, 
> couch_redirects.diff, couch_tests.js.diff, push_replication_fix.diff
>
>
> I am quite sure this is not the same issue as in COUCHDB-193.
> Im trying to replicte a somewhat big database 
> {"doc_count":541394,"doc_del_count":265692,"update_seq":2118390,"purge_seq":0,"compact_running":false,"disk_size":16552608803}
>  to an other machine. 
> I started replication with this:
> send: 'POST /_replicate HTTP/1.1\r\nHost: 
> couchdb1.local.xxx:5984\r\nAccept-Encoding: identity\r\ncontent-length: 
> 90\r\ncontent-type: application/json\r\naccept: 
> application/json\r\nuser-agent: couchdb-python 0.5dev-r127\r\n\r\n'
> send: '{"source": "hulog_events", "target": 
> "http://couchdb2.local.xxx:5984/hulog_events"}'
> reply: ''
> connect: (couchdb1.local.hudora.biz, 5984)
> send: 'POST /_replicate HTTP/1.1\r\nHost: 
> couchdb1.local.xxxx:5984\r\nAccept-Encoding: identity\r\ncontent-length: 
> 90\r\ncontent-type: application/json\r\naccept: 
> application/json\r\nuser-agent: couchdb-python 0.5dev-r127\r\n\r\n'
> send: '{"source": "hulog_events", "target": 
> "http://couchdb2.local.xxxx:5984/hulog_events"}'
> (no reply so far)
> On the source server (couchdb1) I see following logentries:
> Mon, 05 Jan 2009 19:34:21 GMT] [info] [<0.12745.45>] 192.168.0.30 - - 'POST' 
> /_replicate 200
> [Mon, 05 Jan 2009 19:35:36 GMT] [info] [<0.107.0>] Compaction for db 
> "hulog_events_test" completed.
> [Mon, 05 Jan 2009 19:35:45 GMT] [info] [<0.12746.45>] 127.0.0.1 - - 'GET' 
> /hulog_events/ 200
> [Mon, 05 Jan 2009 19:35:46 GMT] [info] [<0.95.0>] Compaction for db "eap" 
> completed.
> [Mon, 05 Jan 2009 19:42:17 GMT] [error] [<0.12765.45>] ** Generic server 
> <0.12765.45> terminating 
> ** Last message in was {'EXIT',<0.12762.45>,
>                         {timeout,
>                          {gen_server,call,
>                           [<0.12768.45>,
>                            {write,
>                             <<0,0,1,36,131,104,2,104,1,108,0,0,0,8,104,2,
>                               109,0,0,0,7,112,114,111,100,117,99,116,109,
>                               0,0,0,8,54,53,49,52,48,47,69,75,104,2,109,0,
>                               0,0,11,116,114,97,110,115,97,99,116,105,111,
>                               110,109,0,0,0,8,114,101,116,114,105,101,118,
>                               101,104,2,109,0,0,0,4,116,121,112,101,109,0,
>                               0,0,4,117,110,105,116,104,2,109,0,0,0,11,97,
>                               114,99,104,105,118,101,100,95,97,116,109,0,
>                               0,0,22,50,48,48,56,48,50,50,50,84,49,50,49,
>                               52,48,53,46,53,50,54,51,56,52,104,2,109,0,0,
>                               0,10,99,114,101,97,116,101,100,95,97,116,
>                               109,0,0,0,22,50,48,48,55,49,49,50,56,84,49,
>                               53,52,50,48,54,46,51,52,52,54,49,56,104,2,
>                               109,0,0,0,4,112,114,111,112,104,1,108,0,0,0,
>                               2,104,2,109,0,0,0,8,108,111,99,97,116,105,
>                               111,110,109,0,0,0,6,65,85,83,76,65,71,104,2,
>                               109,0,0,0,6,104,101,105,103,104,116,98,0,0,
>                               7,158,106,104,2,109,0,0,0,3,109,117,105,109,
>                               0,0,0,18,51,52,48,48,53,57,57,56,49,48,48,
>                               48,48,51,49,50,53,50,104,2,109,0,0,0,8,113,
>                               117,97,110,116,105,116,121,97,11,106,106>>}]}}}
> ** When Server state == {file_descriptor,prim_file,{#Port<0.904761>,24}}
> ** Reason for termination == 
> ** {timeout,{gen_server,call,
>                         [<0.12768.45>,
>                          {write,<<0,0,1,36,131,104,2,104,1,108,0,0,0,8,104,
>                                   2,109,0,0,0,7,112,114,111,100,117,99,116,
>                                   109,0,0,0,8,54,53,49,52,48,47,69,75,104,
>                                   2,109,0,0,0,11,116,114,97,110,115,97,99,
>                                   116,105,111,110,109,0,0,0,8,114,101,116,
>                                   114,105,101,118,101,104,2,109,0,0,0,4,
>                                   116,121,112,101,109,0,0,0,4,117,110,105,
>                                   116,104,2,109,0,0,0,11,97,114,99,104,105,
>                                   118,101,100,95,97,116,109,0,0,0,22,50,48,
>                                   48,56,48,50,50,50,84,49,50,49,52,48,53,
>                                   46,53,50,54,51,56,52,104,2,109,0,0,0,10,
>                                   99,114,101,97,116,101,100,95,97,116,109,
>                                   0,0,0,22,50,48,48,55,49,49,50,56,84,49,
>                                   53,52,50,48,54,46,51,52,52,54,49,56,104,
>                                   2,109,0,0,0,4,112,114,111,112,104,1,108,
>                                   0,0,0,2,104,2,109,0,0,0,8,108,111,99,97,
>                                   116,105,111,110,109,0,0,0,6,65,85,83,76,
>                                   65,71,104,2,109,0,0,0,6,104,101,105,103,
>                                   104,116,98,0,0,7,158,106,104,2,109,0,0,0,
>                                   3,109,117,105,109,0,0,0,18,51,52,48,48,
>                                   53,57,57,56,49,48,48,48,48,51,49,50,53,
>                                   50,104,2,109,0,0,0,8,113,117,97,110,116,
>                                   105,116,121,97,11,106,106>>}]}}
> [Mon, 05 Jan 2009 19:42:57 GMT] [error] [<0.12765.45>] {error_report,<0.22.0>,
>     {<0.12765.45>,crash_report,
>      [[{pid,<0.12765.45>},
>        {registered_name,[]},
>        {error_info,
>            {exit,
>                {timeout,
>                    {gen_server,call,
>                        [<0.12768.45>,
>                         {write,
>                             <<0,0,1,36,131,104,2,104,1,108,0,0,0,8,104,2,
>                               109,0,0,0,7,112,114,111,100,117,99,116,109,0,
>                               0,0,8,54,53,49,52,48,47,69,75,104,2,109,0,0,
>                               0,11,116,114,97,110,115,97,99,116,105,111,
>                               110,109,0,0,0,8,114,101,116,114,105,101,118,
>                               101,104,2,109,0,0,0,4,116,121,112,101,109,0,
>                               0,0,4,117,110,105,116,104,2,109,0,0,0,11,97,
>                               114,99,104,105,118,101,100,95,97,116,109,0,0,
>                               0,22,50,48,48,56,48,50,50,50,84,49,50,49,52,
>                               48,53,46,53,50,54,51,56,52,104,2,109,0,0,0,
>                               10,99,114,101,97,116,101,100,95,97,116,109,0,
>                               0,0,22,50,48,48,55,49,49,50,56,84,49,53,52,
>                               50,48,54,46,51,52,52,54,49,56,104,2,109,0,0,
>                               0,4,112,114,111,112,104,1,108,0,0,0,2,104,2,
>                               109,0,0,0,8,108,111,99,97,116,105,111,110,
>                               109,0,0,0,6,65,85,83,76,65,71,104,2,109,0,0,
>                               0,6,104,101,105,103,104,116,98,0,0,7,158,106,
>                               104,2,109,0,0,0,3,109,117,105,109,0,0,0,18,
>                               51,52,48,48,53,57,57,56,49,48,48,48,48,51,49,
>                               50,53,50,104,2,109,0,0,0,8,113,117,97,110,
>                               116,105,116,121,97,11,106,106>>}]}},
>                [{gen_server,terminate,6},{proc_lib,init_p_do_apply,3}]}},
>        {initial_call,{couch_file,init,['Argument__1']}},
>        {ancestors,[<0.12762.45>]},
>        {messages,[]},
>        {links,[#Port<0.904761>]},
>        {dictionary,[]},
>        {trap_exit,true},
>        {status,running},
>        {heap_size,987},
>        {stack_size,23},
>        {reductions,836156}],
>       []]}}
> [Mon, 05 Jan 2009 19:43:02 GMT] [error] [<0.22399.43>] ** Generic server 
> <0.22399.43> terminating 
> ** Last message in was {'EXIT',<0.10848.41>,
>                         {timeout,
>                          {gen_server,call,
>                           [<0.12768.45>,
>                            {write,
>                             <<0,0,1,36,131,104,2,104,1,108,0,0,0,8,104,2,
>                               109,0,0,0,7,112,114,111,100,117,99,116,109,
>                               0,0,0,8,54,53,49,52,48,47,69,75,104,2,109,0,
>                               0,0,11,116,114,97,110,115,97,99,116,105,111,
>                               110,109,0,0,0,8,114,101,116,114,105,101,118,
>                               101,104,2,109,0,0,0,4,116,121,112,101,109,0,
>                               0,0,4,117,110,105,116,104,2,109,0,0,0,11,97,
>                               114,99,104,105,118,101,100,95,97,116,109,0,
>                               0,0,22,50,48,48,56,48,50,50,50,84,49,50,49,
>                               52,48,53,46,53,50,54,51,56,52,104,2,109,0,0,
>                               0,10,99,114,101,97,116,101,100,95,97,116,
>                               109,0,0,0,22,50,48,48,55,49,49,50,56,84,49,
>                               53,52,50,48,54,46,51,52,52,54,49,56,104,2,
>                               109,0,0,0,4,112,114,111,112,104,1,108,0,0,0,
>                               2,104,2,109,0,0,0,8,108,111,99,97,116,105,
>                               111,110,109,0,0,0,6,65,85,83,76,65,71,104,2,
>                               109,0,0,0,6,104,101,105,103,104,116,98,0,0,
>                               7,158,106,104,2,109,0,0,0,3,109,117,105,109,
>                               0,0,0,18,51,52,48,48,53,57,57,56,49,48,48,
>                               48,48,51,49,50,53,50,104,2,109,0,0,0,8,113,
>                               117,97,110,116,105,116,121,97,11,106,106>>}]}}}
> ** When Server state == {file_descriptor,prim_file,{#Port<0.904494>,16}}
> ** Reason for termination == 
> ** {timeout,{gen_server,call,
>                         [<0.12768.45>,
>                          {write,<<0,0,1,36,131,104,2,104,1,108,0,0,0,8,104,
>                                   2,109,0,0,0,7,112,114,111,100,117,99,116,
>                                   109,0,0,0,8,54,53,49,52,48,47,69,75,104,
>                                   2,109,0,0,0,11,116,114,97,110,115,97,99,
>                                   116,105,111,110,109,0,0,0,8,114,101,116,
>                                   114,105,101,118,101,104,2,109,0,0,0,4,
>                                   116,121,112,101,109,0,0,0,4,117,110,105,
>                                   116,104,2,109,0,0,0,11,97,114,99,104,105,
>                                   118,101,100,95,97,116,109,0,0,0,22,50,48,
>                                   48,56,48,50,50,50,84,49,50,49,52,48,53,
>                                   46,53,50,54,51,56,52,104,2,109,0,0,0,10,
>                                   99,114,101,97,116,101,100,95,97,116,109,
>                                   0,0,0,22,50,48,48,55,49,49,50,56,84,49,
>                                   53,52,50,48,54,46,51,52,52,54,49,56,104,
>                                   2,109,0,0,0,4,112,114,111,112,104,1,108,
>                                   0,0,0,2,104,2,109,0,0,0,8,108,111,99,97,
>                                   116,105,111,110,109,0,0,0,6,65,85,83,76,
>                                   65,71,104,2,109,0,0,0,6,104,101,105,103,
>                                   104,116,98,0,0,7,158,106,104,2,109,0,0,0,
>                                   3,109,117,105,109,0,0,0,18,51,52,48,48,
>                                   53,57,57,56,49,48,48,48,48,51,49,50,53,
>                                   50,104,2,109,0,0,0,8,113,117,97,110,116,
>                                   105,116,121,97,11,106,106>>}]}}
> [Mon, 05 Jan 2009 19:43:28 GMT] [error] [<0.22399.43>] {error_report,<0.22.0>,
>     {<0.22399.43>,crash_report,
>      [[{pid,<0.22399.43>},
>        {registered_name,[]},
>        {error_info,
>            {exit,
>                {timeout,
>                    {gen_server,call,
>                        [<0.12768.45>,
>                         {write,
>                             <<0,0,1,36,131,104,2,104,1,108,0,0,0,8,104,2,
>                               109,0,0,0,7,112,114,111,100,117,99,116,109,0,
>                               0,0,8,54,53,49,52,48,47,69,75,104,2,109,0,0,
>                               0,11,116,114,97,110,115,97,99,116,105,111,
>                               110,109,0,0,0,8,114,101,116,114,105,101,118,
>                               101,104,2,109,0,0,0,4,116,121,112,101,109,0,
>                               0,0,4,117,110,105,116,104,2,109,0,0,0,11,97,
>                               114,99,104,105,118,101,100,95,97,116,109,0,0,
>                               0,22,50,48,48,56,48,50,50,50,84,49,50,49,52,
>                               48,53,46,53,50,54,51,56,52,104,2,109,0,0,0,
>                               10,99,114,101,97,116,101,100,95,97,116,109,0,
>                               0,0,22,50,48,48,55,49,49,50,56,84,49,53,52,
>                               50,48,54,46,51,52,52,54,49,56,104,2,109,0,0,
>                               0,4,112,114,111,112,104,1,108,0,0,0,2,104,2,
>                               109,0,0,0,8,108,111,99,97,116,105,111,110,
>                               109,0,0,0,6,65,85,83,76,65,71,104,2,109,0,0,
>                               0,6,104,101,105,103,104,116,98,0,0,7,158,106,
>                               104,2,109,0,0,0,3,109,117,105,109,0,0,0,18,
>                               51,52,48,48,53,57,57,56,49,48,48,48,48,51,49,
>                               50,53,50,104,2,109,0,0,0,8,113,117,97,110,
>                               116,105,116,121,97,11,106,106>>}]}},
>                [{gen_server,terminate,6},{proc_lib,init_p_do_apply,3}]}},
>        {initial_call,{couch_file,init,['Argument__1']}},
>        {ancestors,
>            [<0.10848.41>,<0.10847.41>,couch_server,couch_primary_services,
>             couch_server_sup,<0.1.0>]},
>        {messages,
>            [{'DOWN',#Ref<0.0.81.132266>,process,<0.10847.41>,
>                 {timeout,
>                     {gen_server,call,
>                         [<0.12768.45>,
>                          {write,
>                              <<0,0,1,36,131,104,2,104,1,108,0,0,0,8,104,
>                                2,109,0,0,0,7,112,114,111,100,117,99,116,
>                                109,0,0,0,8,54,53,49,52,48,47,69,75,104,
>                                2,109,0,0,0,11,116,114,97,110,115,97,99,
>                                116,105,111,110,109,0,0,0,8,114,101,116,
>                                114,105,101,118,101,104,2,109,0,0,0,4,
>                                116,121,112,101,109,0,0,0,4,117,110,105,
>                                116,104,2,109,0,0,0,11,97,114,99,104,105,
>                                118,101,100,95,97,116,109,0,0,0,22,50,48,
>                                48,56,48,50,50,50,84,49,50,49,52,48,53,
>                                46,53,50,54,51,56,52,104,2,109,0,0,0,10,
>                                99,114,101,97,116,101,100,95,97,116,109,
>                                0,0,0,22,50,48,48,55,49,49,50,56,84,49,
>                                53,52,50,48,54,46,51,52,52,54,49,56,104,
>                                2,109,0,0,0,4,112,114,111,112,104,1,108,
>                                0,0,0,2,104,2,109,0,0,0,8,108,111,99,97,
>                                116,105,111,110,109,0,0,0,6,65,85,83,76,
>                                65,71,104,2,109,0,0,0,6,104,101,105,103,
>                                104,116,98,0,0,7,158,106,104,2,109,0,0,0,
>                                3,109,117,105,109,0,0,0,18,51,52,48,48,
>                                53,57,57,56,49,48,48,48,48,51,49,50,53,
>                                50,104,2,109,0,0,0,8,113,117,97,110,116,
>                                105,116,121,97,11,106,106>>}]}}}]},
>        {links,[#Port<0.904494>]},
>        {dictionary,[{<0.10847.41>,{#Ref<0.0.81.132266>,1}}]},
>        {trap_exit,true},
>        {status,running},
>        {heap_size,987},
>        {stack_size,23},
>        {reductions,5627554}],
>       []]}}
> (and nothing further)
> I still can access couchdb1 (the source) but every trivial request takes 
> exactly 5015ms:
> balancer:/filespace/couchdb/log# time curl -i http://127.0.0.1:5984/
> HTTP/1.1 200 OK
> Server: CouchDB/0.9.0a731357-incubating (Erlang OTP/R12B)
> Date: Mon, 05 Jan 2009 20:45:46 GMT
> Content-Type: text/plain;charset=utf-8
> Content-Length: 102
> Cache-Control: must-revalidate
> {"couchdb":"Welcome","version":"0.9.0a731357-incubating","start_time":"Sun, 
> 04 Jan 2009 21:43:13 GMT"}
> real  0m5.015s
> user  0m0.008s
> sys   0m0.000s
> For these accesses no log entries  on couchdb1 are created.
> Meanwhile on the destination server (couchdb2) I can see lot of activity:
> [Mon, 05 Jan 2009 20:47:58 GMT] [info] [<0.19601.5>] 172.28.4.107 - - 'POST' 
> /hulog_events/_missing_revs 200
> ... 40 lines...
> [Mon, 05 Jan 2009 20:47:58 GMT] [info] [<0.19644.5>] 172.28.4.107 - - 'POST' 
> /hulog_events/_missing_revs 200
> [Mon, 05 Jan 2009 20:48:13 GMT] [info] [<0.19652.5>] 172.28.4.107 - - 'POST' 
> /hulog_events/_missing_revs 200
> ... ca 200 lines
> [Mon, 05 Jan 2009 20:48:13 GMT] [info] [<0.19744.5>] 172.28.4.107 - - 'POST' 
> /hulog_events/_missing_revs 200
> [Mon, 05 Jan 2009 20:48:28 GMT] [info] [<0.19747.5>] 172.28.4.107 - - 'POST' 
> /hulog_events/_missing_revs 200
> ... ca 200 lines
> [Mon, 05 Jan 2009 20:48:28 GMT] [info] [<0.19844.5>] 172.28.4.107 - - 'POST' 
> /hulog_events/_missing_revs 200
> [Mon, 05 Jan 2009 20:48:43 GMT] [info] [<0.19944.5>] 172.28.4.107 - - 'POST' 
> /hulog_events/_missing_revs 200
> ... ca 200 lines
> [Mon, 05 Jan 2009 20:48:58 GMT] [info] [<0.19948.5>] 172.28.4.107 - - 'POST' 
> /hulog_events/_missing_revs 200
> ... ca 200 lines
> [Mon, 05 Jan 2009 20:48:58 GMT] [info] [<0.20044.5>] 172.28.4.107 - - 'POST' 
> /hulog_events/_missing_revs 200
> [Mon, 05 Jan 2009 20:49:13 GMT] [info] [<0.20045.5>] 172.28.4.107 - - 'POST' 
> /hulog_events/_missing_revs 200
> But the number of documents on the destination servers hasnt been incerasing 
> in the meantime:
> {"db_name":"hulog_events","doc_count":25926,"doc_del_count":10074,"update_seq":36000,"purge_seq":0,"compact_running":false,"disk_size":21927524}
> couchdb1 is CouchDB 0.9.0a731357-incubating
> couchdb2 is CouchDB 0.9.0a730405-incubating

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to