[ 
https://issues.apache.org/jira/browse/COUCHDB-2833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14941124#comment-14941124
 ] 

Nick Vatamaniuc commented on COUCHDB-2833:
------------------------------------------

This is the output log from the failing test case. It has added debug 
statements throughout the replicator and the test.

{code}
%% couch_replicator_large_atts_tests.erl changes
%%
%% -define(ATT_SIZE_1, 65537).
%% -define(ATT_SIZE_2, 1).
%% -define(DOCS_COUNT, 2).
%% ...
%% large_atts_test_() ->
%%    Pairs = [
%%             %{local, local},
%%             {local, remote}
%%             %{remote, local},
%%             %{remote, remote}
%%           ],


$ rebar -r eunit apps=couch_replicator tests=large_atts_test_ 
skip_deps=couch_epi

==> couch_replicator (eunit)
Compiled test/couch_replicator_test_helper.erl
/Users/nvatama/asf/couchdb/src/couch_replicator/test/couch_replicator_large_atts_tests.erl:124:
 Warning: variable 'Res' is unused
Compiled test/couch_replicator_large_atts_tests.erl
Compiled test/couch_replicator_missing_stubs_tests.erl
    Running test function(s):
      couch_replicator_large_atts_tests:large_atts_test_/0
======================== EUnit ========================

01:12:38.699 [notice] config: [attachments] compressible_types set to text/* 
for reason nil
01:12:38.699 [info] Application couch_replicator started on node nonode@nohost
test/couch_replicator_large_atts_tests.erl:81:<0.166.0>: $xxxxxx From local 
<<"eunit-test-db-1443762758699299">>  ==> remote {remote,
                                                                     
<<"eunit-test-db-1443762758720227">>}

test/couch_replicator_large_atts_tests.erl:98:<0.166.0>:
@@@@@@@ replicating <<"eunit-test-db-1443762758699299">> > 
<<"http://127.0.0.1:58985/eunit-test-db-1443762758720227";>>

couch_replicator_large_atts_tests:91: 
should_populate_source...test/couch_replicator_test_helper.erl:98:<0.446.0>:
@@@@@@@@@ Replicate <<"eunit-test-db-1443762758699299">> => 
<<"http://127.0.0.1:58985/eunit-test-db-1443762758720227";>>


test/couch_replicator_test_helper.erl:106:<0.446.0>: @@@@ Rep 
{rep,{"9232f5f1b3ccb9fe9d7b781873a147cf",[]},
              <<"eunit-test-db-1443762758699299">>,
              {httpdb,"http://127.0.0.1:58985/eunit-test-db-1443762758720227/";,
                      nil,
                      [{"Accept","application/json"},
                       {"User-Agent","CouchDB-Replicator/4ca9e41"}],
                      30000,
                      [{socket_options,[{keepalive,true},{nodelay,false}]}],
                      10,250,nil,20},
              [{checkpoint_interval,5000},
               {connection_timeout,30000},
               {http_connections,20},
               {retries,10},
               {socket_options,[{keepalive,true},{nodelay,false}]},
               {use_checkpoints,true},
               {worker_batch_size,500},
               {worker_processes,4}],
              {user_ctx,null,[<<"_admin">>],undefined},
              db,nil,null,null}

01:13:08.813 [notice] !!!!!! async_replicate "9232f5f1b3ccb9fe9d7b781873a147cf" 
[] <<"eunit-test-db-1443762758699299">> -> 
{httpdb,"http://127.0.0.1:58985/eunit-test-db-1443762758720227/",nil,[{"Accept","application/json"},{"User-Agent","CouchDB-Replicator/4ca9e41"}],30000,[{socket_options,[{keepalive,true},{nodelay,false}]}],10,250,nil,20}
 tout: 30000
01:13:08.813 [notice] starting new replication 
`9232f5f1b3ccb9fe9d7b781873a147cf` at <0.693.0> 
(`eunit-test-db-1443762758699299` -> 
`http://127.0.0.1:58985/eunit-test-db-1443762758720227/`)

=PROGRESS REPORT==== 2-Oct-2015::01:13:08 ===
          supervisor: {local,couch_replicator_job_sup}
             started: [{pid,<0.693.0>},
                       {name,"9232f5f1b3ccb9fe9d7b781873a147cf"},
                       {mfargs,
                        {gen_server,start_link,
                         [couch_replicator,
                          {rep,
                           {"9232f5f1b3ccb9fe9d7b781873a147cf",[]},
                           <<"eunit-test-db-1443762758699299">>,
                           {httpdb,
                            
"http://127.0.0.1:58985/eunit-test-db-1443762758720227/";,
                            nil,
                            [{"Accept","application/json"},
                             {"User-Agent","CouchDB-Replicator/4ca9e41"}],
                            30000,
                            [{socket_options,
                              [{keepalive,true},{nodelay,false}]}],
                            10,250,nil,20},
                           [{checkpoint_interval,5000},
                            {connection_timeout,30000},
                            {http_connections,20},
                            {retries,10},
                            {socket_options,
                             [{keepalive,true},{nodelay,false}]},
                            {use_checkpoints,true},
                            {worker_batch_size,500},
                            {worker_processes,4}],
                           {user_ctx,null,[<<"_admin">>],undefined},
                           db,nil,null,null},
                          [{timeout,30000}]]}},
                       {restart_type,temporary},
                       {shutdown,250},
                       {child_type,worker}]

=PROGRESS REPORT==== 2-Oct-2015::01:13:08 ===
          supervisor: {local,inet_gethost_native_sup}
             started: [{pid,<0.697.0>},{mfa,{inet_gethost_native,init,[[]]}}]

=PROGRESS REPORT==== 2-Oct-2015::01:13:08 ===
          supervisor: {local,kernel_safe_sup}
             started: [{pid,<0.696.0>},
                       {name,inet_gethost_native_sup},
                       {mfargs,{inet_gethost_native,start_link,[]}},
                       {restart_type,temporary},
                       {shutdown,1000},
                       {child_type,worker}]
01:13:08.879 [notice] 127.0.0.1 - - GET /eunit-test-db-1443762758720227/ 200
01:13:08.881 [notice] 127.0.0.1 - - GET /eunit-test-db-1443762758720227/ 200
01:13:08.882 [notice] 127.0.0.1 - - GET 
/eunit-test-db-1443762758720227/_local/9232f5f1b3ccb9fe9d7b781873a147cf 404
01:13:08.882 [notice] 127.0.0.1 - - GET 
/eunit-test-db-1443762758720227/_local/c1b4431ba39c56ce3667d7e4e24deb24 404
01:13:08.883 [notice] 127.0.0.1 - - GET 
/eunit-test-db-1443762758720227/_local/ce13e7d20a5667809215382ecddd4e6a 404
01:13:08.891 [notice] Replication `"9232f5f1b3ccb9fe9d7b781873a147cf"` is using:
        4 worker processes
        a worker batch size of 500
        20 HTTP connections
        a connection timeout of 30000 milliseconds
        10 retries per request
        socket options are: [{keepalive,true},{nodelay,false}]
01:13:08.891 [notice] Worker pids are: [<0.704.0>,<0.705.0>,<0.706.0>,<0.707.0>]
01:13:08.892 [notice] 127.0.0.1 - - POST 
/eunit-test-db-1443762758720227/_revs_diff 200
01:13:08.892 [notice] Worker flushing doc with attachments local_doc_handler 
<0.704.0> 
{doc,<<"doc1">>,{1,[<<213,87,17,46,228,19,61,125,81,66,181,76,27,94,144,45>>]},{[]},[{att,<<"att1">>,<<"text/plain">>,65574,65536,<<234,108,210,5,238,127,62,203,190,26,30,150,49,168,204,53>>,1,{<0.440.0>,[{4240,4000},{8245,12000},{20251,4000},{24256,12000},{36263,4000},{40268,12000},{52275,17574}]},gzip},{att,<<"att2">>,<<"app/binary">>,65537,65537,<<203,17,75,208,5,141,155,169,198,203,140,229,153,12,127,210>>,1,{<0.440.0>,[{69858,8192},{78056,8192},{86254,8192},{94452,8192},{102650,8192},{110848,8192},{119046,8192},{127244,8192},{135442,1}]},identity}],false,[]}
 []
01:13:08.927 [notice] 127.0.0.1 - - PUT 
/eunit-test-db-1443762758720227/doc1?new_edits=false 201
01:13:08.928 [notice] Worker flushing doc with attachments local_doc_handler 
<0.704.0> 
{doc,<<"doc2">>,{1,[<<205,189,248,163,192,229,245,166,243,179,143,177,110,124,227,33>>]},{[]},[{att,<<"att1">>,<<"text/plain">>,65574,65536,<<98,82,170,248,169,62,233,38,88,5,93,115,255,138,151,22>>,1,{<0.440.0>,[{135447,4000},{139452,12000},{151458,4000},{155463,12000},{167470,4000},{171475,12000},{183482,17574}]},gzip},{att,<<"att2">>,<<"app/binary">>,65537,65537,<<64,190,238,174,102,16,103,67,66,184,107,184,25,130,126,183>>,1,{<0.440.0>,[{201065,8192},{209263,8192},{217461,8192},{225659,8192},{233857,8192},{242055,8192},{250253,8192},{258451,8192},{266649,1}]},identity}],false,[]}
 []
01:13:08.929 [error] Replicator, request PUT to 
"http://127.0.0.1:58985/eunit-test-db-1443762758720227/doc2?new_edits=false"; 
failed due to error {error,connection_closing}
01:13:08.929 [error] Worker <0.704.0> died with reason: 
{http_request_failed,"PUT","http://127.0.0.1:58985/eunit-test-db-1443762758720227/doc2?new_edits=false",{error,{error,connection_closing}}}
test/couch_replicator_test_helper.erl:116:<0.446.0>: @@@@@@ Replication  
{worker_died,<0.704.0>,
                                                    {http_request_failed,
                                                     "PUT",
                                                     
"http://127.0.0.1:58985/eunit-test-db-1443762758720227/doc2?new_edits=false";,
                                                     {error,
                                                      {error,
                                                       connection_closing}}}}

01:13:08.929 [error] Replication `9232f5f1b3ccb9fe9d7b781873a147cf` 
(`eunit-test-db-1443762758699299` -> 
`http://127.0.0.1:58985/eunit-test-db-1443762758720227/`) failed: 
{worker_died,<0.704.0>,
             {http_request_failed,"PUT",
                                  
"http://127.0.0.1:58985/eunit-test-db-1443762758720227/doc2?new_edits=false";,
                                  {error,{error,connection_closing}}}}
01:13:08.931 [error] gen_server <0.693.0> terminated with reason: 
{worker_died,<0.704.0>,{http_request_failed,"PUT","http://127.0.0.1:58985/eunit-test-db-1443762758720227/doc2?new_edits=false",{error,{error,connection_closing}}}}
[30.135 s] ok
    couch_replicator_large_atts_tests:106: 
should_compare_databases...test/couch_replicator_test_helper.erl:15:<0.446.0>:
 

=CRASH REPORT==== 2-Oct-2015::01:13:08 ===
  crasher:
    initial call: couch_replicator:init/1
    pid: <0.693.0>
    registered_name: []
    exception exit: {worker_died,<0.704.0>,
                        {http_request_failed,"PUT",
                            
"http://127.0.0.1:58985/eunit-test-db-1443762758720227/doc2?new_edits=false";,
                            {error,{error,connection_closing}}}}
      in function  gen_server:terminate/7 (gen_server.erl, line 804)
    ancestors: [couch_replicator_job_sup,couch_replicator_sup,<0.424.0>]
    messages: [{'EXIT',<0.700.0>,normal}]
    links: [<0.438.0>,<0.703.0>]
    dictionary: [{task_status_props,
                      [{changes_pending,2},
                       {checkpoint_interval,5000},
                       {checkpointed_source_seq,0},
                       {continuous,false},
                       {database,null},
                       {doc_id,null},
                       {doc_write_failures,0},
                       {docs_read,0},
                       {docs_written,0},
                       {missing_revisions_found,0},
                       {replication_id,<<"9232f5f1b3ccb9fe9d7b781873a147cf">>},
                       {revisions_checked,0},
                       {source,<<"eunit-test-db-1443762758699299">>},
                       {source_seq,0},
                       {started_on,1443762788},
                       {target,
                           
<<"http://127.0.0.1:58985/eunit-test-db-1443762758720227/";>>},
                       {through_seq,0},
                       {type,replication},
                       {updated_on,1443762788},
                       {user,null}]},
                  {ibrowse_stream_status,init},
                  {pending_count_state,{{1443,762788,891640},2}},
                  {task_status_update,{{0,0,0},1000000}}]
    trap_exit: true
    status: running
    heap_size: 4185
    stack_size: 27
    reductions: 10372
  neighbours:
    neighbour: [{pid,<0.703.0>},
                  {registered_name,[]},
                  {initial_call,{erlang,apply,2}},
                  {current_function,
                      {couch_replicator,changes_manager_loop_open,4}},
                  {ancestors,[]},
                  {messages,[]},
                  {links,[<0.693.0>]},
                  {dictionary,[]},
                  {trap_exit,false},
                  {status,waiting},
                  {heap_size,233},
                  {stack_size,8},
                  {reductions,55}]

=SUPERVISOR REPORT==== 2-Oct-2015::01:13:08 ===
     Supervisor: {local,couch_replicator_job_sup}
     Context:    child_terminated
     Reason:     {worker_died,<0.704.0>,
                     {http_request_failed,"PUT",
                         
"http://127.0.0.1:58985/eunit-test-db-1443762758720227/doc2?new_edits=false";,
                         {error,{error,connection_closing}}}}
     Offender:   [{pid,<0.693.0>},
                  {name,"9232f5f1b3ccb9fe9d7b781873a147cf"},
                  {mfargs,{gen_server,start_link,undefined}},
                  {restart_type,temporary},
                  {shutdown,250},
                  {child_type,worker}]

*failed*
in function couch_replicator_test_helper:'-compare_dbs/2-fun-17-'/5 
(test/couch_replicator_test_helper.erl, line 16)
in call from couch_btree:stream_kv_node2/8 (src/couch_btree.erl, line 783)
in call from couch_btree:fold/4 (src/couch_btree.erl, line 217)
in call from couch_db:enum_docs/5 (src/couch_db.erl, line 1251)
in call from couch_replicator_test_helper:compare_dbs/2 
(test/couch_replicator_test_helper.erl, line 61)
**error:{badmatch,{not_found,missing}}
01:13:08.943 [error] CRASH REPORT Process <0.693.0> with 1 neighbours exited 
with reason: 
{worker_died,<0.704.0>,{http_request_failed,"PUT","http://127.0.0.1:58985/eunit-test-db-1443762758720227/doc2?new_edits=false",{error,{error,connection_closing}}}}
 in gen_server:terminate/7 line 804


    [done in 30.198 s]
01:13:08.943 [error] Supervisor couch_replicator_job_sup had child 
"9232f5f1b3ccb9fe9d7b781873a147cf" started with 
{gen_server,start_link,undefined} at <0.693.0> exit with reason 
{worker_died,<0.704.0>,{http_request_failed,"PUT","http://127.0.0.1:58985/eunit-test-db-1443762758720227/doc2?new_edits=false",{error,{error,connection_closing}}}}
 in context child_terminated
01:13:08.944 [info] Application couch_replicator exited with reason: stopped
01:13:08.945 [info] Application mem3 exited with reason: stopped
01:13:08.945 [info] Application rexi exited with reason: stopped
01:13:08.946 [info] Application couch exited with reason: stopped
01:13:08.946 [info] Application couch_stats exited with reason: stopped
01:13:08.947 [info] Application folsom exited with reason: stopped
01:13:08.947 [info] Application ioq exited with reason: stopped
01:13:08.947 [info] Application b64url exited with reason: stopped
01:13:08.947 [info] Application oauth exited with reason: stopped
01:13:08.947 [info] Application mochiweb exited with reason: stopped
01:13:08.947 [info] Application xmerl exited with reason: stopped
01:13:08.948 [info] Application os_mon exited with reason: stopped
01:13:08.948 [info] Application sasl exited with reason: stopped
01:13:08.948 [info] Application couch_event exited with reason: stopped
01:13:08.948 [info] Application couch_log exited with reason: stopped
01:13:08.948 [info] Application couch_log_lager exited with reason: stopped
01:13:08.948 [info] Application khash exited with reason: stopped
01:13:08.948 [info] Application couch_epi exited with reason: stopped
01:13:08.949 [info] Application config exited with reason: stopped
01:13:08.949 [info] Application ssl exited with reason: stopped
01:13:08.949 [info] Application public_key exited with reason: stopped
01:13:08.949 [info] Application asn1 exited with reason: stopped
01:13:08.949 [info] Application ibrowse exited with reason: stopped
01:13:08.949 [info] Application inets exited with reason: stopped

=INFO REPORT==== 2-Oct-2015::01:13:08 ===
    application: lager
    exited: stopped
    type: temporary

=INFO REPORT==== 2-Oct-2015::01:13:08 ===
    application: goldrush
    exited: stopped
    type: temporary
  [done in 30.198 s]
=======================================================

=INFO REPORT==== 2-Oct-2015::01:13:08 ===
    application: compiler
    exited: stopped
    type: temporary
  Failed: 1.  Skipped: 0.  Passed: 2.

=INFO REPORT==== 2-Oct-2015::01:13:08 ===
    application: syntax_tools
    exited: stopped
    type: temporary
ERROR: One or more eunit tests failed.
ERROR: eunit failed while processing 
/Users/nvatama/asf/couchdb/src/couch_replicator: rebar_abort
{code}

> Replicator client doesn't handle un-expectedly closed pipelined connections 
> well
> --------------------------------------------------------------------------------
>
>                 Key: COUCHDB-2833
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-2833
>             Project: CouchDB
>          Issue Type: Bug
>      Security Level: public(Regular issues) 
>          Components: Database Core
>            Reporter: Nick Vatamaniuc
>
> This was found investigating the failure of replication tests. Specifically 
> couch_replicator_large_atts_tests, the {local, remote} sub-case.
> The test sets up push replications from local to remote.
> Replication workers  have more than 1 document larger than 
> MAX_BULK_ATT_SIZE=64K.  They start pushing them to the target, using a 
> keep-alive connection (default  for HTTP 1.1), the first few pipelined 
> requests will go through using the same connection, then server will accept 
> the first PUT to …/docid?edits=false, then return Connection:close and close 
> the connection after the 201 Created result.  Workers don't expect that, and 
> try to do another PUT on same connection. And then crash on ibrowser's 
> connection_closing error, which they don't handle. That causes the whole 
> async replication process to exit.
> Potentially there are 2 issues.
> couch_replicator_http layer needs to handle this case better. On closing 
> error, shut down the socket quickly and then retry. (Not shutting it down and 
> retrying means retrying for at least 5 or so seconds until something cleans 
> up that connection state).
> Adding this clause to couch_replicator_httpc.erl seems to do the trick:
> {code}
> process_response({error,connection_closing}, Worker, HttpDb, Params, _Cb)->
>     couch_log:notice("Connection closed by server. Closing the socket and 
> trying again",[]),
>     ibrowse_http_client:stop(Worker),
>     throw({retry, HttpDb, Params});
> {code}
> Another issue is to make the server not close the connection after first PUT 
> to .../db/docid?new_edits=false when using pipeline connections.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to