[ 
https://issues.apache.org/jira/browse/COUCHDB-597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12790760#action_12790760
 ] 

Robert Newson commented on COUCHDB-597:
---------------------------------------

Replication tasks fail even when executed serially, as long as the databases 
are large enough (1.3 GB in this case). The fourth replication task has crashed.
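
For reference, a minimal sketch of what "executed serially" could look like from a client, assuming the standard `_replicate` endpoint and the same request shape that appears in the quoted report below (`source`, `target`, `create_target`). The helper names, the target base URL and any database names other than `perf-p2` are placeholders for illustration, not taken from my setup. It also assumes each non-continuous replication request returns only once that replication has finished.

```python
import json
import urllib.request


def replication_body(source_base, db_name):
    """Build the JSON body for one pull replication (hypothetical helper)."""
    return {
        "source": f"{source_base}/{db_name}",
        "target": db_name,
        "create_target": True,
    }


def replicate_serially(target_base, source_base, db_names,
                       opener=urllib.request.urlopen):
    """POST one replication at a time and collect the decoded responses.

    The next request is only sent after the previous response has been read,
    so at most one replication is requested at any moment.
    """
    results = []
    for db_name in db_names:
        body = json.dumps(replication_body(source_base, db_name)).encode("utf-8")
        request = urllib.request.Request(
            f"{target_base}/_replicate",
            data=body,
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with opener(request) as response:
            results.append(json.loads(response.read().decode("utf-8")))
    return results
```

Calling `replicate_serially("http://localhost:5984", "http://node:5984", ["perf-p2"])` would issue a single request; passing several names issues them one after another. The `opener` parameter exists only so the loop can be exercised without a running server.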

Stack traces from the end of my log while a replication task is hung/crashed:

[Tue, 15 Dec 2009 07:08:44 GMT] [error] [<0.49.0>] ** Generic server 
couch_task_status terminating 
** Last message in was {#Ref<0.0.1832.61391>,3}
** When Server state == nil
** Reason for termination == 
** {function_clause,
       [{couch_task_status,handle_info,[{#Ref<0.0.1832.61391>,3},nil]},
        {gen_server,handle_msg,5},
        {proc_lib,init_p_do_apply,3}]}

[Tue, 15 Dec 2009 07:08:44 GMT] [error] [<0.45.0>] {error_report,<0.23.0>,
    {<0.45.0>,supervisor_report,
     [{supervisor,{local,couch_primary_services}},
      {errorContext,child_terminated},
      {reason,
          {function_clause,
              [{couch_task_status,handle_info,[{#Ref<0.0.1832.61391>,3},nil]},
               {gen_server,handle_msg,5},
               {proc_lib,init_p_do_apply,3}]}},
      {offender,
          [{pid,<0.49.0>},
           {name,couch_task_status},
           {mfa,{couch_task_status,start_link,[]}},
           {restart_type,permanent},
           {shutdown,brutal_kill},
           {child_type,worker}]}]}}

[Tue, 15 Dec 2009 07:08:51 GMT] [error] [<0.2720.204>] {error_report,<0.23.0>,
              {<0.2720.204>,crash_report,
               [[{initial_call,{couch_task_status,init,['Argument__1']}},
                 {pid,<0.2720.204>},
                 {registered_name,couch_task_status},
                 {error_info,{exit,{{badmatch,[]},
                                    [{couch_task_status,handle_cast,2},
                                     {gen_server,handle_msg,5},
                                     {proc_lib,init_p_do_apply,3}]},
                                   [{gen_server,terminate,6},
                                    {proc_lib,init_p_do_apply,3}]}},
                 {ancestors,[couch_primary_services,couch_server_sup,<0.1.0>]},
                 {messages,[]},
                 {links,[<0.45.0>]},
                 {dictionary,[]},
                 {trap_exit,false},
                 {status,running},
                 {heap_size,377},
                 {stack_size,24},
                 {reductions,127}],
                []]}}

[Tue, 15 Dec 2009 07:08:51 GMT] [error] [<0.45.0>] {error_report,<0.23.0>,
              {<0.45.0>,supervisor_report,
               [{supervisor,{local,couch_primary_services}},
                {errorContext,child_terminated},
                {reason,{{badmatch,[]},
                         [{couch_task_status,handle_cast,2},
                          {gen_server,handle_msg,5},
                          {proc_lib,init_p_do_apply,3}]}},
                {offender,[{pid,<0.2720.204>},
                           {name,couch_task_status},
                           {mfa,{couch_task_status,start_link,[]}},
                           {restart_type,permanent},
                           {shutdown,brutal_kill},
                           {child_type,worker}]}]}}

[Tue, 15 Dec 2009 07:08:57 GMT] [error] [<0.4889.204>] ** Generic server 
couch_task_status terminating 
** Last message in was {'$gen_cast',
                           {update_status,<0.9558.169>,
                               <<"Copied 146001 of 271595 changes (53%)">>}}
** When Server state == nil
** Reason for termination == 
** {{badmatch,[]},
    [{couch_task_status,handle_cast,2},
     {gen_server,handle_msg,5},
     {proc_lib,init_p_do_apply,3}]}


[Tue, 15 Dec 2009 07:08:57 GMT] [error] [<0.4889.204>] {error_report,<0.23.0>,
              {<0.4889.204>,crash_report,
               [[{initial_call,{couch_task_status,init,['Argument__1']}},
                 {pid,<0.4889.204>},
                 {registered_name,couch_task_status},
                 {error_info,{exit,{{badmatch,[]},
                                    [{couch_task_status,handle_cast,2},
                                     {gen_server,handle_msg,5},
                                     {proc_lib,init_p_do_apply,3}]},
                                   [{gen_server,terminate,6},
                                    {proc_lib,init_p_do_apply,3}]}},
                 {ancestors,[couch_primary_services,couch_server_sup,<0.1.0>]},
                 {messages,[]},
                 {links,[<0.45.0>]},
                 {dictionary,[]},
                 {trap_exit,false},
                 {status,running},
                 {heap_size,377},
                 {stack_size,24},
                 {reductions,127}],
                []]}}

[Tue, 15 Dec 2009 07:08:57 GMT] [error] [<0.45.0>] {error_report,<0.23.0>,
              {<0.45.0>,supervisor_report,
               [{supervisor,{local,couch_primary_services}},
                {errorContext,child_terminated},
                {reason,{{badmatch,[]},
                         [{couch_task_status,handle_cast,2},
                          {gen_server,handle_msg,5},
                          {proc_lib,init_p_do_apply,3}]}},
                {offender,[{pid,<0.4889.204>},
                           {name,couch_task_status},
                           {mfa,{couch_task_status,start_link,[]}},
                           {restart_type,permanent},
                           {shutdown,brutal_kill},
                           {child_type,worker}]}]}}

[Tue, 15 Dec 2009 07:09:02 GMT] [error] [<0.45.0>] {error_report,<0.23.0>,
              {<0.45.0>,supervisor_report,
               [{supervisor,{local,couch_primary_services}},
                {errorContext,shutdown},
                {reason,reached_max_restart_intensity},
                {offender,[{pid,<0.6117.204>},
                           {name,couch_task_status},
                           {mfa,{couch_task_status,start_link,[]}},
                           {restart_type,permanent},
                           {shutdown,brutal_kill},
                           {child_type,worker}]}]}}

[Tue, 15 Dec 2009 07:09:02 GMT] [error] [<0.60.0>] Exit on non-updater process: 
killed

[Tue, 15 Dec 2009 07:09:02 GMT] [error] [<0.60.0>] ** Generic server couch_view 
terminating 
** Last message in was {'EXIT',<0.61.0>,killed}
** When Server state == {server,"/var/lib/couchdb/0.10.0"}
** Reason for termination == 
** killed


[Tue, 15 Dec 2009 07:09:02 GMT] [error] [<0.60.0>] {error_report,<0.23.0>,
              {<0.60.0>,crash_report,
               [[{initial_call,{couch_view,init,['Argument__1']}},
                 {pid,<0.60.0>},
                 {registered_name,couch_view},
                 {error_info,{exit,killed,
                                   [{gen_server,terminate,6},
                                    {proc_lib,init_p_do_apply,3}]}},
                 {ancestors,[couch_secondary_services,couch_server_sup,
                             <0.1.0>]},
                 {messages,[]},
                 {links,[<0.52.0>]},
                 {dictionary,[]},
                 {trap_exit,true},
                 {status,running},
                 {heap_size,2584},
                 {stack_size,24},
                 {reductions,5320}],
                []]}}

[Tue, 15 Dec 2009 07:09:02 GMT] [error] [<0.52.0>] {error_report,<0.23.0>,
              {<0.52.0>,supervisor_report,
               [{supervisor,{local,couch_secondary_services}},
                {errorContext,child_terminated},
                {reason,killed},
                {offender,[{pid,<0.60.0>},
                           {name,view_manager},
                           {mfa,{couch_view,start_link,[]}},
                           {restart_type,permanent},
                           {shutdown,brutal_kill},
                           {child_type,worker}]}]}}

[Tue, 15 Dec 2009 07:08:44 GMT] [error] [<0.49.0>] {error_report,<0.23.0>,
    {<0.49.0>,crash_report,
     [[{initial_call,{couch_task_status,init,['Argument__1']}},
       {pid,<0.49.0>},
       {registered_name,couch_task_status},
       {error_info,
           {exit,
               {function_clause,
                   [{couch_task_status,handle_info,
                        [{#Ref<0.0.1832.61391>,3},nil]},
                    {gen_server,handle_msg,5},
                    {proc_lib,init_p_do_apply,3}]},
               [{gen_server,terminate,6},{proc_lib,init_p_do_apply,3}]}},
       {ancestors,[couch_primary_services,couch_server_sup,<0.1.0>]},
       {messages,[]},
       {links,[<0.45.0>]},
       {dictionary,[]},
       {trap_exit,false},
       {status,running},
       {heap_size,2584},
       {stack_size,24},
       {reductions,191624}],
      []]}}



> Replication tasks crash.
> ------------------------
>
>                 Key: COUCHDB-597
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-597
>             Project: CouchDB
>          Issue Type: Bug
>          Components: Database Core
>    Affects Versions: 0.11
>            Reporter: Robert Newson
>
> If I kick off 10 replication tasks in quick succession, occasionally one or 
> two of the replication tasks will die and not be resumed. It seems that the 
> stat tracking is a little buggy, and under stress can eventually cause a 
> permanent failure of the supervised replication task;
> [Fri, 11 Dec 2009 19:00:08 GMT] [error] [<0.80.0>] {error_report,<0.30.0>,
>     {<0.80.0>,supervisor_report,
>      [{supervisor,{local,couch_rep_sup}},
>       {errorContext,shutdown_error},
>       {reason,killed},
>       {offender,
>           [{pid,<0.6700.11>},
>            {name,"fcbb13200a1618cf983b347f4d2c9835+create_target"},
>            {mfa,
>                {gen_server,start_link,
>                    [couch_rep,
>                     ["fcbb13200a1618cf983b347f4d2c9835",
>                      {[{<<"create_target">>,true},
>                        {<<"source">>,<<"http://node:5984/perf-p2">>},
>                        {<<"target">>,<<"perf-p2">>}]},
>                      {user_ctx,null,[<<"_admin">>]}],
>                     []]}},
>            {restart_type,temporary},
>            {shutdown,1},
>            {child_type,worker}]}]}}
> [Fri, 11 Dec 2009 19:00:08 GMT] [error] [emulator] Error in process 
> <0.6705.11> with exit value: 
> {badarg,[{ets,insert,[stats_hit_table,{{couchdb,open_os_files},-1}]},{couch_stats_collector,decrement,1}]}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
