[ 
https://issues.apache.org/jira/browse/COUCHDB-536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Eisenmann reopened COUCHDB-536:
-------------------------------------

    Skill Level: Committers Level (Medium to Hard)

All right, I got this issue again on one of the nodes in the cluster. The 
software is now CouchDB 1.1.0 with Erlang R14B02. 

After a couple of hours of replicating from 3 other nodes, with constant 
changes on the local node, it stops accepting HTTP connections (see error 
below).

I checked with netstat and saw lots of connections on the CouchDB port. 
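
For anyone wanting to quantify the netstat observation, here is a minimal sketch, assuming a Linux host where /proc/net/tcp is available. It counts TCP sockets per state for a given local port. The function name and the choice of port are illustrative and not part of the original report; 6984 is assumed here as the native SSL port and 5984 as the plain HTTP port.

```python
from collections import Counter

# Hex state codes as used in the "st" column of /proc/net/tcp on Linux.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_states(proc_net_tcp_text, port):
    """Count sockets per TCP state whose local port equals `port`.

    `proc_net_tcp_text` is the full text of /proc/net/tcp (hypothetical
    helper; IPv6 sockets live in /proc/net/tcp6 and use the same columns).
    """
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        # fields[1] is "HEXIP:HEXPORT" for the local end of the socket.
        local_port = int(fields[1].rsplit(":", 1)[1], 16)
        if local_port == port:
            state = fields[3].upper()
            counts[TCP_STATES.get(state, state)] += 1
    return counts
```

Reading the file with `open("/proc/net/tcp").read()` and passing the text to `count_states(text, 6984)` gives a per-state tally. A steadily growing number of sockets in one state (for example CLOSE_WAIT) between samples would suggest connections are not being released, though that is an assumption to verify rather than a conclusion from this report.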

It only happens on one node in the cluster, though. I will keep monitoring 
whether it happens every day. I had a similar issue before (replication hung 
at some point) but thought it was related to stunnel, as there was no trace in 
the Couch logs. Yesterday I switched to native CouchDB SSL, and now there is 
this trace.

[Fri, 08 Jul 2011 04:06:22 GMT] [error] [<0.10266.14>] {error_report,<0.31.0>,
                                     {<0.10266.14>,std_error,
                                      [{application,mochiweb},
                                       "Accept failed error",
                                       "{error,enfile}"]}}
[Fri, 08 Jul 2011 04:06:22 GMT] [error] [<0.10266.14>] {error_report,<0.31.0>,
                           {<0.10266.14>,crash_report,
                            [[{initial_call,
                                  {mochiweb_acceptor,init,
                                      ['Argument__1','Argument__2',
                                       'Argument__3']}},
                              {pid,<0.10266.14>},
                              {registered_name,[]},
                              {error_info,
                                  {exit,
                                      {error,accept_failed},
                                      [{mochiweb_acceptor,init,3},
                                       {proc_lib,init_p_do_apply,3}]}},
                              {ancestors,
                                  [https,couch_secondary_services,
                                   couch_server_sup,<0.32.0>]},
                              {messages,[]},
                              {links,[<0.136.0>]},
                              {dictionary,[]},
                              {trap_exit,false},
                              {status,running},
                              {heap_size,233},
                              {stack_size,24},
                              {reductions,372}],
                             []]}}
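
The `enfile` in the trace above corresponds to the POSIX error ENFILE (file table overflow), which is distinct from EMFILE (too many open files in one process). Which limit is actually being hit on this node is not established here. As a rough sketch, assuming a Linux host, the following compares the system-wide file handle usage from /proc/sys/fs/file-nr with the number of descriptors held by a single process. The function names are illustrative, and the pid of the Erlang VM has to be supplied by the caller.

```python
import os


def parse_file_nr(text):
    """Parse the contents of /proc/sys/fs/file-nr.

    The file holds three integers: allocated file handles, allocated but
    unused handles, and the system-wide maximum.
    """
    allocated, unused, maximum = (int(x) for x in text.split()[:3])
    return {"allocated": allocated, "unused": unused, "max": maximum}


def system_headroom(text):
    """Return how many more file handles the system can allocate."""
    info = parse_file_nr(text)
    return info["max"] - info["allocated"]


def open_fd_count(pid):
    """Count descriptors currently open in process `pid` (Linux only).

    Requires permission to read /proc/<pid>/fd, i.e. the same user as the
    process or root.
    """
    return len(os.listdir("/proc/%d/fd" % pid))
```

Sampling `system_headroom(open("/proc/sys/fs/file-nr").read())` and `open_fd_count(pid)` periodically, and comparing the latter with the `ulimit -n` value the VM was started under, would show whether descriptors leak over the hours before the acceptor fails.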


> CouchDB HTTP server stops accepting connections
> -----------------------------------------------
>
>                 Key: COUCHDB-536
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-536
>             Project: CouchDB
>          Issue Type: Bug
>          Components: HTTP Interface
>    Affects Versions: 0.10
>         Environment: Ubuntu Linux 8.04 32bit and 64bit with Erlang R13B01
>            Reporter: Simon Eisenmann
>            Priority: Critical
>
> Having 3 Couches all replicating a couple of databases to each other (pull 
> replication with an update notification process), the HTTP service on any of 
> the Couches stops working at some point (when running for a couple of hours 
> with constant changes on all databases and servers).
> This is the error when a new HTTP request comes in:
> =ERROR REPORT==== 19-Oct-2009::10:18:55 ===
>     application: mochiweb
>     "Accept failed error"
>     "{error,enfile}"
> [error] [<0.21619.12>] {error_report,<0.24.0>,
>     {<0.21619.12>,crash_report,
>      [[{initial_call,{mochiweb_socket_server,acceptor_loop,['Argument__1']}},
>        {pid,<0.21619.12>},
>        {registered_name,[]},
>        {error_info,
>            {exit,
>                {error,accept_failed},
>                [{mochiweb_socket_server,acceptor_loop,1},
>                 {proc_lib,init_p_do_apply,3}]}},
>        {ancestors,
>            [couch_httpd,couch_secondary_services,couch_server_sup,<0.1.0>]},
>        {messages,[]},
>        {links,[<0.66.0>]},
>        {dictionary,[]},
>        {trap_exit,false},
>        {status,running},
>        {heap_size,233},
>        {stack_size,24},
>        {reductions,202}],
>       []]}}
> [error] [<0.66.0>] {error_report,<0.24.0>,
>     {<0.66.0>,std_error,
>      {mochiweb_socket_server,225,{acceptor_error,{error,accept_failed}}}}}
> To me this seems like it runs out of threads or sockets to handle the new 
> connection, or something like that.
> Also, I see in this setup that if I put in lots of changes in a short time, 
> at some point the replication process hangs (never finishes), and restarting 
> the same replication is not possible and results in a timeout.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
