It turns out that rexi_server processes can die in such a way that they are
not restarted. This can leave (and has left!) a cluster unable to issue RPC
calls, effectively rendering the cluster useless.

A slightly redacted log showing this happen when the node hit the process limit:

2018-08-18T21:00:05.106860Z db3.clustername <0.19934.2> -  gen_server 
'[email protected]' terminated with reason: 
system_limit at erlang:spawn_opt/1 <= erlang:spawn_monitor/3 <= 
rexi_server:handle_cast/2(line:71) <= gen_server:try_dispatch/4(line:593) <= 
gen_server:handle_msg/5(line:659) <= proc_lib:init_p_do_apply/3(line:237)#012 
state: {st,6946959,7078032,{[],[]},0,0}
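
As a rough way to check whether a node is getting close to the limit that produced the system_limit error in this log, something like the following could be pasted into a remote shell on the node. It is only a sketch: it uses nothing but erlang:system_info/1, and the 80% threshold is an arbitrary assumption for illustration, not a CouchDB setting.

```erlang
%% Compare the number of live processes with the VM's configured maximum.
Count = erlang:system_info(process_count),
Limit = erlang:system_info(process_limit),
io:format("~p of ~p processes in use (~.1f%)~n",
          [Count, Limit, 100 * Count / Limit]),
%% 0.8 is an illustrative threshold (an assumption), not a CouchDB default.
case Count / Limit > 0.8 of
    true  -> io:format("warning: close to the process limit~n");
    false -> ok
end.
```

The limit itself is set with the emulator's +P flag, so a node that routinely runs near it may need either a higher value or a look at what is spawning so many processes.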


[ Full content available at: https://github.com/apache/couchdb/issues/1571 ]