style95 opened a new issue #1071: [Question] CouchDB crash during benchmark
URL: https://github.com/apache/couchdb/issues/1071

CouchDB crashed during benchmarking. I deployed 3 nodes.

* node1: 10.113.130.91
* node2: 10.113.130.92
* node3: 10.113.130.93

I sent 500 docs using the bulk-insert API. Performance was steady, and after about 4 minutes one of the nodes suddenly crashed.

I got the following logs on the nodes.

**node1**
```
[error] 2017-12-18T03:13:09.183711Z [email protected] emulator -------- Error in process <0.21270.8> on node '[email protected]' with exit value: {{rexi_DOWN,{'[email protected]',noconnect}},[{mem3_rpc,rexi_call,2,[{file,"src/mem3_rpc.erl"},{line,269}]},{mem3_rep,calculate_start_seq,1,[{file,"src/mem3_rep.erl"},{line,194}]},{mem3_rep,repl,2,[{file,"src/mem3_rep.erl"},...
[error] 2017-12-18T03:13:09.183768Z [email protected] emulator -------- Error in process <0.21304.8> on node '[email protected]' with exit value: {{rexi_DOWN,{'[email protected]',noconnect}},[{mem3_rpc,rexi_call,2,[{file,"src/mem3_rpc.erl"},{line,269}]},{mem3_rep,calculate_start_seq,1,[{file,"src/mem3_rep.erl"},{line,194}]},{mem3_rep,repl,2,[{file,"src/mem3_rep.erl"},...
[warning] 2017-12-18T03:13:09.183766Z [email protected] <0.294.0> -------- mem3_sync shards/00000000-1fffffff/lambda-bmt_activations.1513565327 [email protected] {{rexi_DOWN,{'[email protected]',noconnect}},[{mem3_rpc,rexi_call,2,[{file,[115,114,99,47,109,101,109,51,95,114,112,99,46,101,114,108]},{line,269}]},{mem3_rep,calculate_start_seq,1,[{file,[115,114,99,47,109,101,109,51,95,114,101,112,46,101,114,108]},{line,194}]},{mem3_rep,repl,2,[{file,[115,114,99,47,109,101,109,51,95,114,101,112,46,101,114,108]},{line,175}]},{mem3_rep,go,1,[{file,[115,114,99,47,109,101,109,51,95,114,101,112,46,101,114,108]},{line,81}]},{mem3_sync,'-start_push_replication/1-fun-0-',2,[{file,[115,114,99,47,109,101,109,51,95,115,121,110,99,46,101,114,108]},{line,208}]}]}
[error] 2017-12-18T03:13:09.183804Z [email protected] emulator -------- Error in process <0.21269.8> on node '[email protected]' with exit value: {{rexi_DOWN,{'[email protected]',noconnect}},[{mem3_rpc,rexi_call,2,[{file,"src/mem3_rpc.erl"},{line,269}]},{mem3_rep,calculate_start_seq,1,[{file,"src/mem3_rep.erl"},{line,194}]},{mem3_rep,repl,2,[{file,"src/mem3_rep.erl"},...
[warning] 2017-12-18T03:13:09.183981Z [email protected] <0.294.0> -------- mem3_sync shards/00000000-1fffffff/_global_changes.1513565315 [email protected] {{rexi_DOWN,{'[email protected]',noconnect}},[{mem3_rpc,rexi_call,2,[{file,[115,114,99,47,109,101,109,51,95,114,112,99,46,101,114,108]},{line,269}]},{mem3_rep,calculate_start_seq,1,[{file,[115,114,99,47,109,101,109,51,95,114,101,112,46,101,114,108]},{line,194}]},{mem3_rep,repl,2,[{file,[115,114,99,47,109,101,109,51,95,114,101,112,46,101,114,108]},{line,175}]},{mem3_rep,go,1,[{file,[115,114,99,47,109,101,109,51,95,114,101,112,46,101,114,108]},{line,81}]},{mem3_sync,'-start_push_replication/1-fun-0-',2,[{file,[115,114,99,47,109,101,109,51,95,115,121,110,99,46,101,114,108]},{line,208}]}]}
[warning] 2017-12-18T03:13:09.184206Z [email protected] <0.294.0> -------- mem3_sync shards/60000000-7fffffff/lambda-bmt_activations.1513565327 [email protected] {{rexi_DOWN,{'[email protected]',noconnect}},[{mem3_rpc,rexi_call,2,[{file,[115,114,99,47,109,101,109,51,95,114,112,99,46,101,114,108]},{line,269}]},{mem3_rep,calculate_start_seq,1,[{file,[115,114,99,47,109,101,109,51,95,114,101,112,46,101,114,108]},{line,194}]},{mem3_rep,repl,2,[{file,[115,114,99,47,109,101,109,51,95,114,101,112,46,101,114,108]},{line,175}]},{mem3_rep,go,1,[{file,[115,114,99,47,109,101,109,51,95,114,101,112,46,101,114,108]},{line,81}]},{mem3_sync,'-start_push_replication/1-fun-0-',2,[{file,[115,114,99,47,109,101,109,51,95,115,121,110,99,46,101,114,108]},{line,208}]}]}
[error] 2017-12-18T03:13:09.186470Z [email protected] emulator -------- Error in process <0.21343.8> on node '[email protected]' with exit value: {{rexi_DOWN,{'[email protected]',noconnect}},[{mem3_rpc,rexi_call,2,[{file,"src/mem3_rpc.erl"},{line,269}]},{mem3_rep,calculate_start_seq,1,[{file,"src/mem3_rep.erl"},{line,194}]},{mem3_rep,repl,2,[{file,"src/mem3_rep.erl"},...
[error] 2017-12-18T03:13:09.186529Z [email protected] emulator -------- Error in process <0.21372.8> on node '[email protected]' with exit value: {{rexi_DOWN,{'[email protected]',noconnect}},[{mem3_rpc,rexi_call,2,[{file,"src/mem3_rpc.erl"},{line,269}]},{mem3_rep,calculate_start_seq,1,[{file,"src/mem3_rep.erl"},{line,194}]},{mem3_rep,repl,2,[{file,"src/mem3_rep.erl"},...
[warning] 2017-12-18T03:13:09.186662Z [email protected] <0.294.0> -------- mem3_sync shards/80000000-9fffffff/lambda-bmt_activations.1513565327 [email protected] {{rexi_DOWN,{'[email protected]',noconnect}},[{mem3_rpc,rexi_call,2,[{file,[115,114,99,47,109,101,109,51,95,114,112,99,46,101,114,108]},{line,269}]},{mem3_rep,calculate_start_seq,1,[{file,[115,114,99,47,109,101,109,51,95,114,101,112,46,101,114,108]},{line,194}]},{mem3_rep,repl,2,[{file,[115,114,99,47,109,101,109,51,95,114,101,112,46,101,114,108]},{line,175}]},{mem3_rep,go,1,[{file,[115,114,99,47,109,101,109,51,95,114,101,112,46,101,114,108]},{line,81}]},{mem3_sync,'-start_push_replication/1-fun-0-',2,[{file,[115,114,99,47,109,101,109,51,95,115,121,110,99,46,101,114,108]},{line,208}]}]}
[warning] 2017-12-18T03:13:09.186962Z [email protected] <0.294.0> -------- mem3_sync shards/c0000000-dfffffff/lambda-bmt_activations.1513565327 [email protected] {{rexi_DOWN,{'[email protected]',noconnect}},[{mem3_rpc,rexi_call,2,[{file,[115,114,99,47,109,101,109,51,95,114,112,99,46,101,114,108]},{line,269}]},{mem3_rep,calculate_start_seq,1,[{file,[115,114,99,47,109,101,109,51,95,114,101,112,46,101,114,108]},{line,194}]},{mem3_rep,repl,2,[{file,[115,114,99,47,109,101,109,51,95,114,101,112,46,101,114,108]},{line,175}]},{mem3_rep,go,1,[{file,[115,114,99,47,109,101,109,51,95,114,101,112,46,101,114,108]},{line,81}]},{mem3_sync,'-start_push_replication/1-fun-0-',2,[{file,[115,114,99,47,109,101,109,51,95,115,121,110,99,46,101,114,108]},{line,208}]}]}
[error] 2017-12-18T03:13:09.188375Z [email protected] emulator -------- Error in process <0.21432.8> on node '[email protected]' with exit value: {{rexi_DOWN,{'[email protected]',noconnect}},[{mem3_rpc,rexi_call,2,[{file,"src/mem3_rpc.erl"},{line,269}]},{mem3_rep,calculate_start_seq,1,[{file,"src/mem3_rep.erl"},{line,194}]},{mem3_rep,repl,2,[{file,"src/mem3_rep.erl"},...
[error] 2017-12-18T03:13:09.188415Z [email protected] emulator -------- Error in process <0.21424.8> on node '[email protected]' with exit value: {{rexi_DOWN,{'[email protected]',noconnect}},[{mem3_rpc,rexi_call,2,[{file,"src/mem3_rpc.erl"},{line,269}]},{mem3_rep,calculate_start_seq,1,[{file,"src/mem3_rep.erl"},{line,194}]},{mem3_rep,repl,2,[{file,"src/mem3_rep.erl"},...
[warning] 2017-12-18T03:13:09.188476Z [email protected] <0.294.0> -------- mem3_sync shards/e0000000-ffffffff/lambda-bmt_activations.1513565327 [email protected] {{rexi_DOWN,{'[email protected]',noconnect}},[{mem3_rpc,rexi_call,2,[{file,[115,114,99,47,109,101,109,51,95,114,112,99,46,101,114,108]},{line,269}]},{mem3_rep,calculate_start_seq,1,[{file,[115,114,99,47,109,101,109,51,95,114,101,112,46,101,114,108]},{line,194}]},{mem3_rep,repl,2,[{file,[115,114,99,47,109,101,109,51,95,114,101,112,46,101,114,108]},{line,175}]},{mem3_rep,go,1,[{file,[115,114,99,47,109,101,109,51,95,114,101,112,46,101,114,108]},{line,81}]},{mem3_sync,'-start_push_replication/1-fun-0-',2,[{file,[115,114,99,47,109,101,109,51,95,115,121,110,99,46,101,114,108]},{line,208}]}]}
[warning] 2017-12-18T03:13:09.188649Z [email protected] <0.294.0> -------- mem3_sync shards/20000000-3fffffff/lambda-bmt_activations.1513565327 [email protected] {{rexi_DOWN,{'[email protected]',noconnect}},[{mem3_rpc,rexi_call,2,[{file,[115,114,99,47,109,101,109,51,95,114,112,99,46,101,114,108]},{line,269}]},{mem3_rep,calculate_start_seq,1,[{file,[115,114,99,47,109,101,109,51,95,114,101,112,46,101,114,108]},{line,194}]},{mem3_rep,repl,2,[{file,[115,114,99,47,109,101,109,51,95,114,101,112,46,101,114,108]},{line,175}]},{mem3_rep,go,1,[{file,[115,114,99,47,109,101,109,51,95,114,101,112,46,101,114,108]},{line,81}]},{mem3_sync,'-start_push_replication/1-fun-0-',2,[{file,[115,114,99,47,109,101,109,51,95,115,121,110,99,46,101,114,108]},{line,208}]}]}
[error] 2017-12-18T03:13:09.190738Z [email protected] emulator -------- Error in process <0.21483.8> on node '[email protected]' with exit value: {{rexi_DOWN,{'[email protected]',noconnect}},[{mem3_rpc,rexi_call,2,[{file,"src/mem3_rpc.erl"},{line,269}]},{mem3_rep,calculate_start_seq,1,[{file,"src/mem3_rep.erl"},{line,194}]},{mem3_rep,repl,2,[{file,"src/mem3_rep.erl"},...
[error] 2017-12-18T03:13:09.190787Z [email protected] emulator -------- Error in process <0.21489.8> on node '[email protected]' with exit value: {{rexi_DOWN,{'[email protected]',noconnect}},[{mem3_rpc,rexi_call,2,[{file,"src/mem3_rpc.erl"},{line,269}]},{mem3_rep,calculate_start_seq,1,[{file,"src/mem3_rep.erl"},{line,194}]},{mem3_rep,repl,2,[{file,"src/mem3_rep.erl"},...
[warning] 2017-12-18T03:13:09.190856Z [email protected] <0.294.0> -------- mem3_sync shards/40000000-5fffffff/lambda-bmt_activations.1513565327 [email protected] {{rexi_DOWN,{'[email protected]',noconnect}},[{mem3_rpc,rexi_call,2,[{file,[115,114,99,47,109,101,109,51,95,114,112,99,46,101,114,108]},{line,269}]},{mem3_rep,calculate_start_seq,1,[{file,[115,114,99,47,109,101,109,51,95,114,101,112,46,101,114,108]},{line,194}]},{mem3_rep,repl,2,[{file,[115,114,99,47,109,101,109,51,95,114,101,112,46,101,114,108]},{line,175}]},{mem3_rep,go,1,[{file,[115,114,99,47,109,101,109,51,95,114,101,112,46,101,114,108]},{line,81}]},{mem3_sync,'-start_push_replication/1-fun-0-',2,[{file,[115,114,99,47,109,101,109,51,95,115,121,110,99,46,101,114,108]},{line,208}]}]}
[warning] 2017-12-18T03:13:09.191076Z [email protected] <0.294.0> -------- mem3_sync shards/a0000000-bfffffff/lambda-bmt_activations.1513565327 [email protected] {{rexi_DOWN,{'[email protected]',noconnect}},[{mem3_rpc,rexi_call,2,[{file,[115,114,99,47,109,101,109,51,95,114,112,99,46,101,114,108]},{line,269}]},{mem3_rep,calculate_start_seq,1,[{file,[115,114,99,47,109,101,109,51,95,114,101,112,46,101,114,108]},{line,194}]},{mem3_rep,repl,2,[{file,[115,114,99,47,109,101,109,51,95,114,101,112,46,101,114,108]},{line,175}]},{mem3_rep,go,1,[{file,[115,114,99,47,109,101,109,51,95,114,101,112,46,101,114,108]},{line,81}]},{mem3_sync,'-start_push_replication/1-fun-0-',2,[{file,[115,114,99,47,109,101,109,51,95,115,121,110,99,46,101,114,108]},{line,208}]}]}
```

node2 crashed, but I am not sure what the problem is.

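A side note on reading the node1 warnings: the `{file,[115,114,99,...]}` fields are Erlang strings printed as lists of character codes, so they hide the same source paths that the `[error]` entries show in quotes. A minimal Python sketch for decoding them follows; the helper names `decode_charlist` and `decode_log_line` are made up for illustration and are not part of CouchDB or Erlang.

```python
import re

# Matches a bracketed list of two or more integers, e.g. [115,114,99].
CHARLIST = re.compile(r"\[(?:\d+,)+\d+\]")


def decode_charlist(text: str) -> str:
    """Turn an Erlang-style character-code list such as '[115,114,99]' into 'src'."""
    codes = [int(n) for n in text.strip("[]").split(",")]
    return "".join(chr(c) for c in codes)


def decode_log_line(line: str) -> str:
    """Replace every integer list in a log line with its quoted string form."""
    return CHARLIST.sub(lambda m: '"' + decode_charlist(m.group(0)) + '"', line)


# One stack frame copied from the mem3_sync warnings above.
sample = (
    "{mem3_rpc,rexi_call,2,[{file,[115,114,99,47,109,101,109,51,95,"
    "114,112,99,46,101,114,108]},{line,269}]}"
)
print(decode_log_line(sample))
# prints {mem3_rpc,rexi_call,2,[{file,"src/mem3_rpc.erl"},{line,269}]}
```

Decoded this way, the warnings and the truncated `[error]` entries point at the same call chain in `mem3_rpc` and `mem3_rep`.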
**node2**
```
[error] 2017-12-18T03:12:54.538612Z [email protected] <0.1329.0> -------- OS Process died with status: 137
[error] 2017-12-18T03:12:54.549992Z [email protected] <0.1329.0> -------- gen_server <0.1329.0> terminated with reason: {exit_status,137} last msg: {#Port<0.8125>,{exit_status,137}} state: {os_proc,"./bin/couchjs ./share/server/main.js",#Port<0.8125>,#Fun<couch_os_process.writejson.2>,#Fun<couch_os_process.readjson.1>,5000,300000}
[info] 2017-12-18T03:12:54.551745Z [email protected] <0.213.0> -------- couch_proc_manager <0.1329.0> died {exit_status,137}
[error] 2017-12-18T03:12:54.552261Z [email protected] <0.1329.0> -------- CRASH REPORT Process (<0.1329.0>) with 0 neighbors exited with reason: {exit_status,137} at gen_server:terminate/6(line:737) <= proc_lib:init_p_do_apply/3(line:237); initial_call: {couch_os_process,init,['Argument__1']}, ancestors: [<0.1328.0>], messages: [], links: [<0.213.0>], dictionary: [], trap_exit: false, status: running, heap_size: 610, stack_size: 27, reductions: 2890
```

I also occasionally observed the following error. I wonder why one of my nodes is suddenly disallowed.

```
[error] 2017-12-18T03:13:06.373063Z [email protected] <0.16717.8> -------- ** Connection attempt from disallowed node '[email protected]' **
```

Even under high load, I would expect CouchDB to reject requests rather than crash, but I don't know how to configure that.

I used the default configuration with the following changes.

**default.ini**
```
max_dbs_open = 1024
os_process_limit = 2000
os_process_soft_limit = 1500
check_interval = 60
_default = [{db_fragmentation, "30%"}, {view_fragmentation, "30%"}]
```

**vm.args**
```
+A 1024
```

I have already read the CouchDB guide and understand what each setting does, but it was not easy to figure out the proper value for each one. Could anyone guide me in tuning it for production? Or is there a guide to tuning CouchDB?
It would be great to have some recommendations such as `os_process_limit = CPU cores * 10`, `+A = CPU cores * 15`, and so on.

I am using the following machines:

* CPU: 32 cores
* MEM: 64 GB
* DISK: 1.3 TB SSD
* Network: 1G
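One detail from the node2 log that may help narrow things down: the `couchjs` process died with status 137. Assuming the port reports exit statuses using the usual shell convention, where a value above 128 means 128 plus the number of the signal that killed the process, 137 corresponds to signal 9. The sketch below only does that arithmetic; `describe_exit_status` is a hypothetical helper, and the signal name lookup depends on the platform Python runs on.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Interpret an exit status under the assumed 128 + signal convention."""
    if status > 128:
        signum = status - 128
        try:
            # Signal names are platform specific; fall back if unknown.
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown signal"
        return f"killed by signal {signum} ({name})"
    return f"exited with code {status}"


print(describe_exit_status(137))
# on Linux this prints: killed by signal 9 (SIGKILL)
```

SIGKILL is what the Linux out-of-memory killer sends, among other possible senders, so memory pressure on node2 is one thing worth checking in the kernel log. That is a possibility to rule out, not a confirmed diagnosis.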
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

With regards,
Apache Git Services
