wohali opened a new issue #1383: couch_compaction_daemon dies on busy node
URL: https://github.com/apache/couchdb/issues/1383
 
 
   # Current Behaviour
   
   When couch_compaction_daemon fails to spawn a compactor with `start_compact` 
and receives a timeout from `gen_server:call/2`, the entire compaction daemon 
fails and restarts.
   
   Sample log excerpt:
   ```
   [error] 2018-05-29T12:41:51.965000Z [email protected] <0.4859.674> -------- gen_server couch_compaction_daemon terminated with reason: {compaction_loop_died,{timeout,{gen_server,call,[<0.21919.5399>,start_compact]}}}
     last msg: {'EXIT',<0.23803.535>,{timeout,{gen_server,call,[<0.21919.5399>,start_compact]}}}
        state: {state,<0.23803.535>,[<<"shards/c0000000-dfffffff/bb.1519652899">>]}
   [error] 2018-05-29T12:41:51.965000Z [email protected] <0.4859.674> -------- CRASH REPORT Process couch_compaction_daemon (<0.4859.674>) with 0 neighbors exited with reason: {compaction_loop_died,{timeout,{gen_server,call,[<0.21919.5399>,start_compact]}}} at gen_server:terminate/7(line:826) <= proc_lib:init_p_do_apply/3(line:240); initial_call: {couch_compaction_daemon,init,['Argument__1']}, ancestors: [couch_secondary_services,couch_sup,<0.209.0>], messages: [], links: [<0.17698.21>], dictionary: [], trap_exit: true, status: running, heap_size: 987, stack_size: 27, reductions: 2899
   [error] 2018-05-29T12:41:51.965000Z [email protected] <0.17698.21> -------- Supervisor couch_secondary_services had child compaction_daemon started with couch_compaction_daemon:start_link() at <0.4859.674> exit with reason {compaction_loop_died,{timeout,{gen_server,call,[<0.21919.5399>,start_compact]}}} in context child_terminated
   ```
   
   If the machine is especially busy, repeated failures can trip the supervisor's restart limit (`reached_max_restart_intensity`):
   
   ```
   [error] 2018-05-29T12:45:56.635000Z [email protected] <0.17698.21> -------- Supervisor couch_secondary_services had child compaction_daemon started with couch_compaction_daemon:start_link() at <0.14978.429> exit with reason {compaction_loop_died,{timeout,{gen_server,call,[<0.21919.5399>,start_compact]}}} in context child_terminated
   [error] 2018-05-29T12:45:56.635000Z [email protected] <0.17698.21> -------- Supervisor couch_secondary_services had child compaction_daemon started with couch_compaction_daemon:start_link() at <0.14978.429> exit with reason reached_max_restart_intensity in context shutdown
   ```
   
   # Expected Behaviour
   The compaction daemon should handle timeouts gracefully instead of crashing. Ideally, on a
timeout it would start a sleep cycle before trying to start another compaction process.
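
   The requested control flow could look roughly like the sketch below. It is written in Python purely to illustrate the idea and is not CouchDB's Erlang code: `CallTimeout`, `compaction_loop`, and the `start_compact` callback are hypothetical stand-ins for the `gen_server:call` timeout and the daemon's loop. The point is only that a timeout is caught, followed by a pause, and the loop carries on rather than taking the daemon down.

```python
import time


class CallTimeout(Exception):
    """Hypothetical stand-in for the gen_server:call timeout exit in the log."""


def compaction_loop(shards, start_compact, sleep_seconds=0.01, sleep=time.sleep):
    """Try to start compaction for each shard.

    On a timeout, record the shard, pause, and continue with the next one
    instead of letting the exception terminate the whole loop.
    """
    started, skipped = [], []
    for shard in shards:
        try:
            start_compact(shard)
            started.append(shard)
        except CallTimeout:
            skipped.append(shard)
            sleep(sleep_seconds)  # back off before the next attempt
    return started, skipped


def flaky_start(shard):
    # Simulate a busy node: any shard whose name ends in "busy" times out.
    if shard.endswith("busy"):
        raise CallTimeout(shard)


started, skipped = compaction_loop(
    ["shards/a", "shards/busy", "shards/b"],
    flaky_start,
    sleep=lambda seconds: None,  # skip real sleeping in this demo
)
```

   In this sketch the skipped shards are simply returned, so a caller could retry them on the next pass. How long to sleep and whether to retry the same shard are design choices the issue leaves open.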
