This is an automated email from the ASF dual-hosted git repository.

vatamane pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/couchdb.git


The following commit(s) were added to refs/heads/main by this push:
     new 96e8b9c10 Don't wait indefinitely for replication jobs to stop
96e8b9c10 is described below

commit 96e8b9c10f3ad7bc927a88c96e0222f7b5813e3f
Author: Nick Vatamaniuc <[email protected]>
AuthorDate: Thu Jul 3 13:41:36 2025 -0400

    Don't wait indefinitely for replication jobs to stop
    
    Previously we used `gen_server:stop/3` with an infinity timeout.
    
    We have observed that it's possible for jobs to be stuck waiting for network
    requests so they may take indefinitely to process the shutdown request (and
    call their `terminate/2` callback) and that can block the replicator 
scheduler.
    To fix it add a 5 second timeout to the stop call and then forcibly kill the
    process.
---
 src/couch_replicator/src/couch_replicator_scheduler_job.erl | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/couch_replicator/src/couch_replicator_scheduler_job.erl 
b/src/couch_replicator/src/couch_replicator_scheduler_job.erl
index 544c5602a..7f123441f 100644
--- a/src/couch_replicator/src/couch_replicator_scheduler_job.erl
+++ b/src/couch_replicator/src/couch_replicator_scheduler_job.erl
@@ -47,6 +47,7 @@
 -define(LOWEST_SEQ, 0).
 -define(DEFAULT_CHECKPOINT_INTERVAL, 30000).
 -define(STARTUP_JITTER_DEFAULT, 5000).
+-define(STOP_TIMEOUT_MSEC, 5000).
 
 -record(rep_state, {
     rep_details,
@@ -110,7 +111,8 @@ stop(Pid) when is_pid(Pid) ->
     % won't return ok but exit the calling process, usually the scheduler, so
     % we guard against that. See:
     %  www.erlang.org/doc/apps/stdlib/gen_server.html#stop/3
-    catch gen_server:stop(Pid, shutdown, infinity),
+    catch gen_server:stop(Pid, shutdown, ?STOP_TIMEOUT_MSEC),
+    exit(Pid, kill),
     receive
         {'DOWN', Ref, _, _, Reason} -> Reason
     end,

Reply via email to