Re: [PATCH] MEDIUM: global: add a 'hard-stop-after' option to cap the soft-stop time

Cyril Bonté Thu, 23 Mar 2017 13:24:01 -0700

Hi Willy,

Le 23/03/2017 à 07:42, Willy Tarreau a écrit :

Hi Cyril,


I have a few comments below :

diff --git a/src/cfgparse.c b/src/cfgparse.c
index 2eb25edb..9681e06b 100644
--- a/src/cfgparse.c
+++ b/src/cfgparse.c
@@ -1011,6 +1011,23 @@ int cfg_parse_global(const char *file, int linenum, char 
**args, int kwm)
                }
        }
        /* end of user/group name handling*/
+       else if (strcmp(args[0], "hard-stop-after") == 0) {
+               const char *res;
+
+               if (!*args[1]) {
+                       Alert("parsing [%s:%d] : '%s' expects <time> as 
argument.\n",
+                               file, linenum, args[0]);
+                       err_code |= ERR_ALERT | ERR_FATAL;
+                       goto out;
+               }
+               res = parse_time_err(args[1], &global.hard_stop_after, 
TIME_UNIT_MS);
+               if (res) {
+                       Alert("parsing [%s:%d]: unexpected character '%c' in argument to 
<%s>.\n",
+                               file, linenum, *res, args[0]);
+                       err_code |= ERR_ALERT | ERR_FATAL;
+                       goto out;
+               }
+       }


You can place this directly as a keyword parser in proxy.c. For this
you can simply add the following line to cfg_kws :

   { CFG_GLOBAL, "hard-stop-after", proxy_parse_hard_stop_after },

Right, I missed that one for global keywords. It's ready in a new patchversion.

+struct task *hard_stop(struct task *t)
+{
+       struct proxy *p;
+       struct stream *s;
+
+       Warning("soft-stop running for too long, performing a hard-stop.\n");
+       send_log(NULL, LOG_WARNING, "soft-stop running for too long, performing a 
hard-stop.\n");
+       p = proxy;
+       while (p) {
+               if (!(p->cap & PR_CAP_FE))
+                       continue;
+
+               list_for_each_entry(s, &streams, list) {
+                       stream_shutdown(s, SF_ERR_KILLED);
+               }
+               p = p->next;
+       }


You don't use "p" in this loop, I think you started using it initially
and now you don't need it anymore, so in fact when you visit the first
proxy you shut down all streams and then nothing is done when looping
over the remaining ones. So all the streams are visited once per
frontend.

Oops, I copied a portion of code from an older patch supposed to closethe HTTP keep-alived connections, but I didn't clean everything.

That said, I'm not convinced by the benefit of killing all streams like
this given that at the moment it does nothing, it only wakes up their
task and immediately dies a few lines later in exit() without giving
them any chance to run. However I admit that if we later want to give a
chance to these tasks to complete (eg: by waiting one extra second and
coming back here again to exit for real) then it could be useful to do
it.

I'd be tempted to suggest that we set a flag and re-arm the timer to
come back here to exit so that tasks can stop cleanly, this would give
an opportunity to get a log for each of them, but on the other hand
almost all the tasks being killed late will be purely TCP and most of
the time users don't log pure TCP connections, so the benefit becomes
very low. Otherwise we can simply exit and not even kill them.

Hmmm just thinking, in fact there could be a benefit in letting them
run : ensuring that we close outgoing connections with an RST to avoid
accumulating TIME_WAITs.

Yes I was not convinced by the benefit too, but I prefered to close themexactly to avoid TIME_WAITs.

What do you think about trying something like this (not even compile-tested) :

        static int killed;
        struct stream *s;

        if (killed) {
                Warning("Some tasks resisted to hard-stop, exiting now.\n");
                send_log(NULL, LOG_WARNING, "Some tasks resisted to hard-stop, 
exiting now.\n");
                /* Do some cleanup and explicitely quit */
                deinit();
                exit(0);
        }

        Warning("soft-stop running for too long, performing a hard-stop.\n");
        send_log(NULL, LOG_WARNING, "soft-stop running for too long, performing a 
hard-stop.\n");

        /* kill the tasks and give them one second to terminate cleanly */
        list_for_each_entry(s, &streams, list) {
                stream_shutdown(s, SF_ERR_KILLED);
        }

        killed = 1;
        t->expire = tick_add(now_ms, MS_TO_TICKS(1000));
        return t;

Normally if all tasks properly die, we'll automatically break out of
run_poll_loops(). If we have to come back here, it means we didn't
quit so it's just a safety belt to exit here.


OK I see, I've put the logic in the next patch version.

Additionally, since it helps improving the reliability of the service
during reloads without adding particular features and the patch is
reasonably small and isolated, I was thinking we could backport it to
1.7 and even possibly 1.6. What do you and others think ?

+1 for backporting it in 1.7 and 1.6, I'll quickly check if it applieswithout any change.




--
Cyril Bonté

Re: [PATCH] MEDIUM: global: add a 'hard-stop-after' option to cap the soft-stop time

Reply via email to