We hope that we are missing something and that there is a solution to our problem that we just do not see yet. Please bear with us if this is the case.
In the following problem description we come to the conclusion that the configuration parameter *graceful_timeout* is not usable for us. With the current behaviour of hypnotoad, no single value satisfies all of our requirements. The problem is relatively new; we traced it back to v5.10. Falsification and corrections are very welcome.

Our server is a completely synchronous prefork hypnotoad farm with several hundred workers. Our *accepts* parameter is 1000; we cannot go higher because we have memory leaks. Our *heartbeat_timeout* is 100000; we cannot go lower because we must serve slow, long-running requests. Now we are trying to determine a good value for *graceful_timeout*.

As far as we can see, *graceful_timeout* is used in four different situations:

(1) When a *heartbeat_timeout* is reached, the manager process sends a SIGQUIT to the worker and, *graceful_timeout* seconds later, a SIGKILL. A good value for *graceful_timeout* in this context depends on how much time the server needs to clean up once it turns out that a request cannot be finished within *heartbeat_timeout*. We would need around 10 seconds for cleanup here.
(2) When a graceful server shutdown is triggered by human intervention, the manager process sends a SIGQUIT to all running workers and, *graceful_timeout* seconds later, a SIGKILL to each of them. Note that the human who triggers the graceful shutdown may have to wait that long until all processes have finished or have been killed. Determining a good value for *graceful_timeout* in this context is very similar to (1).
(3) When *accepts* has been reached for a worker, the worker process lets the manager process know in its heartbeat (since v5.10). The manager process then sends a SIGQUIT to the old process and, *graceful_timeout* seconds later, a SIGKILL. A good value for *graceful_timeout* in this context is the same as for *heartbeat_timeout*, i.e. 100000 seconds, since a request already in progress on that worker may legitimately run that long.
(4) When a graceful server restart is triggered by human intervention, the manager process sends a SIGQUIT to all running workers and, *graceful_timeout* seconds later, a SIGKILL to each of them. Determining a good value for *graceful_timeout* in this context is very similar to (3).

To sum up: there are two situations (1 and 2) in which we want *graceful_timeout* to be 10 seconds, and two situations (3 and 4) in which we need it to be 100000 seconds.

Is there a way out? If we choose 10, we get too many broken connections. If we choose 100000, we frustrate our DevOps team.

Please advise.
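
For reference, the settings discussed above could be sketched as the following hypnotoad configuration fragment. This is only an illustration, not our literal configuration: the file name is hypothetical, the *workers* value of 300 merely stands in for "several hundred", and *clients* set to 1 is an assumption reflecting a fully synchronous setup. The *graceful_timeout* line shows the 10-second choice that fits situations (1) and (2).

```perl
# myapp.conf (hypothetical file name), loaded through the Config plugin
{
  hypnotoad => {
    workers           => 300,      # placeholder for "several hundred"
    clients           => 1,        # assumption: one blocking request per worker
    accepts           => 1000,     # cannot go higher because of memory leaks
    heartbeat_timeout => 100000,   # cannot go lower because of long-running requests
    graceful_timeout  => 10,       # fits (1) and (2); (3) and (4) would need 100000
  }
};
```

Whichever value is placed on the last line applies to all four situations alike, which is exactly the conflict described above.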
