Hi Rowan,

> I'd be likely to treat it similarly to
> the Linux OOM killer: if it was ever actually invoked, I would be
> investigating how to fix my application.
>

I think this is mostly about infrastructure-level issues rather than
application-level ones, at least according to our use-cases. For example,
if your application is under a heavy DoS attack, or there are other kinds
of network connection problems, then your services may experience slow
database/cache/API response times. It is also possible that a 3rd party
API you depend on faces such issues. All these scenarios could severely
harm the availability of your application, unless you have a hard,
wall-clock based timeout as a way to short-circuit overly slow responses.
So we are not only talking about application (design) issues, like the
n+1 query problem.
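
Just to illustrate the gap with a rough, self-contained sketch (the
sleep() call merely stands in for any blocking call to a slow upstream):
on Linux, "max_execution_time" only counts time spent executing the
script itself, so a request that is blocked waiting on I/O can run far
past the configured limit:

<?php

// With max_execution_time=2 in effect, this script still runs for ~10
// seconds on Linux, because time spent in sleep(), stream reads, database
// queries, etc. is not counted towards the limit (it is on Windows).
ini_set('max_execution_time', '2');

$start = microtime(true);

// sleep() stands in for any blocking call to a slow upstream: an HTTP
// request to an overloaded 3rd party API, a locked database query, ...
sleep(10);

printf("ran for %.1f s despite a 2 s limit\n", microtime(true) - $start);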

> If the remote system responded very slowly, that loop might take many
> times the expected duration. If we simply kill the process when a fixed
> wall time has elapsed, we're very likely to create an order on the
> remote system, but exit without saving its reference. It is however easy
> to see where in that loop we could safely call a routine like
> throw_exception_if_time_limit_reached().


In my opinion, if the proposed ini setting causes consistency issues for
an application, then that application is already vulnerable to other
factors which can make it halt execution at random places: fatal errors,
power outages, etc. I think developers of distributed systems should be
aware of this - and I think they usually are, just take the CAP theorem
as an example - so they have to accept and consider these risks. Please
also note that "max_execution_time" already measures wall-clock time on
a few platforms, so there is precedent for the proposed behavior.


> If rather than placing orders, the loop was just gathering search
> results, killing the process would be less dangerous, but cleanly
> exiting would still be preferable, because we could return a warning and
> partial results, rather than a 500 error.
>

If returning a 50x response with a custom message is a requirement, then
sure, developers can simply ignore the new ini setting. That said, Apache
and nginx, for example, already allow custom error pages, and I think
that should be good enough for most use-cases.

> * If the database locks up inside a transaction, killing the process
> probably won't roll that transaction back cleanly
>

Since there can be many other causes of a killed process, I think this
particular problem is unrelated to my proposal; and if such a thing does
happen, it's a bug in the database server. Also, please be aware that the
proposed timeout is a clean shutdown mechanism, so shutdown handlers and
the already mentioned RSHUTDOWN functions are still triggered. fpm's
timeout, on the other hand, doesn't invoke any of them.
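
To make that difference concrete, here is a minimal sketch; the ini
setting name in the comment is only a hypothetical placeholder for the
proposed behavior:

<?php

// Hypothetical placeholder name for the wall-clock timeout proposed in
// this thread; it does not exist today.
// ini_set('max_execution_wall_time', '10');

register_shutdown_function(function () {
    // Because the proposed timeout is a clean shutdown, handlers
    // registered here still run, so we can log the event, release
    // external locks, or emit a friendlier error response. fpm's
    // request_terminate_timeout kills the worker without ever reaching
    // this callback.
    error_log('request finished (or timed out), cleaning up');
});

// ... slow work that may exceed the wall-clock limit ...
sleep(60);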

> In other words, I can only think of cases where "the cure would be worse
> than the disease".


To be honest, my impression is that you either underestimate the
"disease" or overestimate the current "cures". Speaking of the latter,
nginx + fpm, one of the most popular web server setups (if not the most
popular one), doesn't provide an easy-to-use and safe way to shut down
execution after a dynamically configurable amount of time. While one can
use the "if (time() - $startTime > $timeout) { /* ... */ }" based
approach instead, this won't scale well in non-trivial codebases (see the
sketch below). Thus, I believe my suggestion offers a more convenient,
safer, and more universal way to solve the underlying problem of
controlling real execution time than the currently available options do.
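
Here is roughly what that manual approach looks like (a self-contained
sketch; the closure only simulates real work): the check has to be
repeated in every loop of every code path, and it still cannot interrupt
a single call that is already blocked on I/O:

<?php

$startTime = microtime(true);
$timeout   = 10.0; // seconds of wall-clock time we are willing to spend

function checkTimeLimit(float $startTime, float $timeout): void
{
    if (microtime(true) - $startTime > $timeout) {
        throw new RuntimeException('wall-clock time limit exceeded');
    }
}

// Stand-in for real work, e.g. a remote API call; usleep() simulates latency.
$syncOrder = function (int $id): void { usleep(200000); };

foreach ([1, 2, 3] as $orderId) {
    // This call has to be sprinkled into every loop of every code path,
    // including third-party libraries where we often can't add it at all.
    checkTimeLimit($startTime, $timeout);
    $syncOrder($orderId);
}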

Regards,
Máté
