On 6/4/22 2:46 PM, Ola Fosheim Grøstad wrote:
> On Saturday, 4 June 2022 at 18:32:48 UTC, Sebastiaan Koppe wrote:
>> Most won't throw an Error though. And typical services have canary
>> releases and rollback.
>> So you just fix it, which you have to do anyway.
>
> I take it you mean manual rollback, but the key issue is that you want
> to retry on failure. Not infrequently the source for the failure will be
> in the environment, the code just didn't handle the failure correctly.

You shouldn't retry on Error, and you shouldn't actually have any Errors
thrown.
I'll draw a line in the sand here -- OutOfMemoryError shouldn't be an
Error, but an Exception. Because there's no way you can check if an
allocation will succeed before doing it, and arguably, there are ways to
deal with out of memory problems without shutting down the process.
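
To make the allocation point concrete, here is a minimal sketch of what "fail with an Exception instead of an Error" could look like when going through C's malloc. `AllocationException` and `tryAllocate` are names made up for this example, not anything in druntime or Phobos, and the sketch assumes the GC can still allocate the small exception object when the large request fails.

```d
import core.stdc.stdlib : free, malloc;

/// Hypothetical exception type for a recoverable allocation failure.
class AllocationException : Exception
{
    this(string msg, string file = __FILE__, size_t line = __LINE__)
    {
        super(msg, file, line);
    }
}

/// Hypothetical helper: allocates `size` bytes with malloc and throws
/// an Exception (not an Error) when the allocation fails.
void[] tryAllocate(size_t size)
{
    if (size == 0)
        return null;
    void* p = malloc(size);
    if (p is null)
        throw new AllocationException("allocation failed");
    return p[0 .. size];
}

void main()
{
    import std.stdio : writeln;

    try
    {
        // Absurdly large request, chosen so that malloc should fail.
        auto buf = tryAllocate(size_t.max);
        scope (exit) free(buf.ptr);
    }
    catch (AllocationException e)
    {
        // The process keeps running and can shed load, free caches, etc.
        writeln("could not allocate: ", e.msg);
    }
}
```

The caller never has to guess in advance whether the allocation will succeed; it just handles the failure when it happens.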

> On a service with SLA of 99.999% the probable "failure time" would be 6
> seconds per week, so if you can retry you may still run fine even if you
> failed to check correctly for an error on that specific subsystem. That
> makes the system more resilient/robust.

Exceptions are perfectly fine to catch and retry. Anticipating the
failing condition and throwing an Exception instead is a viable solution.
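
As a rough sketch of the catch-and-retry idea: the helper below retries an operation only when it throws an Exception, and lets any Error propagate untouched. `retry`, its parameters, and `flaky` are hypothetical names for illustration, and the fixed pause between attempts is just an assumption to keep the example short.

```d
import core.thread : Thread;
import core.time : Duration, msecs;

/// Hypothetical helper: evaluates `op` up to `maxAttempts` times.
/// Only Exception is caught; an Error is deliberately left alone.
T retry(T)(lazy T op, size_t maxAttempts = 3, Duration pause = 10.msecs)
{
    assert(maxAttempts > 0);
    foreach (attempt; 1 .. maxAttempts + 1)
    {
        try
        {
            return op(); // lazy parameter: re-evaluated on every attempt
        }
        catch (Exception e)
        {
            if (attempt == maxAttempts)
                throw e; // out of attempts, let the caller decide
            Thread.sleep(pause);
        }
    }
    assert(0, "unreachable");
}

void main()
{
    int calls;

    // Stand-in for an operation that hits a transient environment
    // problem twice and then succeeds.
    int flaky()
    {
        if (++calls < 3)
            throw new Exception("transient failure");
        return 42;
    }

    assert(retry(flaky()) == 42);
    assert(calls == 3);
}
```

The operation itself is where the failing condition gets anticipated and turned into an Exception; the retry loop stays generic.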
-Steve