On Wednesday, 11 July 2018 at 12:45:40 UTC, crimaniak wrote:
The error should be maximally localized, and the programmer should be able to respond to any type of errors. The very nature of the work of WEB applications contributes to this. As a rule, queries are handled by short-lived tasks that work with thread-local memory, and killing only the task that caused the error, with the transfer of the exception to the calling task, would radically improve the situation.

Hmm. The fun fun fun thing about undefined behaviour in the absence of MMUs is that the effects are maximally _unlocalized_.

ie. It can corrupt _any_ part of the system.

A use after free, for example, or an index out of bounds on the heap, can corrupt any and all subsystems sharing the same virtual address space.
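To make that concrete, here is a toy simulation, not real undefined behaviour. It assumes a flat bytearray standing in for the shared address space and two hypothetical subsystems, A and B, that each own half of it. The names and layout are invented for illustration.

```python
# One flat "address space" shared by both subsystems (a stand-in, not real memory).
memory = bytearray(16)
A_BASE, A_SIZE = 0, 8   # hypothetical subsystem A owns bytes 0..7
B_BASE, B_SIZE = 8, 8   # hypothetical subsystem B owns bytes 8..15


def a_store_unchecked(index, value):
    """Write into A's buffer the way unchecked code would: no bounds test."""
    memory[A_BASE + index] = value


# B keeps its state in its own region.
memory[B_BASE:B_BASE + B_SIZE] = b"B-STATE!"

# A writes one element past the end of its buffer.
a_store_unchecked(A_SIZE, 0)

# B never ran any code, yet its state has changed.
b_state = bytes(memory[B_BASE:B_BASE + B_SIZE])
```

The bug is in A, but the damage shows up in B, which is why the fault cannot be pinned on "the task that caused the error".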

That is part of the reason why Walter is pushing so hard for memory safety.

Memory Safety is truly a huge step away from the world of pain that is C/C++.... it removes a truly huge class of defects.

However, it also removes a common terminology. Odds on you know what I mean when I say "use after free" or "index out of bounds".

Now in the levels above the language and the library, we humans are equally capable of screwing up and corrupting our own work.... except the language can no longer help us.

Above the language and the library, we no longer have a common terminology for describing the myriad ways you can shoot yourself in the foot.

The language can, through encapsulation, "minimize the blast radius", but it can't stop you.

I disagree with Bjarne Stroustrup on many things.... but in this article he is absolutely spot on. https://www.artima.com/intv/goldilocks3.html

Please read it, it's probably the most important article on Object Oriented Design you'll find.

Now the problem with "unexpected" exceptions is, odds on you are left with a broken invariant.

ie. Odds on you are left with an object you now cannot reasonably expect to function.

ie. Odds on that object, which you cannot expect to function, is part of a larger object or subsystem that you now cannot reasonably expect to function either.

ie. You are left with a system that will progressively become flakier and flakier and less responsive and less reliable.
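A minimal sketch of how a "handled" unexpected exception leaves a broken invariant behind. The Ledger class and its account names are hypothetical, made up for this example.

```python
class Ledger:
    """Invariant: the sum of all balances never changes."""

    def __init__(self, balances):
        self.balances = dict(balances)

    def total(self):
        return sum(self.balances.values())

    def transfer(self, src, dst, amount):
        self.balances[src] -= amount   # step 1: invariant is temporarily broken
        self.balances[dst] += amount   # step 2: raises KeyError if dst is unknown


ledger = Ledger({"alice": 100, "bob": 50})
total_before = ledger.total()

try:
    ledger.transfer("alice", "carol", 30)   # "carol" does not exist
except KeyError:
    pass   # "handled", but step 1 was never undone

# The object is still alive and still in use, but money has vanished.
invariant_holds = ledger.total() == total_before
```

Nothing crashed and the caller carried on, yet every later computation that relies on the ledger's total is now working from bad data.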

The only sane response really is to reset to a defined state as quickly as possible. ie. Capture a backtrace, exit process and restart.
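Roughly what that looks like, as a sketch only. It assumes the worker runs as a separate process under a small supervisor loop; WORKER, supervise and the attempt-number argument are hypothetical names, and the argument exists only so the demo fails exactly once.

```python
import subprocess
import sys

# Hypothetical worker: fails on its first run, succeeds on later runs.
WORKER = r"""
import os, sys, traceback
attempt = int(sys.argv[1])
try:
    if attempt == 0:
        raise RuntimeError("unexpected failure")
    print("ok")
except Exception:
    traceback.print_exc()   # capture a backtrace
    sys.stderr.flush()
    os._exit(1)             # exit at once, no attempt to repair in-memory state
"""


def supervise(max_restarts=3):
    """Rerun the worker in a fresh process until it exits with status 0."""
    for attempt in range(max_restarts + 1):
        result = subprocess.run(
            [sys.executable, "-c", WORKER, str(attempt)],
            capture_output=True,
            text=True,
        )
        if result.returncode == 0:
            return attempt, result.stdout.strip()
    raise RuntimeError("worker kept failing")


restarts, output = supervise()
```

Each restart begins from a defined state because nothing from the failed process survives except the backtrace.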

Your effort in trying to catch and handle unexpected events to achieve uptime is misplaced; you are much better served by Chaos Monkeys.

ie. Deliberately "hard kill" your running systems at random moments, and spend your efforts on designing for no resulting corruption and for rapid, reliable reset.
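One common way to design for "no resulting corruption" is to write new state to a temporary file and rename it into place, so a kill at any moment leaves either the old file or the new one. The sketch below assumes that approach; save_atomically and its die_before_commit flag are hypothetical, and the flag merely stands in for a hard kill just before the rename.

```python
import os
import tempfile


def save_atomically(path, data, die_before_commit=False):
    """Write data to path so that readers see the old or the new content, never a mix."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as tmp:
        tmp.write(data)
        tmp.flush()
        os.fsync(tmp.fileno())      # push the new content to disk first
    if die_before_commit:
        return False                # stand-in for a hard kill at this point
    os.replace(tmp_path, path)      # the commit point: a single rename
    return True


workdir = tempfile.mkdtemp()
state_file = os.path.join(workdir, "state.txt")

save_atomically(state_file, "version 1")
save_atomically(state_file, "version 2", die_before_commit=True)

with open(state_file) as f:
    survived = f.read()             # what a restarted process would find
```

The interrupted write leaves a stray temporary file behind, which a restart can clean up, but the state file itself is still the last complete version.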

I certainly wouldn't unleash Chaos Monkeys on a production system until I was really comfortable with the behaviour on a test system....
