> So how do they do that? If there's power failure on a specific box...

There is transactional integrity, where you're good up until the failure, then you halt and fix/failover/etc. It's relatively cheap and popular.
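
To make "good up until the failure" concrete, here is a minimal sketch of the idea, assuming an append-only journal file that is forced to disk on every commit. The names `commit` and `recover` and the JSON-lines record format are made up for illustration; real systems do a good deal more than this.

```python
import json
import os


def commit(journal_path, txn_id, updates):
    """Append one transaction to the journal and force it to disk."""
    record = json.dumps({"txn": txn_id, "updates": updates})
    with open(journal_path, "a", encoding="utf-8") as f:
        f.write(record + "\n")
        f.flush()
        # Assumption: the transaction only counts as committed once fsync returns.
        os.fsync(f.fileno())


def recover(journal_path):
    """Rebuild state after a crash, ignoring a torn final record."""
    state = {}
    if not os.path.exists(journal_path):
        return state
    with open(journal_path, encoding="utf-8") as f:
        for line in f:
            if not line.endswith("\n"):
                break  # partial write cut off by the failure
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                break  # corrupted record, stop replaying here
            state.update(record["updates"])
    return state
```

After a power failure, the box (or its failover partner, if it can read the same journal) calls `recover` and ends up with exactly the transactions that were fully written. Whatever was in flight when the power went out is simply dropped.
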
> I can imagine mitigating this by redundantly processing everything

Then there's what's called non-stop computing, where the whole system is transactional. Some of that happens in systems like these:

https://en.wikipedia.org/wiki/IBM_System_z

Also Sun, HP, Fujitsu and the like. I haven't looked into how close these things get to being bulletproof. Look into what NASDAQ runs...
