On Wednesday, 19 February 2014 at 01:09:43 UTC, Xinok wrote:
On Wednesday, 19 February 2014 at 00:16:03 UTC, Tolga Cakiroglu wrote:
I didn't read the whole link (TL;DR), but how do they detect that a
CPU has failed? Some information must be passed outside the CPU to
do this. The only solution that comes to my mind is that the main
CPU changes a variable in external memory at every step, and the
backup CPU checks it continuously to catch a failure immediately.
But that alone would already take about 50% of the CPU's power.
When thinking about this kind of backup system, it's really great
to know and read that some people are actually building them.
I'm assuming this has something to do with it:
https://en.wikipedia.org/wiki/Heartbeat_%28computing%29
In clustered servers, the active node sends a continuous signal
indicating it's still alive. This signal is referred to as a
heartbeat. There's a standby node waiting to take over should
it stop receiving this signal.
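A minimal Python sketch of the standby side of such a heartbeat, assuming the standby declares the active node dead once no heartbeat has arrived for longer than a fixed timeout. The class name `HeartbeatMonitor` and the timeout value are hypothetical, and timestamps are passed in explicitly so the logic is easy to follow (real code would read a monotonic clock). In this sketch each check is a single timestamp comparison, made whenever the standby chooses to look, rather than a watch on every step of the active node.

```python
class HeartbeatMonitor:
    """Runs on the standby node (hypothetical sketch, not a real library API)."""

    def __init__(self, timeout: float, start: float) -> None:
        self.timeout = timeout    # seconds of silence tolerated (assumed value)
        self.last_beat = start    # treat start-up as the first "beat"

    def on_heartbeat(self, now: float) -> None:
        # Called whenever a heartbeat message from the active node arrives.
        self.last_beat = now

    def active_failed(self, now: float) -> bool:
        # The active node is presumed dead once it has been silent too long.
        return now - self.last_beat > self.timeout


monitor = HeartbeatMonitor(timeout=0.5, start=0.0)
for t in (0.1, 0.2, 0.3):            # active node beats every 100 ms
    monitor.on_heartbeat(t)
print(monitor.active_failed(0.4))    # False: last beat was 0.1 s ago
print(monitor.active_failed(1.0))    # True: 0.7 s of silence, so take over
```

The timeout is the trade-off: a shorter one detects failure sooner but risks a false takeover if a single heartbeat is merely delayed.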
I think only knowing that it has failed is not enough. Because the
process is a landing, the other CPU must also know where the process
was left off. With that heartbeat signal, the only option is that all
sensor information is sent to both CPUs continuously, and that the
sensor values alone are enough to decide which step to take next.
Then I think the backup could continue the process flawlessly.
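A minimal Python sketch of that idea, assuming (as the post does) that the next step can be decided from the latest sensor reading alone, with no hidden state inside either CPU. Every reading is delivered to both controllers, so when the primary stops, the standby's decision is already the one the primary would have made. The names `SensorReading`, `next_action`, `run`, the sensor fields, and the thresholds are all hypothetical and not real landing parameters; the `primary_fails_at` index stands in for the moment the heartbeat stops.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SensorReading:
    # Hypothetical sensor fields, for illustration only.
    altitude_m: float
    vertical_speed_mps: float  # negative means descending


def next_action(r: SensorReading) -> str:
    """Decide the next step from the current sensor values alone."""
    # Made-up thresholds; not taken from any real flight software.
    if r.altitude_m <= 0.0:
        return "cut_engine"
    if r.vertical_speed_mps < -5.0:
        return "increase_thrust"
    return "hold_thrust"


def run(readings: List[SensorReading], primary_fails_at: int) -> List[Tuple[str, str]]:
    """Feed every reading to both CPUs; switch to the standby once the primary dies."""
    issued = []
    for i, r in enumerate(readings):
        primary_action = next_action(r)   # what the primary computes (while alive)
        standby_action = next_action(r)   # the standby sees the same reading
        if i < primary_fails_at:
            issued.append(("primary", primary_action))
        else:
            issued.append(("standby", standby_action))
    return issued


readings = [SensorReading(100.0, -8.0), SensorReading(50.0, -4.0), SensorReading(0.0, -1.0)]
print(run(readings, primary_fails_at=1))
```

Because `next_action` is a pure function of the reading, the command sequence is identical whether or not the failover happens; the design only holds if the sensor values really do capture everything needed, which is exactly the condition stated above.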