Hi Stewart,

In an HA setup, availability is generally achieved at the network level rather than the system level. Typically you would have two hot-swappable load balancers that distribute traffic across multiple instances of your service boxes. In that context it doesn't matter how processes are restarted, because the load balancer automatically detects unresponsive machines and routes traffic around them. It's also handy because it lets you restart a machine when the kernel needs an upgrade. In that setup I suppose you can think of each machine as one Erlang/OTP "process" and the network as the "message passing".
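To make the routing idea concrete, here is a minimal sketch of a round-robin picker that skips backends failing a health check. It is only an illustration, not how any particular load balancer is implemented: the class name `RoundRobinBalancer`, the `tcp_alive` check (a plain TCP connect) and the addresses in the usage note are all assumptions made for the example.

```python
import socket
from typing import Callable, Iterable, Optional, Tuple

Backend = Tuple[str, int]  # (host, port)


def tcp_alive(backend: Backend, timeout: float = 0.5) -> bool:
    """Assumed health check: a backend is alive if it accepts a TCP connection."""
    try:
        with socket.create_connection(backend, timeout=timeout):
            return True
    except OSError:
        return False


class RoundRobinBalancer:
    """Hypothetical picker: rotates over backends, skipping unhealthy ones."""

    def __init__(
        self,
        backends: Iterable[Backend],
        is_alive: Callable[[Backend], bool] = tcp_alive,
    ) -> None:
        self._backends = list(backends)
        self._is_alive = is_alive
        self._next = 0  # index where the next search starts

    def pick(self) -> Optional[Backend]:
        """Return the next healthy backend, or None if every backend is down."""
        count = len(self._backends)
        for offset in range(count):
            index = (self._next + offset) % count
            candidate = self._backends[index]
            if self._is_alive(candidate):
                # Resume after the chosen backend so load keeps rotating.
                self._next = (index + 1) % count
                return candidate
        return None
```

With three made-up backends where the second one is down, successive `pick()` calls alternate between the first and the third. Once a service box has closed its listening socket, a connect-based check like this one starts failing for it, so it simply drops out of the rotation.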
One responsibility of the service in that setup is to shut down properly, to avoid unnecessary disruption of service. Mainly, when the process gets the SIGTERM signal it should close the listening socket (so the load balancer can route new incoming connections to a different machine) and then drain the existing client connections gracefully. It shouldn't stop all at once but let the clients disconnect when they are done with their sessions (and optionally signal them to go away if the protocol supports it).

A last thing regarding this approach: generally you need a way to control the deploys. If all the service boxes are upgraded at the same time, the load balancer has nowhere to route the traffic. That kind of control is also what you want for blue/green deployments.

I need to stop there for now, but I also have a similar design answer on the system level, where processes get replaced gracefully.

Cheers,
z

On Sun, 27 Nov 2016 at 04:33 stewart mackenzie <[email protected]> wrote:

> 9 9s not unheard of in these circles, Google uptimes are a joke not worthy
> of mention.
>
> There are systems that have been running for some 40 odd years in
> production that factor in changes to legal banking regulations, hardware,
> business logic etc. Erlang has a system called the Ericsson AXD301 which
> has achieved this time frame.
>
> Just because NixOS hasn't been around that long doesn't mean it can't have
> the primitives to allow for such feats. It's these primitives I'm enquiring
> about.
>
> So let's use a new, less controversial figure of 5 9s and keep on topic.
>
> The thing is, we're designing this system so that it's governed by nix
> don't necessarily have to depend heavily on the runtime - I really don't
> want to go down the imperative route, by introducing imperative language
> concepts into our declarative language which is managed by another
> declarative language (nix). Besides just bringing in a single component
> with an OS Dependency demands we manage this change from nix level.
>
> We currently have a hack in place, that will resolve dependencies and give
> us a path to load a correctly compiled shared object into memory:
> https://github.com/fractalide/fractalide/blob/master/components/nucleus/find/component/src/lib.rs#L43
> nasty and cringe worthy I know.
>
> Thanks for your pointer, I'll take a look at these activation scripts.
>
> Maybe this hack is the answer, and confine the dynamism to an ssh login à
> la Erlang style...
> _______________________________________________
> nix-dev mailing list
> [email protected]
> http://lists.science.uu.nl/mailman/listinfo/nix-dev
