For process-level graceful restarts see https://github.com/zimbatm/socketmaster and https://github.com/pusher/crank . Those could be integrated into the activation script.
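
The shutdown sequence described in the quoted message below (on SIGTERM, close the listening socket, then drain the existing connections) could be sketched in Python roughly like this. It is a minimal illustration and not code taken from socketmaster or crank: the `DrainingServer` class, its echo handler and port 8080 are hypothetical names picked for the example.

```python
import signal
import socket
import threading


class DrainingServer:
    """Toy echo server: stops accepting on shutdown, lets open sessions finish."""

    def __init__(self, host="127.0.0.1", port=0):
        self.listener = socket.create_server((host, port))
        # Short accept timeout so the loop can notice the stop flag.
        self.listener.settimeout(0.2)
        self.address = self.listener.getsockname()
        self.stopping = threading.Event()
        self.workers = []

    def request_shutdown(self, signum=None, frame=None):
        # Usable as a signal handler: it only sets a flag.
        self.stopping.set()

    def _handle(self, conn):
        # Echo until the client closes its side of the connection.
        with conn:
            while True:
                data = conn.recv(4096)
                if not data:
                    break
                conn.sendall(data)

    def serve(self):
        while not self.stopping.is_set():
            try:
                conn, _ = self.listener.accept()
            except socket.timeout:
                continue
            conn.settimeout(None)  # client sessions block normally
            worker = threading.Thread(target=self._handle, args=(conn,))
            worker.start()
            # Forget workers that have already finished.
            self.workers = [w for w in self.workers if w.is_alive()]
            self.workers.append(worker)
        # Step 1: stop listening so the load-balancer routes new
        # connections to another machine.
        self.listener.close()
        # Step 2: drain, i.e. wait for clients to finish their sessions.
        for worker in self.workers:
            worker.join()


def main():
    # Hypothetical entry point: port 8080 is an arbitrary choice.
    server = DrainingServer(port=8080)
    signal.signal(signal.SIGTERM, server.request_shutdown)
    server.serve()
```

Calling `main()` runs it as a standalone service. A real service would probably also put an upper bound on the drain phase, so that a client that never disconnects cannot block the restart forever.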

On Mon, 28 Nov 2016 at 09:33 zimbatm <zimb...@zimbatm.com> wrote:

> Hi Stewart,
>
> In an HA setup, availability is generally achieved on the network level
> instead of the system level. Typically you would have two hot-swappable
> load-balancers that distribute the traffic to multiple instances of your
> service boxes. In that context it doesn't matter how processes are being
> restarted, because the load-balancer will automatically detect unresponsive
> machines and route the traffic accordingly. It's also handy because it
> allows restarting the machines in the event that the kernel needs an
> upgrade. In that setup I suppose you can think of each machine as being one
> Erlang OTP "process" and the network as the "message-passing".
>
> One responsibility of the service in that setup is to shut down properly to
> avoid unnecessary disruption of service. Mainly, when the process gets the
> SIGTERM signal it should close the listening socket (so the load-balancer
> can route new incoming connections to a different machine) and then drain
> the existing client connections gracefully. It shouldn't stop all at once,
> but let the clients disconnect when they are done with their sessions (and
> optionally signal them to go away if the protocol supports it).
>
> A last thing regarding this approach: generally you need a way to control
> the deploys; if all the service boxes are being upgraded at the same time
> then the load-balancer doesn't have anywhere to route the traffic to.
> Controlled deploys are also desirable for doing blue/green deployments.
>
> I need to stop there for now, but I also have a similar design answer on
> the system level, where processes get replaced gracefully.
>
> Cheers,
> z
>
> On Sun, 27 Nov 2016 at 04:33 stewart mackenzie <setor...@gmail.com> wrote:
>
> 9 9s is not unheard of in these circles; Google uptimes are a joke not
> worthy of mention.
>
> There are systems that have been running for some 40-odd years in
> production that factor in changes to legal banking regulations, hardware,
> business logic etc. Erlang has a system called the Ericsson AXD301 which
> has achieved this time frame.
>
> Just because NixOS hasn't been around that long doesn't mean it can't have
> the primitives to allow for such feats. It's these primitives I'm enquiring
> about.
>
> So let's use a new, less controversial figure of 5 9s and keep on topic.
>
> The thing is, we're designing this system so that it's governed by nix and
> doesn't necessarily have to depend heavily on the runtime. I really don't
> want to go down the imperative route by introducing imperative language
> concepts into our declarative language, which is managed by another
> declarative language (nix). Besides, just bringing in a single component
> with an OS dependency demands that we manage this change from the nix level.
>
> We currently have a hack in place that will resolve dependencies and give
> us a path to load a correctly compiled shared object into memory:
> https://github.com/fractalide/fractalide/blob/master/components/nucleus/find/component/src/lib.rs#L43
> Nasty and cringe-worthy, I know.
>
> Thanks for your pointer, I'll take a look at these activation scripts.
>
> Maybe this hack is the answer, and we confine the dynamism to an ssh login
> à la Erlang style...
> _______________________________________________
> nix-dev mailing list
> nix-dev@lists.science.uu.nl
> http://lists.science.uu.nl/mailman/listinfo/nix-dev