On Thu, Aug 8, 2013 at 7:31 PM, Iustin Pop <[email protected]> wrote:
> On Thu, Aug 08, 2013 at 10:45:14AM +0200, Guido Trotter wrote:
>> On Thu, Aug 8, 2013 at 10:33 AM, Michele Tartara <[email protected]> wrote:
>> > On Thu, Aug 8, 2013 at 9:01 AM, Guido Trotter <[email protected]> wrote:
>> >>
>> >> On Wed, Aug 7, 2013 at 9:34 PM, Iustin Pop <[email protected]> wrote:
>> >> > On Wed, Aug 07, 2013 at 04:03:38PM +0200, Michele Tartara wrote:
>> >> >> On Wed, Aug 7, 2013 at 1:56 PM, Guido Trotter <[email protected]> wrote:
>> >> >>
>> >> >> > On Wed, Aug 7, 2013 at 9:36 AM, Thomas Thrainer <[email protected]> wrote:
>> >> >> > > On Tue, Aug 6, 2013 at 5:56 PM, Michele Tartara <[email protected]> wrote:
>> >> >> > >> +``Configuration management daemon (ConfDW)``
>> >> >> > >> + It will run on the master node and it will be responsible for the management
>> >> >> > >> + of the authoritative copy of the cluster configuration (that is, it will be
>> >> >> > >> + the daemon actually modifying the ``config.data`` file). All the requests of
>> >> >> > >> + configuration changes will have to pass through this daemon. Having a single
>> >> >> > >> + point of configuration management will also allow Ganeti to get rid of
>> >> >> > >> + possible race conditions due to concurrent modifications of the configuration.
>> >> >> > >> + When the configuration is updated, it will have to push the received changes
>> >> >> > >> + to the ConfDR daemons, to keep them up to date.
>> >> >> > >> + This daemon will also be the one responsible for managing the locks, granting
>> >> >> > >> + them to the jobs requesting them, and taking care of freeing them up if the
>> >> >> > >> + jobs holding them crash or are terminated before releasing them.
>> >> >> > >
>> >> >> > >
>> >> >> > > How?
>> >> >> > >
>> >> >> >
>> >> >> > To be detailed. (in this or a separate design, to keep just the split
>> >> >> > simpler).
>> >> >> > (I believe it should be detailed, but as long as we don't think it's
>> >> >> > impossible we can defer the detailing and point from here to a second
>> >> >> > design: of course we should have that design too, before
>> >> >> > implementing).
>> >> >> >
>> >> >>
>> >> >> I guess checking for the existence of a process with the PID of the lock
>> >> >> holder should be enough.
>> >> >> I know PIDs are not guaranteed to be unique, but I think they are unique
>> >> >> enough for this not to be a problem.
>> >> >> And if we really think this is going to be a problem, we can also check the
>> >> >> actual program command line via /proc.
>> >> >
>> >> > This is still not the best way (I think).
>> >> >
>> >> > The way this is usually done in Unix is that the forking process "knows"
>> >> > its children and receives termination signals (SIGCHLD) when they exit;
>> >> > that way, it knows precisely which children are still running and which
>> >> > have died.
>> >> >
>> >> > So if you keep a simple mapping between child PID and job ID, it should
>> >> > be fine.
>> >> >
>> >> > Note that I don't know how well Haskell deals with SIGCHLD and whether
>> >> > it's still easily usable or if it's completely hidden by some RunProcess
>> >> > abstraction…
>> >> >
>> >>
>> >> Note that if jobs run in separate processes we need to make sure the
>> >> way we handle them survives a restart of the job daemon, since they
>> >> can be pretty long-run themselves.
>> >> As such the SIGCHLD option won't work without further changes. But I
>> >> also don't believe in tracking pids and "checking": I think a system
>> >> of communication (via filesystem sockets, probably) should be in place
>> >> for this to be resilient.
>> >> What do you think?
>> >>
>> >
>> > I agree with SIGCHLD not being really usable because we want persistence
>> > across LuxiD reboots.
>> > But what are you suggesting exactly with "system of communication"?
>> > Something like a unique (job-id based maybe?) unix socket for the
>> > communication between each LuxiD and each specific job? So that as long as
>> > that socket exists we know that either the job replies to pings on that or
>> > it is dead?
>>
>> That's an option, even without pings: if the socket gets closed the
>> process is dead.
>> Pings would be needed just in case of loops, livelocks and worse
>> situations, but we don't handle them now.
>
> +1 to the jobs each having a socket; I don't think the method that Jose
> proposes (jobs pinging LuxiD back) is better, I think a simple mapping
> inside LuxiD jobid→unix socket is very nice.
>
>> The point is how to handle the reconnection when luxid restarts and
>> what do the jobs do in the meantime when they detect a dead luxid.
>
> … fun questions :) I guess you'll want to move to a system where jobs
> write the result to a file on disk (agreed before job starts), and jobs
> just exit when finishing, even if luxid is dead?
>
> Anyway, I was interested in the overall design, so thanks again for it
> and I'll stay out of the discussion :)
>
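
To make the socket-per-job idea quoted above a bit more concrete, here is a
rough Python sketch of a jobid→unix socket mapping where a closed socket is
treated as "the job is gone". It is only an illustration under assumptions:
the class name JobSocketTable, the "job-<id>.sock" naming scheme and the
demo() helper are all made up for this mail and are not part of Ganeti, and
the job side is simulated inside the same process. It does not address the
open question of reconnecting after a LuxiD restart.

```python
import os
import select
import socket
import tempfile


class JobSocketTable:
    """Hypothetical daemon-side table: job ID -> connected Unix socket."""

    def __init__(self, socket_dir):
        self.socket_dir = socket_dir
        self.conns = {}  # job_id -> daemon-side end of the job's socket

    def path_for(self, job_id):
        # Assumed naming scheme: one socket file per job ID.
        return os.path.join(self.socket_dir, "job-%d.sock" % job_id)

    def listen_for(self, job_id):
        """Create the per-job listening socket before the job is started."""
        listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        listener.bind(self.path_for(job_id))
        listener.listen(1)
        return listener

    def accept_job(self, job_id, listener):
        """Accept the job's connection and file it under its job ID."""
        conn, _ = listener.accept()
        listener.close()
        self.conns[job_id] = conn

    def reap_dead_jobs(self, timeout=0.0):
        """Return the sorted IDs of jobs whose socket reached end-of-file."""
        by_sock = {conn: job_id for job_id, conn in self.conns.items()}
        readable, _, _ = select.select(list(by_sock), [], [], timeout)
        dead = []
        for conn in readable:
            # An empty read means the peer closed the socket (or exited).
            # Any real data is simply discarded in this sketch.
            if conn.recv(4096) == b"":
                job_id = by_sock[conn]
                dead.append(job_id)
                conn.close()
                del self.conns[job_id]
        return sorted(dead)


def demo():
    """Simulate two jobs in-process and let job 1 'crash'."""
    with tempfile.TemporaryDirectory() as tmp:
        table = JobSocketTable(tmp)
        job_ends = {}
        for job_id in (1, 2):
            listener = table.listen_for(job_id)
            job_end = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
            job_end.connect(table.path_for(job_id))
            table.accept_job(job_id, listener)
            job_ends[job_id] = job_end
        before = table.reap_dead_jobs()
        job_ends[1].close()  # stands in for job 1 dying
        after = table.reap_dead_jobs(timeout=1.0)
        job_ends[2].close()
        for conn in table.conns.values():
            conn.close()
        return before, after
```

Calling demo() reports no dead jobs while both sockets are open, and job 1
once its end of the socket has been closed. Detecting loops or livelocks
would still need pings on top of this, as noted in the thread.
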
We welcome feedback on the details as well. I think we can at this point
commit this design (when Michele sends the promised changes) and then
write new ones for the detailed parts of IPC.

Thanks!

Guido
