On Wed, Aug 07, 2013 at 04:03:38PM +0200, Michele Tartara wrote:
> On Wed, Aug 7, 2013 at 1:56 PM, Guido Trotter <[email protected]> wrote:
>
> > On Wed, Aug 7, 2013 at 9:36 AM, Thomas Thrainer <[email protected]> wrote:
> > > On Tue, Aug 6, 2013 at 5:56 PM, Michele Tartara <[email protected]> wrote:
> > >> +``Configuration management daemon (ConfDW)``
> > >> +  It will run on the master node and it will be responsible for the management
> > >> +  of the authoritative copy of the cluster configuration (that is, it will be
> > >> +  the daemon actually modifying the ``config.data`` file). All the requests of
> > >> +  configuration changes will have to pass through this daemon. Having a single
> > >> +  point of configuration management will also allow Ganeti to get rid of
> > >> +  possible race conditions due to concurrent modifications of the configuration.
> > >> +  When the configuration is updated, it will have to push the received changes
> > >> +  to the ConfDR daemons, to keep them up to date.
> > >> +  This daemon will also be the one responsible for managing the locks, granting
> > >> +  them to the jobs requesting them, and taking care of freeing them up if the
> > >> +  jobs holding them crash or are terminated before releasing them.
> > >
> > > How?
> >
> > To be detailed. (in this or a separate design, to keep just the split
> > simpler).
> > (I believe it should be detailed, but as long as we don't think it's
> > impossible we can defer the detailing and point from here to a second
> > design: of course we should have that design too, before
> > implementing).
>
> I guess checking for the existence of a process with the PID of the lock
> holder should be enough.
> I know PIDs are not guaranteed to be unique, but I think they are unique
> enough for this not to be a problem.
> And if we really think this is going to be a problem, we can also check the
> actual program command line via /proc.

This is still not the best way (I think). The way this is usually done
in Unix is that the forking process "knows" its children and receives
termination signals (SIGCHLD) when they exit; that way, it knows
precisely which children are still running and which have died. So if
you keep a simple mapping between child PID and job ID, it should be
fine.

Note that I don't know how well Haskell deals with SIGCHLD and whether
it's still easily usable or if it's completely hidden by some RunProcess
abstraction…

> > >> +leaving the codebase in a consistent and usable state.
> > >> +
> > >> +#. Rename QueryD to LuxiD.
> > >
> > > Already done. QueryD existed only for a day or so and is probably not
> > > worth mentioning.
>
> If I recall correctly, the review of the patch introducing the renaming was
> LGTMed (I think by Iustin) after the promise of a design doc explaining the
> reason for that. This is such a design doc, so I think it should stay here.

I hope my LGTM didn't create problems :/ And thanks for this design,
indeed it's what I was curious about :)

iustin
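
For illustration only, the "mapping between child PID and job ID" idea
from the reply above could look roughly like the sketch below. It is
written in Python rather than Haskell, it is not Ganeti code, and the
names JobSupervisor, spawn, reap and install_sigchld_handler are all
hypothetical. It assumes a Unix system where the parent forks the job
processes itself.

```python
import os
import signal


class JobSupervisor:
    """Tracks forked job processes by PID (hypothetical helper, not Ganeti code)."""

    def __init__(self):
        self.running = {}    # child PID -> job ID
        self.finished = {}   # job ID -> exit code (negative = killed by that signal)

    def spawn(self, job_id, target):
        """Fork a child that runs target() and record which job it belongs to."""
        # Block SIGCHLD so a fast-exiting child cannot be reaped before the
        # PID -> job ID entry exists.
        old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGCHLD})
        try:
            pid = os.fork()
            if pid == 0:
                # Child: run the job, then exit without returning into parent code.
                status = 1
                try:
                    status = int(target() or 0)
                finally:
                    os._exit(status)
            self.running[pid] = job_id
        finally:
            signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
        return pid

    def reap(self):
        """Collect every child that has exited; return the job IDs that ended."""
        ended = []
        while self.running:
            try:
                pid, status = os.waitpid(-1, os.WNOHANG)
            except ChildProcessError:
                break  # no children left at all
            if pid == 0:
                break  # the remaining children are still running
            job_id = self.running.pop(pid, None)
            if job_id is None:
                continue  # a child that was not started through spawn()
            if os.WIFSIGNALED(status):
                code = -os.WTERMSIG(status)
            else:
                code = os.WEXITSTATUS(status)
            self.finished[job_id] = code
            ended.append(job_id)
        return ended

    def install_sigchld_handler(self):
        """Reap on SIGCHLD; several exits may be coalesced into one signal,
        which is why reap() loops until no more exited children are found."""
        signal.signal(signal.SIGCHLD, lambda signum, frame: self.reap())
```

In such a design, whatever reap() returns would be the list of jobs
whose locks can be released. Because the parent only forgets a PID
after waitpid() has reported it, the PID cannot have been reused by an
unrelated process in the meantime, which is the weakness of probing for
a PID from the outside.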
