On Thu, Aug 8, 2013 at 7:31 PM, Iustin Pop <[email protected]> wrote:
> On Thu, Aug 08, 2013 at 10:45:14AM +0200, Guido Trotter wrote:
>> On Thu, Aug 8, 2013 at 10:33 AM, Michele Tartara <[email protected]> wrote:
>> > On Thu, Aug 8, 2013 at 9:01 AM, Guido Trotter <[email protected]> wrote:
>> >>
>> >> On Wed, Aug 7, 2013 at 9:34 PM, Iustin Pop <[email protected]> wrote:
>> >> > On Wed, Aug 07, 2013 at 04:03:38PM +0200, Michele Tartara wrote:
>> >> >> On Wed, Aug 7, 2013 at 1:56 PM, Guido Trotter <[email protected]> wrote:
>> >> >>
>> >> >> > On Wed, Aug 7, 2013 at 9:36 AM, Thomas Thrainer <[email protected]> wrote:
>> >> >> > > On Tue, Aug 6, 2013 at 5:56 PM, Michele Tartara <[email protected]> wrote:
>> >> >> > >> +``Configuration management daemon (ConfDW)``
>> >> >> > >> +  It will run on the master node and it will be responsible for the management
>> >> >> > >> +  of the authoritative copy of the cluster configuration (that is, it will be
>> >> >> > >> +  the daemon actually modifying the ``config.data`` file). All the requests of
>> >> >> > >> +  configuration changes will have to pass through this daemon. Having a single
>> >> >> > >> +  point of configuration management will also allow Ganeti to get rid of
>> >> >> > >> +  possible race conditions due to concurrent modifications of the configuration.
>> >> >> > >> +  When the configuration is updated, it will have to push the received changes
>> >> >> > >> +  to the ConfDR daemons, to keep them up to date.
>> >> >> > >> +  This daemon will also be the one responsible for managing the locks, granting
>> >> >> > >> +  them to the jobs requesting them, and taking care of freeing them up if the
>> >> >> > >> +  jobs holding them crash or are terminated before releasing them.
>> >> >> > >
>> >> >> > >
>> >> >> > > How?
>> >> >> > >
>> >> >> >
>> >> >> > To be detailed. (in this or a separate design, to keep just the split
>> >> >> > simpler).
>> >> >> > (I believe it should be detailed, but as long as we don't think it's
>> >> >> > impossible we can defer the detailing and point from here to a second
>> >> >> > design: of course we should have that design too, before
>> >> >> > implementing).
>> >> >> >
>> >> >>
>> >> >> I guess checking for the existence of a process with the PID of the
>> >> >> lock holder should be enough.
>> >> >> I know PIDs are not guaranteed to be unique, but I think they are
>> >> >> unique enough for this not to be a problem.
>> >> >> And if we really think this is going to be a problem, we can also
>> >> >> check the actual program command line via /proc.
>> >> >
>> >> > This is still not the best way (I think).
>> >> >
>> >> > The way this is usually done in Unix is that the forking process "knows"
>> >> > its children and receives termination signals (SIGCHLD) when they exit;
>> >> > that way, it knows precisely which children are still running and which
>> >> > have died.
>> >> >
>> >> > So if you keep a simple mapping between child PID and job ID, it should
>> >> > be fine.
>> >> >
>> >> > Note that I don't know how well Haskell deals with SIGCHLD and whether
>> >> > it's still easily usable or if it's completely hidden by some RunProcess
>> >> > abstraction…
>> >> >
>> >>
>> >> Note that if jobs run in separate processes we need to make sure the
>> >> way we handle them survives a restart of the job daemon, since they
>> >> can be pretty long-running themselves.
>> >> As such the SIGCHLD option won't work without further changes. But I
>> >> also don't believe in tracking pids and "checking": I think a system
>> >> of communication (via filesystem sockets, probably) should be in place
>> >> for this to be resilient.
>> >> What do you think?
>> >>
>> >
>> > I agree with SIGCHLD not being really usable because we want persistence
>> > across LuxiD restarts.
>> > But what are you suggesting exactly with "system of communication"?
>> > Something like a unique (job-id based maybe?) unix socket for the
>> > communication between LuxiD and each specific job? So that as long as
>> > that socket exists we know that either the job replies to pings on it or
>> > it is dead?
>>
>> That's an option, even without pings: if the socket gets closed the
>> process is dead.
>> Pings would be needed just in case of loops, livelocks and worse
>> situations, but we don't handle them now.
>
> +1 to the jobs each having a socket; I don't think the method that Jose
> proposes (jobs pinging LuxiD back) is better, I think a simple mapping
> inside LuxiD jobid→unix socket is very nice.
>
>> The point is how to handle the reconnection when luxid restarts and
>> what the jobs do in the meantime when they detect a dead luxid.
>
> … fun questions :) I guess you'll want to move to a system where jobs
> write the result to a file on disk (agreed before job starts), and jobs
> just exit when finishing, even if luxid is dead?
>
> Anyway, I was interested in the overall design, so thanks again for it
> and I'll stay out of the discussion :)
>

We welcome feedback on the details as well. I think we can at this
point commit this design (when Michele sends the promised changes) and
then write new ones for the detailed parts of IPC.
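
As a possible starting point for that detailed IPC design, here is a rough
sketch of the "one socket per job, closed socket means dead job" idea
discussed above. It is only an illustration under assumptions: the names
JobWatcher, register, poll_dead and demo are hypothetical and not part of
Ganeti, and socket.socketpair() stands in for the per-job unix sockets. It
does not address reconnection after a luxid restart.

```python
import selectors
import socket


class JobWatcher:
    """Hypothetical helper: keeps a job id -> socket mapping and reports
    jobs whose end of the socket has been closed (EOF on read)."""

    def __init__(self):
        self._selector = selectors.DefaultSelector()
        self._jobs = {}  # job id -> daemon-side socket

    def register(self, job_id, sock):
        """Start watching the daemon-side socket of a job."""
        sock.setblocking(False)
        self._jobs[job_id] = sock
        self._selector.register(sock, selectors.EVENT_READ, data=job_id)

    def poll_dead(self, timeout=0.1):
        """Return the ids of jobs whose peer closed the connection."""
        dead = []
        for key, _events in self._selector.select(timeout):
            try:
                chunk = key.fileobj.recv(4096)
            except BlockingIOError:
                continue
            except ConnectionResetError:
                chunk = b""
            if chunk == b"":  # EOF: the job's end of the socket is gone
                job_id = key.data
                self._selector.unregister(key.fileobj)
                key.fileobj.close()
                del self._jobs[job_id]
                dead.append(job_id)
        return dead

    def live_jobs(self):
        """Return the ids of the jobs still being watched."""
        return sorted(self._jobs)

    def close(self):
        """Stop watching and close all remaining daemon-side sockets."""
        for sock in self._jobs.values():
            self._selector.unregister(sock)
            sock.close()
        self._jobs.clear()
        self._selector.close()


def demo():
    """Simulate two jobs, one of which dies, and report what is detected."""
    watcher = JobWatcher()
    daemon_a, job_a = socket.socketpair()
    daemon_b, job_b = socket.socketpair()
    watcher.register(1, daemon_a)
    watcher.register(2, daemon_b)
    job_a.close()  # simulate job 1 crashing or exiting
    dead = watcher.poll_dead()
    live = watcher.live_jobs()
    job_b.close()
    watcher.close()
    return dead, live
```

The kernel closes a process's sockets when it exits, so no pings are needed
to notice a crashed job; pings would only matter for the loop and livelock
cases, which this sketch ignores.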

Thanks!

Guido
