On Thu, Aug 8, 2013 at 12:37 PM, Jose A. Lopes <[email protected]> wrote:

> > That would not work if we want LuxiD to be able to be restarted while jobs
> > are running (might be useful for easier upgrades). We would like to persist
> > information about jobs and the job queue to disk, and then obviously the
> > parent/child relationship is gone. But maybe we could implement the proper
> > way for normal operation and check only on startup using the PID/creation
> > time/cmdline in /proc approach.
>
> Why not reverse the direction of parent/child pings?
>
> For example, why not have a UNIX socket in the master, and have the job
> processes ping the master socket every now and then? This way we have
> just one socket instead of one per process. If the master dies, the job
> processes know, because the UNIX socket gets closed. They just keep
> trying until the socket comes back.
>
> Moreover, perhaps we don't have to persist any job queue information,
> because when the master comes back up, it will simply collect the
> pings from the job processes that are still running.
>
> If you like this idea, we can even remove the UNIX socket from the
> picture and simply add a LUXI ping request, used only by the job
> processes, to communicate with the master.
>
> What do you think?
>
> Jose

I don't think this would work. What happens if a job terminates while LuxiD
is not available? After coming back, LuxiD will have no idea which jobs are
still running and which are not, because all it can do is wait for their
pings. And what if a job finishes before LuxiD is back? It will never send a
ping again or, even more importantly, a message saying that everything went
well and that it is releasing its locks.

Having multiple sockets is easier. LuxiD only needs to keep a persistent list
of where all the sockets are located, and then try to connect to each of them
to find out whether the corresponding jobs are still alive and working.

Thanks,
Michele

-- 
Google Germany GmbH
Dienerstr. 12
80331 München

Register court and number: Hamburg, HRB 86891
Registered office: Hamburg
Managing directors: Graham Law, Christine Elizabeth Flores
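
The per-job socket scheme Michele describes (persist where each job's socket lives, then probe each one after a LuxiD restart) can be sketched as follows. This is a minimal illustration under stated assumptions, not Ganeti's implementation: the JSON state file, the `job-N.sock` paths and the helper names (`save_job_sockets`, `load_job_sockets`, `job_is_alive`, `probe_jobs`) are all hypothetical. It also shows why this scheme copes with the case raised above: a job that exited while LuxiD was down leaves either no socket file or one nobody listens on, and both are detected without waiting for a ping.

```python
import json
import os
import socket
import tempfile


def save_job_sockets(state_file, sockets):
    """Persist the job-id -> socket-path mapping (hypothetical on-disk format)."""
    tmp = state_file + ".tmp"
    with open(tmp, "w") as f:
        json.dump(sockets, f)
    # Atomic rename, so a crash never leaves a half-written state file.
    os.replace(tmp, state_file)


def load_job_sockets(state_file):
    with open(state_file) as f:
        return json.load(f)


def job_is_alive(path, timeout=1.0):
    """Treat a job as alive if something accepts connections on its socket."""
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.settimeout(timeout)
    try:
        s.connect(path)
        return True
    except (FileNotFoundError, ConnectionRefusedError, socket.timeout):
        # Missing socket file, stale file with no listener, or a hung job.
        return False
    finally:
        s.close()


def probe_jobs(state_file):
    """What a restarted LuxiD could do: check every job recorded before it stopped."""
    return {job_id: job_is_alive(path)
            for job_id, path in load_job_sockets(state_file).items()}


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    alive_path = os.path.join(workdir, "job-1.sock")
    dead_path = os.path.join(workdir, "job-2.sock")

    # Job 1 is still running: it listens on its own socket.
    alive = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    alive.bind(alive_path)
    alive.listen(1)

    # Job 2 finished while LuxiD was down: its socket file is left behind,
    # but nothing listens on it any more.
    dead = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    dead.bind(dead_path)
    dead.close()

    # Job 3 cleaned up its socket file before exiting.
    state_file = os.path.join(workdir, "jobs.json")
    save_job_sockets(state_file, {"1": alive_path, "2": dead_path,
                                  "3": os.path.join(workdir, "job-3.sock")})
    print(probe_jobs(state_file))  # {'1': True, '2': False, '3': False}
    alive.close()
```

A successful `connect` only shows that some process is listening on the path; a real design would presumably also exchange a short status message over the connection so LuxiD can learn whether the job succeeded and released its locks.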
