Re: [Linux-ha-dev] RFC: pidfile handling; current worst case: stop failure and node level fencing

Dejan Muhamedagic Fri, 24 Oct 2014 05:29:32 -0700

On Thu, Oct 23, 2014 at 09:14:32PM +0200, Lars Ellenberg wrote:
> On Wed, Oct 22, 2014 at 03:09:12PM +0200, Dejan Muhamedagic wrote:
> > On Wed, Oct 22, 2014 at 06:50:37AM -0600, Alan Robertson wrote:
> > > On 10/22/2014 03:33 AM, Dejan Muhamedagic wrote:
> > > > Hi Alan,
> > > >
> > > > On Mon, Oct 20, 2014 at 02:52:13PM -0600, Alan Robertson wrote:
> > > >> For the Assimilation code I use the full pathname of the binary from
> > > >> /proc to tell if it's "one of mine".  That's not perfect if you're using
> > > >> an interpreted language.  It works quite well for compiled languages.
> > > > Yes, though not perfect, that may be good enough. I supposed that
> > > > the probability that the very same program gets the same recycled
> > > > pid is rather low. (Or is it?)
> > > From my 'C' code I could touch the lock file to match the timestamp of
> > > the /proc/pid/stat (or /proc/pid/exe) symlink -- and verify that they
> > > match.  If there is no /proc/pid/stat, then you won't get that extra
> > > safeguard.  But as you suggest, it decreases the probability by orders
> > > of magnitude even without the
> > > 
> > > The /proc/pid/exe symlink appears to have the same timestamp as
> > > /proc/pid/stat
> > 
> > Hmm, not here:
> > 
> > $ sudo ls -lt /proc/1
> > ...
> > lrwxrwxrwx 1 root root 0 Aug 27 13:51 exe -> /sbin/init
> > dr-x------ 2 root root 0 Aug 27 13:51 fd
> > -r--r--r-- 1 root root 0 Aug 27 13:20 cmdline
> > -r--r--r-- 1 root root 0 Aug 27 13:18 stat
> 
> 
> We cannot rely on properties of the inodes in /proc/.
> 
> These inodes get dropped and recreated as the system sees fit,
> and their properties are re-initialized to "something".
> OK, the uid/gid is consistent, obviously.
> But neither the inode numbers nor the a,m,ctime are "stable".
> 
> I demo'ed that in my first email,
> I demo it again here:
> 
> sleep 120 & k=$! ; stat /proc/$k ; echo 3 > /proc/sys/vm/drop_caches ; sleep 2; find /proc/ -ls &> /dev/null;  stat /proc/$k
> 
>   File: `/proc/8862'
>   Size: 0               Blocks: 0          IO Block: 1024   directory
> Device: 3h/3d   Inode: 4295899     Links: 8
> Access: (0555/dr-xr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
> Access: 2014-10-23 18:43:25.535000006 +0200
> Modify: 2014-10-23 18:43:25.535000006 +0200
> Change: 2014-10-23 18:43:25.535000006 +0200
> 
>   File: `/proc/8862'
>   Size: 0               Blocks: 0          IO Block: 1024   directory
> Device: 3h/3d   Inode: 4296016     Links: 8
> Access: (0555/dr-xr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
> Access: 2014-10-23 18:43:27.561002753 +0200
> Modify: 2014-10-23 18:43:27.561002753 +0200
> Change: 2014-10-23 18:43:27.561002753 +0200
> 
> 
> Note how the inode number and a,m,ctime change.
> 
> The "starttime" I was talking about is the 22nd field of /proc/$pid/stat;
> see proc(5):
>   starttime %llu (was %lu before Linux 2.6)
>       (22)  The  time the process started after system boot.
>       In kernels before Linux 2.6, this value was expressed in jiffies.
>       Since Linux 2.6, the value is expressed in clock ticks
>       (divide by sysconf(_SC_CLK_TCK)).
> 
> That's a monotonic time counting from system boot,
> which is what makes it so attractive.
> Even if someone fiddles with date --set (or ntp or ...),
> even if that were done on purpose, this field would not care.
> 
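A minimal shell sketch of reading that field, for illustration only. The
function names proc_starttime and same_process are hypothetical and not part
of ocf-shellfuncs. It assumes a Linux /proc and strips everything up to the
last ") " first, because field 2 (comm) may itself contain spaces or
parentheses.

```shell
# Hypothetical helper: print starttime (field 22 of /proc/PID/stat).
proc_starttime() {
    stat_line=$(cat "/proc/$1/stat" 2>/dev/null) || return 1
    # Drop "pid (comm) "; what remains starts at field 3 (state).
    rest=${stat_line##*) }
    # Split into words; starttime is then the 20th word (22 - 2).
    set -- $rest
    echo "${20}"
}

# Hypothetical helper: true if PID still has the starttime recorded earlier.
# $1 = pid, $2 = starttime saved when the pidfile was written
same_process() {
    now=$(proc_starttime "$1") && [ "$now" = "$2" ]
}
```

A pidfile writer could store the pid together with this value, and a later
check would then treat a mismatch as "pid was recycled".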
> Anyway: making this "somebody else's problem"
> by using (a tool like) start-stop-daemon,
> requiring that to be present,
> and helping to make it do the best thing possible,
> as portably as possible, could be a good way to go.
> 
> Still, doing "cd /proc/$pid" and then working from there
> would avoid various race conditions nicely.
> Where available, open() followed by openat() will do nicely as well,
> with no need to chdir.
> 
> So the "quick fix" for the issue that triggered the discussion
> (not noticing that a pid has died)
> is my first suggestion:
> # wait for pid to die:
> -     while kill -0 $pid; do sleep 1; done
> +     ( if cd /proc/$pid ; then while test -d . ; do sleep 1; done ; fi ) &> /dev/null
> 
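For illustration, the same idea wrapped in a function with a timeout might
look like the sketch below. The name wait_for_pid_exit and the 30 second
default are assumptions, not an existing ocf-shellfuncs interface.

```shell
# Hypothetical helper: wait until PID is gone, at most TIMEOUT seconds.
# Returns 0 if the process is gone, 1 if it is still there after the timeout.
wait_for_pid_exit() {
    pid=$1
    timeout=${2:-30}
    (
        # If the directory is already gone, so is the process.
        cd "/proc/$pid" 2>/dev/null || exit 0
        # "." stays bound to this particular process, so a recycled pid
        # cannot make the loop believe the old process is still alive.
        while [ -d . ]; do
            [ "$timeout" -le 0 ] && exit 1
            sleep 1
            timeout=$((timeout - 1))
        done
        exit 0
    )
}
```

Note that a process which has exited but has not yet been reaped by its
parent (a zombie) still has its /proc entry, so this only returns once the
parent has collected it.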
> 
> Should we do an ocf-shellfuncs helper for this?


Yes.

> Suggested names?

There's already a function which is a naive implementation:

ocf_stop_processes

We could modify that one.

> What should go in there -- only the waiting, the kill TERM?
> A timeout?  Escalation to kill KILL?

There's already some interface, I suppose we can keep it.
It accepts the list of processes and does

kill ... $pids

Not sure how to handle that. Run all of them in background in a
loop and then wait(1) for them?
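
A rough sketch of that "background them all, then wait" idea, only to make
the question concrete. The name stop_pids_sketch, the argument order, the
return codes and the 5 second grace period after KILL are all assumptions;
this is not the current ocf_stop_processes interface.

```shell
# Hypothetical sketch, not the existing ocf_stop_processes.
# Usage: stop_pids_sketch TIMEOUT_SECONDS PID...
stop_pids_sketch() {
    timeout=$1; shift
    waiters=""
    for pid in "$@"; do
        (
            # Pin the /proc entry; if it is already gone there is nothing to do.
            cd "/proc/$pid" 2>/dev/null || exit 0
            # Re-check right before signalling to narrow (not close) the
            # window in which the pid could have been recycled.
            [ -d . ] && kill -TERM "$pid" 2>/dev/null
            t=$timeout
            while [ -d . ] && [ "$t" -gt 0 ]; do sleep 1; t=$((t - 1)); done
            [ -d . ] || exit 0
            # Still there after the timeout: escalate to KILL.
            kill -KILL "$pid" 2>/dev/null
            t=5   # arbitrary grace period (assumption)
            while [ -d . ] && [ "$t" -gt 0 ]; do sleep 1; t=$((t - 1)); done
            [ -d . ] && exit 1
            exit 0
        ) &
        waiters="$waiters $!"
    done
    # Collect one watcher per pid; fail if any process survived KILL.
    rc=0
    for w in $waiters; do
        wait "$w" || rc=1
    done
    return $rc
}
```

Each pid gets its own watcher subshell, so one slow process does not delay
the escalation for the others, and the total run time stays bounded by
roughly the timeout plus the grace period.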

Cheers,

Dejan

>       Lars
> 
> _______________________________________________________
> Linux-HA-Dev: [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> Home Page: http://linux-ha.org/