Hi,

On Fri, May 07, 2010 at 12:43:09PM +0200, Gianluca Cecchi wrote:
> Hello,
> using rh el 5.5 32 bit with:
> 
> [r...@ha1 etc]# uname -r
> 2.6.18-194.el5
> 
> [r...@ha1 etc]# rpm -q pacemaker
> pacemaker-1.0.8-6.el5
> 
> [r...@ha1 etc]# rpm -q nfs-utils
> nfs-utils-1.0.9-44.el5
> 
> I have a problem because the /etc/rc.d/init.d/nfs script exits 0 while nfsd
> processes are still running.... baffled ;-)
> 
> I was testing a move of the resource from one node to the other.
> 
> In messages
> May  7 11:51:26 ha1 crmd: [21332]: info: te_rsc_command: Initiating action
> 99: stop nfssrv_stop_0 on ha1 (local)
> May  7 11:51:26 ha1 crmd: [21332]: info: do_lrm_rsc_op: Performing
> key=99:19:0:21f0ede4-27ee-4d7c-90be-928ee1c062e2 op=nfssrv_stop_0 )
> May  7 11:51:26 ha1 lrmd: [21329]: info: rsc:nfssrv:47: stop
> May  7 11:51:27 ha1 nfsserver[31830]: INFO: Stopping NFS server ...
> May  7 11:51:27 ha1 mountd[27215]: Caught signal 15, un-registering and
> exiting.
> May  7 11:51:27 ha1 nfsserver[31830]: INFO: NFS server stopped
> May  7 11:51:27 ha1 crmd: [21332]: info: process_lrm_event: LRM operation
> nfssrv_stop_0 (call=47, rc=0, cib-update=138, confirmed=true) ok
> May  7 11:51:27 ha1 crmd: [21332]: info: match_graph_event: Action
> nfssrv_stop_0 (99) confirmed on ha1 (rc=0)
> 
> So I get this from crm_mon:
>      nfssrv     (ocf::heartbeat:nfsserver):     Stopped
> 
> but of course there are failures when it then tries to unmount the
> underlying fs. In fact:
> [r...@ha1 etc]# ps -ef|grep nfs
> root      2591 16988  0 12:10 pts/0    00:00:00 grep nfs
> root     27196     7  0 11:27 ?        00:00:00 [nfsd4]
> root     27197     1  0 11:27 ?        00:00:00 [nfsd]
> root     27198     1  0 11:27 ?        00:00:00 [nfsd]
> root     27199     1  0 11:27 ?        00:00:00 [nfsd]
> root     27200     1  0 11:27 ?        00:00:00 [nfsd]
> root     27201     1  0 11:27 ?        00:00:00 [nfsd]
> root     27202     1  0 11:27 ?        00:00:00 [nfsd]
> root     27203     1  0 11:27 ?        00:00:00 [nfsd]
> root     27204     1  0 11:27 ?        00:00:00 [nfsd]
> root     27205     1  0 11:27 ?        00:00:00 [nfsd]
> root     27206     1  0 11:27 ?        00:00:00 [nfsd]
> root     27207     1  0 11:27 ?        00:00:00 [nfsd]
> root     27208     1  0 11:27 ?        00:00:00 [nfsd]
> root     27209     1  0 11:27 ?        00:00:00 [nfsd]
> root     27210     1  0 11:27 ?        00:00:00 [nfsd]
> root     27211     1  0 11:27 ?        00:00:00 [nfsd]
> root     27212     1  0 11:27 ?        00:00:00 [nfsd]
> 
> If I run the script manually:
> [r...@ha1 etc]# /etc/rc.d/init.d/nfs stop
> Shutting down NFS mountd:                                  [FAILED]
> Shutting down NFS daemon:                                  [  OK  ]
> Shutting down NFS quotas:                                  [FAILED]
> Shutting down NFS services:                                [FAILED]
> 
> So the problem is related only to the nfsd daemons still running...
> [r...@ha1 etc]# ps -ef|grep rpc
> rpc       1509     1  0 Apr30 ?        00:00:00 portmap
> root      2985 16988  0 12:13 pts/0    00:00:00 grep rpc
> root     27164     7  0 11:27 ?        00:00:00 [rpciod/0]
> root     27244     1  0 11:27 ?        00:00:00 rpc.idmapd
> 
> Trying with "sh -x /etc/rc.d/init.d/nfs stop":
> 
> + '[' -f /var/run/nfsd.pid ']'
> + return 3
> + '[' -z '' -a -z '' ']'
> ++ __pids_pidof nfsd
> ++ pidof -c -o 3417 -o 3392 -o %PPID -x nfsd
> + pid='27212 27211 27210 27209 27208 27207 27206 27205 27204 27203 27202
> 27201 27200 27199 27198 27197'
> + '[' -n '27212 27211 27210 27209 27208 27207 27206 27205 27204 27203 27202
> 27201 27200 27199 27198 27197' ']'
> + '[' color = verbose -a -z '' ']'
> + '[' -z -2 ']'
> + checkpid 27212 27211 27210 27209 27208 27207 27206 27205 27204 27203 27202
> 27201 27200 27199 27198 27197
> + local i
> + for i in '$*'
> + '[' -d /proc/27212 ']'
> + return 0
> + kill -2 27212 27211 27210 27209 27208 27207 27206 27205 27204 27203 27202
> 27201 27200 27199 27198 27197
> + RC=0
> + '[' 0 -eq 0 ']'
> + success 'nfsd -2'
> 
> So the problem seems to be the
> kill -2
> returning 0 without having killed anything.... sort of a gun firing blanks ;-)
> I'm no expert on signals, but
> /usr/src/kernels/2.6.18-194.el5-i686/include/asm/signal.h says
> #define SIGINT           2
> 
> I found a three-year-old thread about this:
> http://www.mail-archive.com/[email protected]/msg04706.html
> 
> I used killnfsd in old heartbeat v1 clusters... what should I use now with
> pacemaker?

Can you try replacing the INT signal with TERM (or KILL) in
/etc/init.d/nfs?
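
For what it's worth, a zero exit status from kill only means the signal
was sent, not that the target exited. Below is a minimal sketch of that
point. It uses an ordinary userspace process that ignores SIGINT as a
stand-in; the real nfsd threads are kernel threads, so this illustrates
only the exit-status behaviour, not what nfsd does with the signal. The
helper name signal_then_check is made up for the example.

```shell
#!/bin/sh
# signal_then_check SIGNAL
# Hypothetical helper: start a process that ignores SIGINT, send it
# SIGNAL, then report kill's exit status and whether the process survived.
signal_then_check() {
    sig=$1
    # Stand-in target: ignores INT, then becomes a plain "sleep".
    sh -c 'trap "" INT; exec sleep 30' >/dev/null 2>&1 &
    pid=$!
    sleep 1                      # give the target time to start
    kill -"$sig" "$pid"
    rc=$?                        # 0 means "signal sent", nothing more
    sleep 1                      # give the target time to react
    if kill -0 "$pid" 2>/dev/null; then
        state=alive
        kill -KILL "$pid" 2>/dev/null   # clean up the stand-in
    else
        state=gone
    fi
    wait "$pid" 2>/dev/null
    echo "kill rc=$rc, process $state"
}
```

Calling signal_then_check INT and then signal_then_check TERM should show
rc=0 in both cases, with the process surviving only the first, which is
why the init script's success check on kill's return code can be
misleading.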

> Verified manually with:
> [r...@ha1 etc]# kill -2 27212 27211 27210 27209 27208 27207 27206 27205
> 27204 27203 27202 27201 27200 27199 27198 27197
> [r...@ha1 etc]# echo $?
> 0
> 
> and the nfsd processes are still there....

Also, wait a while and see whether they eventually exit (perhaps
after some timeout). If they do, the stop action of the nfsserver
resource agent should wait until /etc/init.d/nfs status reports
stopped.
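
Roughly what I have in mind for the waiting part, as a sketch only and
not the actual nfsserver agent code. It assumes the status command
follows the usual init script convention of exiting 0 while the service
is running and non-zero once it is stopped; wait_until_stopped and the
timeout value are made up for illustration.

```shell
#!/bin/sh
# wait_until_stopped STATUS_CMD TIMEOUT_SECONDS
# Hypothetical helper: poll STATUS_CMD once per second until it reports
# "not running" (non-zero exit) or TIMEOUT_SECONDS have elapsed.
# Returns 0 if the service stopped in time, 1 on timeout.
wait_until_stopped() {
    status_cmd=$1
    timeout=$2
    waited=0
    # STATUS_CMD is deliberately unquoted so "script status" splits
    # into a command and its argument.
    while $status_cmd >/dev/null 2>&1; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

In a stop action this could be called as
wait_until_stopped "/etc/init.d/nfs status" 20
after the init script's stop has run, reporting success only if it
returns 0. The 20 seconds is an arbitrary number and would have to stay
below the stop operation's timeout configured in the cluster.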

Thanks,

Dejan

> Which signal do the other distros use when stopping the nfs server?
> And what is suggested for failover?
> If one is not already open, I can open a bugzilla for rh el 5 too, if
> requested....
> 
> BTW: what is the reason for the nfsd4 process?
> I want to bind to nfs v3 and in /etc/sysconfig/nfs I put:
> MOUNTD_NFS_V1="no"
> MOUNTD_NFS_V2="no"
> RPCNFSDARGS="-N 4"
> 
> It was started by pid 7:
> [r...@ha1 etc]# ps -fp 7
> UID        PID  PPID  C STIME TTY          TIME CMD
> root         7     1  0 Apr30 ?        00:00:00 [kthread]
> 
> Thanks in advance for your advice.
> 
> Gianluca
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems