[Linux-HA] Help in modifying stop signal for nfs server on rhel 5

Gianluca Cecchi Fri, 07 May 2010 03:43:24 -0700

Hello,
I am using RHEL 5.5 32-bit with:

[r...@ha1 etc]# uname -r
2.6.18-194.el5


[r...@ha1 etc]# rpm -q pacemaker
pacemaker-1.0.8-6.el5

[r...@ha1 etc]# rpm -q nfs-utils
nfs-utils-1.0.9-44.el5

My problem is that the /etc/rc.d/init.d/nfs script exits 0 while nfsd
processes are still running... I'm baffled ;-)

I noticed this while testing a move of the resource from one node to the other.

In messages:
May  7 11:51:26 ha1 crmd: [21332]: info: te_rsc_command: Initiating action 99: stop nfssrv_stop_0 on ha1 (local)
May  7 11:51:26 ha1 crmd: [21332]: info: do_lrm_rsc_op: Performing key=99:19:0:21f0ede4-27ee-4d7c-90be-928ee1c062e2 op=nfssrv_stop_0 )
May  7 11:51:26 ha1 lrmd: [21329]: info: rsc:nfssrv:47: stop
May  7 11:51:27 ha1 nfsserver[31830]: INFO: Stopping NFS server ...
May  7 11:51:27 ha1 mountd[27215]: Caught signal 15, un-registering and exiting.
May  7 11:51:27 ha1 nfsserver[31830]: INFO: NFS server stopped
May  7 11:51:27 ha1 crmd: [21332]: info: process_lrm_event: LRM operation nfssrv_stop_0 (call=47, rc=0, cib-update=138, confirmed=true) ok
May  7 11:51:27 ha1 crmd: [21332]: info: match_graph_event: Action nfssrv_stop_0 (99) confirmed on ha1 (rc=0)

So crm_mon shows:
     nfssrv     (ocf::heartbeat:nfsserver):     Stopped

but of course the subsequent attempt to unmount the underlying filesystem fails.
In fact:
[r...@ha1 etc]# ps -ef|grep nfs
root      2591 16988  0 12:10 pts/0    00:00:00 grep nfs
root     27196     7  0 11:27 ?        00:00:00 [nfsd4]
root     27197     1  0 11:27 ?        00:00:00 [nfsd]
root     27198     1  0 11:27 ?        00:00:00 [nfsd]
root     27199     1  0 11:27 ?        00:00:00 [nfsd]
root     27200     1  0 11:27 ?        00:00:00 [nfsd]
root     27201     1  0 11:27 ?        00:00:00 [nfsd]
root     27202     1  0 11:27 ?        00:00:00 [nfsd]
root     27203     1  0 11:27 ?        00:00:00 [nfsd]
root     27204     1  0 11:27 ?        00:00:00 [nfsd]
root     27205     1  0 11:27 ?        00:00:00 [nfsd]
root     27206     1  0 11:27 ?        00:00:00 [nfsd]
root     27207     1  0 11:27 ?        00:00:00 [nfsd]
root     27208     1  0 11:27 ?        00:00:00 [nfsd]
root     27209     1  0 11:27 ?        00:00:00 [nfsd]
root     27210     1  0 11:27 ?        00:00:00 [nfsd]
root     27211     1  0 11:27 ?        00:00:00 [nfsd]
root     27212     1  0 11:27 ?        00:00:00 [nfsd]

If I run the script manually:
[r...@ha1 etc]# /etc/rc.d/init.d/nfs stop
Shutting down NFS mountd:                                  [FAILED]
Shutting down NFS daemon:                                  [  OK  ]
Shutting down NFS quotas:                                  [FAILED]
Shutting down NFS services:                                [FAILED]

So the problem is related only to the nfsd daemons still running...
[r...@ha1 etc]# ps -ef|grep rpc
rpc       1509     1  0 Apr30 ?        00:00:00 portmap
root      2985 16988  0 12:13 pts/0    00:00:00 grep rpc
root     27164     7  0 11:27 ?        00:00:00 [rpciod/0]
root     27244     1  0 11:27 ?        00:00:00 rpc.idmapd

Running "sh -x /etc/rc.d/init.d/nfs stop" gives:

+ '[' -f /var/run/nfsd.pid ']'
+ return 3
+ '[' -z '' -a -z '' ']'
++ __pids_pidof nfsd
++ pidof -c -o 3417 -o 3392 -o %PPID -x nfsd
+ pid='27212 27211 27210 27209 27208 27207 27206 27205 27204 27203 27202
27201 27200 27199 27198 27197'
+ '[' -n '27212 27211 27210 27209 27208 27207 27206 27205 27204 27203 27202
27201 27200 27199 27198 27197' ']'
+ '[' color = verbose -a -z '' ']'
+ '[' -z -2 ']'
+ checkpid 27212 27211 27210 27209 27208 27207 27206 27205 27204 27203 27202
27201 27200 27199 27198 27197
+ local i
+ for i in '$*'
+ '[' -d /proc/27212 ']'
+ return 0
+ kill -2 27212 27211 27210 27209 27208 27207 27206 27205 27204 27203 27202
27201 27200 27199 27198 27197
+ RC=0
+ '[' 0 -eq 0 ']'
+ success 'nfsd -2'

So the problem seems to be that
kill -2
returns 0 without having killed anything... a sort of gun firing blanks ;-)
I'm no expert on signals, but
/usr/src/kernels/2.6.18-194.el5-i686/include/asm/signal.h says
#define SIGINT           2
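
The same effect can be reproduced without NFS. Below is a minimal sketch, assuming a stand-in child process (a plain `sleep` that ignores SIGINT; it is only an illustration and says nothing about why nfsd itself survives). The exit status of kill reports only that the signal could be sent, not that the target exited:

```shell
#!/bin/sh
# Stand-in for nfsd (assumption for illustration): a child that ignores SIGINT.
sh -c 'trap "" INT; exec sleep 30' >/dev/null 2>&1 &
pid=$!
sleep 1   # give the child time to start

# Send SIGINT (signal 2), as the init script does.
kill -2 "$pid"
rc=$?
echo "kill exit status: $rc"

# kill -0 sends no signal; it only tests whether the process still exists.
if kill -0 "$pid" 2>/dev/null; then
    alive=yes
    echo "process $pid is still running"
else
    alive=no
    echo "process $pid is gone"
fi

# Clean up the stand-in process.
kill -9 "$pid" 2>/dev/null
wait "$pid" 2>/dev/null || true
```

So a script that treats "kill returned 0" as "service stopped" can report success while the processes are still there.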

I found a three-year-old thread about this:
http://www.mail-archive.com/[email protected]/msg04706.html

I used killnfsd in old heartbeat v1 clusters... what should I use now with
pacemaker?

Verified manually with:
[r...@ha1 etc]# kill -2 27212 27211 27210 27209 27208 27207 27206 27205
27204 27203 27202 27201 27200 27199 27198 27197
[r...@ha1 etc]# echo $?
0

and the nfsd processes are still there....
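
What I would expect from a stop action is something that checks the result instead of trusting the kill exit code. Below is a rough sketch only, not what the init script or the nfsserver agent actually does; the function name `stop_and_verify` and its arguments are hypothetical:

```shell
#!/bin/sh
# stop_and_verify SIGNAL TIMEOUT PID...
# Hypothetical helper: send SIGNAL to the given PIDs, then poll about once
# per second, up to TIMEOUT times. Returns 0 only if none of the PIDs
# exist any more, 1 otherwise.
stop_and_verify() {
    sig=$1
    timeout=$2
    shift 2

    kill "-$sig" "$@" 2>/dev/null

    while [ "$timeout" -gt 0 ]; do
        remaining=0
        for p in "$@"; do
            # kill -0 only tests for existence; no signal is delivered.
            kill -0 "$p" 2>/dev/null && remaining=1
        done
        [ "$remaining" -eq 0 ] && return 0
        sleep 1
        timeout=$((timeout - 1))
    done
    return 1
}
```

For example, `stop_and_verify 2 10 $(pidof nfsd)` would return non-zero in the situation above, so the stop could be reported as failed instead of rc=0.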

How do other distros stop the NFS server, and which signal do they use?
And what is suggested for failover?
If one is not already open, I can open a bugzilla against RHEL 5 too, if
requested...

BTW: what is the reason for the nfsd4 process?
I want to serve only NFS v3, and in /etc/sysconfig/nfs I put:
MOUNTD_NFS_V1="no"
MOUNTD_NFS_V2="no"
RPCNFSDARGS="-N 4"

It was started by pid 7:
[r...@ha1 etc]# ps -fp 7
UID        PID  PPID  C STIME TTY          TIME CMD
root         7     1  0 Apr30 ?        00:00:00 [kthread]

Thanks in advance for your advice.

Gianluca