Hello,
I'm using RHEL 5.5 (32-bit) with:
[r...@ha1 etc]# uname -r
2.6.18-194.el5
[r...@ha1 etc]# rpm -q pacemaker
pacemaker-1.0.8-6.el5
[r...@ha1 etc]# rpm -q nfs-utils
nfs-utils-1.0.9-44.el5
My problem is that the /etc/rc.d/init.d/nfs script exits 0 while the nfsd
processes are still running... baffled ;-)
I am testing a move of the resource from one node to the other.
In the messages log:
May 7 11:51:26 ha1 crmd: [21332]: info: te_rsc_command: Initiating action
99: stop nfssrv_stop_0 on ha1 (local)
May 7 11:51:26 ha1 crmd: [21332]: info: do_lrm_rsc_op: Performing
key=99:19:0:21f0ede4-27ee-4d7c-90be-928ee1c062e2 op=nfssrv_stop_0 )
May 7 11:51:26 ha1 lrmd: [21329]: info: rsc:nfssrv:47: stop
May 7 11:51:27 ha1 nfsserver[31830]: INFO: Stopping NFS server ...
May 7 11:51:27 ha1 mountd[27215]: Caught signal 15, un-registering and
exiting.
May 7 11:51:27 ha1 nfsserver[31830]: INFO: NFS server stopped
May 7 11:51:27 ha1 crmd: [21332]: info: process_lrm_event: LRM operation
nfssrv_stop_0 (call=47, rc=0, cib-update=138, confirmed=true) ok
May 7 11:51:27 ha1 crmd: [21332]: info: match_graph_event: Action
nfssrv_stop_0 (99) confirmed on ha1 (rc=0)
As a result, crm_mon shows:
nfssrv (ocf::heartbeat:nfsserver): Stopped
but of course the subsequent unmount of the underlying filesystem fails.
In fact:
[r...@ha1 etc]# ps -ef|grep nfs
root 2591 16988 0 12:10 pts/0 00:00:00 grep nfs
root 27196 7 0 11:27 ? 00:00:00 [nfsd4]
root 27197 1 0 11:27 ? 00:00:00 [nfsd]
root 27198 1 0 11:27 ? 00:00:00 [nfsd]
root 27199 1 0 11:27 ? 00:00:00 [nfsd]
root 27200 1 0 11:27 ? 00:00:00 [nfsd]
root 27201 1 0 11:27 ? 00:00:00 [nfsd]
root 27202 1 0 11:27 ? 00:00:00 [nfsd]
root 27203 1 0 11:27 ? 00:00:00 [nfsd]
root 27204 1 0 11:27 ? 00:00:00 [nfsd]
root 27205 1 0 11:27 ? 00:00:00 [nfsd]
root 27206 1 0 11:27 ? 00:00:00 [nfsd]
root 27207 1 0 11:27 ? 00:00:00 [nfsd]
root 27208 1 0 11:27 ? 00:00:00 [nfsd]
root 27209 1 0 11:27 ? 00:00:00 [nfsd]
root 27210 1 0 11:27 ? 00:00:00 [nfsd]
root 27211 1 0 11:27 ? 00:00:00 [nfsd]
root 27212 1 0 11:27 ? 00:00:00 [nfsd]
If I run the script manually:
[r...@ha1 etc]# /etc/rc.d/init.d/nfs stop
Shutting down NFS mountd: [FAILED]
Shutting down NFS daemon: [ OK ]
Shutting down NFS quotas: [FAILED]
Shutting down NFS services: [FAILED]
So the problem is only that the nfsd daemons are still running...
[r...@ha1 etc]# ps -ef|grep rpc
rpc 1509 1 0 Apr30 ? 00:00:00 portmap
root 2985 16988 0 12:13 pts/0 00:00:00 grep rpc
root 27164 7 0 11:27 ? 00:00:00 [rpciod/0]
root 27244 1 0 11:27 ? 00:00:00 rpc.idmapd
Running "sh -x /etc/rc.d/init.d/nfs stop" shows:
+ '[' -f /var/run/nfsd.pid ']'
+ return 3
+ '[' -z '' -a -z '' ']'
++ __pids_pidof nfsd
++ pidof -c -o 3417 -o 3392 -o %PPID -x nfsd
+ pid='27212 27211 27210 27209 27208 27207 27206 27205 27204 27203 27202
27201 27200 27199 27198 27197'
+ '[' -n '27212 27211 27210 27209 27208 27207 27206 27205 27204 27203 27202
27201 27200 27199 27198 27197' ']'
+ '[' color = verbose -a -z '' ']'
+ '[' -z -2 ']'
+ checkpid 27212 27211 27210 27209 27208 27207 27206 27205 27204 27203 27202
27201 27200 27199 27198 27197
+ local i
+ for i in '$*'
+ '[' -d /proc/27212 ']'
+ return 0
+ kill -2 27212 27211 27210 27209 27208 27207 27206 27205 27204 27203 27202
27201 27200 27199 27198 27197
+ RC=0
+ '[' 0 -eq 0 ']'
+ success 'nfsd -2'
So the problem seems to be that
kill -2
returns 0 without having killed anything... a sort of gun firing blanks ;-)
I'm no expert on signals, but
/usr/src/kernels/2.6.18-194.el5-i686/include/asm/signal.h says
#define SIGINT 2
I found a three-year-old thread about this:
http://www.mail-archive.com/[email protected]/msg04706.html
I used killnfsd in old heartbeat v1 clusters... what should I use now with
pacemaker?
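To make clear what I am after: the stop step should send the signal and then
confirm that the processes are really gone before reporting success. Below is
a minimal sketch of that idea. The name stop_and_verify, its arguments and the
one-second polling interval are my own assumptions for illustration; it is not
taken from the RHEL init script or from the nfsserver resource agent.

```shell
#!/bin/sh
# stop_and_verify: hypothetical helper, for illustration only.
# Usage: stop_and_verify SIGNAL TIMEOUT_SECONDS PID...
# Returns 0 only if every PID has disappeared within the timeout,
# and 1 if at least one of them is still present.
stop_and_verify() {
    sig=$1
    timeout=$2
    shift 2

    # Send the signal; the exit status of kill is deliberately ignored
    # because it only says whether the signal could be sent.
    kill "-$sig" "$@" 2>/dev/null

    elapsed=0
    while :; do
        remaining=0
        for p in "$@"; do
            # kill -0 sends nothing, it only tests that the PID exists.
            if kill -0 "$p" 2>/dev/null; then
                remaining=1
            fi
        done
        [ "$remaining" -eq 0 ] && return 0
        [ "$elapsed" -ge "$timeout" ] && return 1
        sleep 1
        elapsed=$((elapsed + 1))
    done
}

# Possible use (the PID list would come from something like pidof nfsd):
#   stop_and_verify 2 10 $(pidof nfsd) || echo "nfsd still running"
```

With something like this, the stop would fail loudly instead of returning 0
while the nfsd processes are still there.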
I verified the kill -2 behaviour manually:
[r...@ha1 etc]# kill -2 27212 27211 27210 27209 27208 27207 27206 27205
27204 27203 27202 27201 27200 27199 27198 27197
[r...@ha1 etc]# echo $?
0
and the nfsd processes are still there....
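As far as I understand, an exit status of 0 from kill only means that the
signal could be sent, not that the target terminated. The small sketch below
shows the same effect with a throwaway process that ignores SIGINT. That
process is only a stand-in chosen for the demonstration; I am not claiming
that the nfsd threads ignore the signal for the same reason.

```shell
#!/bin/sh
# Start a throwaway child that ignores SIGINT. This is a stand-in for
# "a process that does not exit on signal 2", NOT the real nfsd threads.
sh -c 'trap "" INT; exec sleep 30' &
pid=$!
sleep 1                       # give the child time to start

kill -2 "$pid"
kill_rc=$?                    # 0 here means only "signal was sent"

sleep 1
if kill -0 "$pid" 2>/dev/null; then
    still_alive=yes
else
    still_alive=no
fi
echo "kill rc=$kill_rc, still alive: $still_alive"

# Clean up the demo process.
kill -9 "$pid" 2>/dev/null
wait "$pid" 2>/dev/null
```

So a stop script that checks only $? after kill cannot tell whether anything
actually stopped.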
Which signal do the other distros use to stop the NFS server, and how do they behave?
And what is the suggested approach for failover?
If one is not already open, I can file a bug for RHEL 5 in Bugzilla too, if
requested...
BTW: what is the reason for the nfsd4 process?
I want to serve only NFS v3, and in /etc/sysconfig/nfs I put:
MOUNTD_NFS_V1="no"
MOUNTD_NFS_V2="no"
RPCNFSDARGS="-N 4"
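One way to check whether these settings took effect might be to look at the
versions that nfsd itself reports. The sketch below assumes that the kernel
exposes /proc/fs/nfsd/versions with tokens in "+N" / "-N" form; I have not
confirmed that for this kernel, and enabled_nfs_versions is a hypothetical
helper name of mine.

```shell
#!/bin/sh
# enabled_nfs_versions: hypothetical helper, for illustration only.
# Takes one string of "+N"/"-N" tokens and prints the enabled
# versions (those prefixed with "+"), separated by spaces.
enabled_nfs_versions() {
    out=""
    for tok in $1; do
        case $tok in
            +*) out="$out${out:+ }${tok#+}" ;;
        esac
    done
    echo "$out"
}

# On a live server (assumption: the nfsd filesystem is mounted there):
#   enabled_nfs_versions "$(cat /proc/fs/nfsd/versions)"
```

"rpcinfo -p" should also list which nfs versions are registered with portmap.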
The nfsd4 process was started by pid 7:
[r...@ha1 etc]# ps -fp 7
UID PID PPID C STIME TTY TIME CMD
root 7 1 0 Apr30 ? 00:00:00 [kthread]
Thanks in advance for your advice.
Gianluca
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems