Hi,

On Fri, May 07, 2010 at 12:43:09PM +0200, Gianluca Cecchi wrote:
> Hello,
> using rh el 5.5 32 bit with:
>
> [r...@ha1 etc]# uname -r
> 2.6.18-194.el5
>
> [r...@ha1 etc]# rpm -q pacemaker
> pacemaker-1.0.8-6.el5
>
> [r...@ha1 etc]# rpm -q nfs-utils
> nfs-utils-1.0.9-44.el5
>
> I have a problem because /etc/rc.d/init.d/nfs script exits 0 while nfsd
> processes are still running.... baffed ;-)
>
> testing move of resource from one node to the other one.
>
> In messages
> May 7 11:51:26 ha1 crmd: [21332]: info: te_rsc_command: Initiating action
> 99: stop nfssrv_stop_0 on ha1 (local)
> May 7 11:51:26 ha1 crmd: [21332]: info: do_lrm_rsc_op: Performing
> key=99:19:0:21f0ede4-27ee-4d7c-90be-928ee1c062e2 op=nfssrv_stop_0 )
> May 7 11:51:26 ha1 lrmd: [21329]: info: rsc:nfssrv:47: stop
> May 7 11:51:27 ha1 nfsserver[31830]: INFO: Stopping NFS server ...
> May 7 11:51:27 ha1 mountd[27215]: Caught signal 15, un-registering and
> exiting.
> May 7 11:51:27 ha1 nfsserver[31830]: INFO: NFS server stopped
> May 7 11:51:27 ha1 crmd: [21332]: info: process_lrm_event: LRM operation
> nfssrv_stop_0 (call=47, rc=0, cib-update=138, confirmed=true) ok
> May 7 11:51:27 ha1 crmd: [21332]: info: match_graph_event: Action
> nfssrv_stop_0 (99) confirmed on ha1 (rc=0)
>
> So that I have this with crm_mon:
> nfssrv (ocf::heartbeat:nfsserver): Stopped
>
> but of course failures trying to unmount then the underlying fs
> In fact
> [r...@ha1 etc]# ps -ef|grep nfs
> root  2591 16988 0 12:10 pts/0 00:00:00 grep nfs
> root 27196     7 0 11:27 ?     00:00:00 [nfsd4]
> root 27197     1 0 11:27 ?     00:00:00 [nfsd]
> root 27198     1 0 11:27 ?     00:00:00 [nfsd]
> root 27199     1 0 11:27 ?     00:00:00 [nfsd]
> root 27200     1 0 11:27 ?     00:00:00 [nfsd]
> root 27201     1 0 11:27 ?     00:00:00 [nfsd]
> root 27202     1 0 11:27 ?     00:00:00 [nfsd]
> root 27203     1 0 11:27 ?     00:00:00 [nfsd]
> root 27204     1 0 11:27 ?     00:00:00 [nfsd]
> root 27205     1 0 11:27 ?     00:00:00 [nfsd]
> root 27206     1 0 11:27 ?     00:00:00 [nfsd]
> root 27207     1 0 11:27 ?     00:00:00 [nfsd]
> root 27208     1 0 11:27 ?     00:00:00 [nfsd]
> root 27209     1 0 11:27 ?     00:00:00 [nfsd]
> root 27210     1 0 11:27 ?     00:00:00 [nfsd]
> root 27211     1 0 11:27 ?     00:00:00 [nfsd]
> root 27212     1 0 11:27 ?     00:00:00 [nfsd]
>
> If I try manually the script:
> [r...@ha1 etc]# /etc/rc.d/init.d/nfs stop
> Shutting down NFS mountd: [FAILED]
> Shutting down NFS daemon: [ OK ]
> Shutting down NFS quotas: [FAILED]
> Shutting down NFS services: [FAILED]
>
> So the problem is related only to nfsd daemons running yet...
> [r...@ha1 etc]# ps -ef|grep rpc
> rpc   1509     1 0 Apr30 ?     00:00:00 portmap
> root  2985 16988 0 12:13 pts/0 00:00:00 grep rpc
> root 27164     7 0 11:27 ?     00:00:00 [rpciod/0]
> root 27244     1 0 11:27 ?     00:00:00 rpc.idmapd
>
> Try to start "sh -x /etc/rc.d/init.d/nfs stop"
>
> + '[' -f /var/run/nfsd.pid ']'
> + return 3
> + '[' -z '' -a -z '' ']'
> ++ __pids_pidof nfsd
> ++ pidof -c -o 3417 -o 3392 -o %PPID -x nfsd
> + pid='27212 27211 27210 27209 27208 27207 27206 27205 27204 27203 27202
> 27201 27200 27199 27198 27197'
> + '[' -n '27212 27211 27210 27209 27208 27207 27206 27205 27204 27203 27202
> 27201 27200 27199 27198 27197' ']'
> + '[' color = verbose -a -z '' ']'
> + '[' -z -2 ']'
> + checkpid 27212 27211 27210 27209 27208 27207 27206 27205 27204 27203 27202
> 27201 27200 27199 27198 27197
> + local i
> + for i in '$*'
> + '[' -d /proc/27212 ']'
> + return 0
> + kill -2 27212 27211 27210 27209 27208 27207 27206 27205 27204 27203 27202
> 27201 27200 27199 27198 27197
> + RC=0
> + '[' 0 -eq 0 ']'
> + success 'nfsd -2'
>
> So the problem seems to be the
> kill -2
> returning 0 without having killed anyone.... sort of Blank shots gun ;-)
> I'm not expert with signals, but
> /usr/src/kernels/2.6.18-194.el5-i686/include/asm/signal.h says
> #define SIGINT 2
>
> I found a three years old thread about this:
> http://www.mail-archive.com/[email protected]/msg04706.html
>
> I used killnfsd in old heartbeat v1 clusters... what should I use now with
> pacemaker?

Can you try replacing the INT signal with TERM (or KILL) in
/etc/init.d/nfs?

> Verified manually with:
> [r...@ha1 etc]# kill -2 27212 27211 27210 27209 27208 27207 27206 27205
> 27204 27203 27202 27201 27200 27199 27198 27197
> [r...@ha1 etc]# echo $?
> 0
>
> and the nfsd processes are still there....

Try waiting a while to see whether they eventually exit (perhaps
after some timeout). If they do, the stop action of nfsserver should
wait until /etc/init.d/nfs status reports stopped.

Thanks,

Dejan

> What is the behaviour/signal of the other distros in stopping nfs server?
> And what suggested for failover
> If not already opened, I can open a bugzilla in rh el 5 too, if
> requested....
>
> BTW: what is the reason for nfsd4 process?
> I want to bind to nfs v3 and in /etc/sysconfig/nfs I put:
> MOUNTD_NFS_V1="no"
> MOUNTD_NFS_V2="no"
> RPCNFSDARGS="-N 4"
>
> It has been started by pid 7:
> [r...@ha1 etc]# ps -fp 7
> UID PID PPID C STIME TTY   TIME     CMD
> root  7    1 0 Apr30 ?     00:00:00 [kthread]
>
> Thanks in advance for your advises.
>
> Gianluca
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
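
The idea in the reply, that a stop action should wait until the
daemons are really gone instead of trusting the exit status of kill,
could be sketched roughly as below. This is only an illustration:
wait_for_exit is a hypothetical helper, not part of the nfsserver
resource agent or of /etc/init.d/nfs, and it assumes it runs as root
so that kill -0 can probe the PIDs. The point is that kill returns 0
as soon as the signal has been sent, which says nothing about whether
the target exited.

```shell
#!/bin/sh
# Hypothetical helper (not from the nfsserver RA or the init script).
# Usage: wait_for_exit TIMEOUT_SECONDS PID...
# Returns 0 once none of the PIDs exists any more,
# 1 if the timeout expires while at least one is still there.
wait_for_exit() {
    timeout=$1
    shift
    elapsed=0
    while :; do
        alive=0
        for pid in "$@"; do
            # kill -0 sends no signal; it only checks that the PID
            # exists (assumes we are allowed to signal it, e.g. root).
            if kill -0 "$pid" 2>/dev/null; then
                alive=1
                break
            fi
        done
        [ "$alive" -eq 0 ] && return 0
        [ "$elapsed" -ge "$timeout" ] && return 1
        sleep 1
        elapsed=$((elapsed + 1))
    done
}

# Possible use in a stop path (the 30 second timeout is an assumption):
#   kill -TERM $pids
#   wait_for_exit 30 $pids || echo "nfsd still running after 30s" >&2
```

With something like this, a stop action would report success only
after the processes have disappeared, and could report a failure (or
escalate to another signal) when the timeout expires.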
