Sun Jiang Dong wrote:
Maybe I'm wrong. Anyway, please take a look at my points.
< It's strange that I sent the mail but it did not appear on the list >

-------- Original Message --------
Subject: Re: [Linux-ha-dev] Re: [Linux-ha-cvs] Linux-HA CVS: lib by sunjd from
Date: Thu, 01 Dec 2005 23:42:14 +0800
From: Sun Jiang Dong <[EMAIL PROTECTED]>
To: High-Availability Linux Development List <[email protected]>
References: <[EMAIL PROTECTED]> <[EMAIL PROTECTED]> <[EMAIL PROTECTED]>

Alan Robertson wrote:
I'm not sure what's your meaning, especially what "the parameters" refers to.

Guochun Shi wrote:
excellent, this fixed bug 952
-Guochun

[email protected] wrote:
linux-ha CVS committal

Author  : sunjd
Host    :
Project : linux-ha
Module  : lib

Dir     : linux-ha/lib/plugins/stonith/external

Modified Files:
    ssh.in

Log Message:
bug952: avoid to freeze for a long time

===================================================================
RCS file: /home/cvs/linux-ha/linux-ha/lib/plugins/stonith/external/ssh.in,v
retrieving revision 1.7
retrieving revision 1.8
diff -u -3 -r1.7 -r1.8
--- ssh.in 17 Nov 2005 05:32:31 -0000 1.7
+++ ssh.in 30 Nov 2005 02:28:37 -0000 1.8
@@ -41,7 +41,7 @@
 for j in 1 2 3
 do
 if
- ping -w0.5 -c1 "$1" >/dev/null 2>&1
+ ping -w1 -c1 "$1" >/dev/null 2>&1
 then
 return 1
 fi
@@ -97,7 +97,7 @@
 for h in $hostlist
 do
 if
- host $h 2>&1 | grep "not found:"
+ ping -w1 -c1 "$h" 2>&1 | grep "unknown host"

But, this change is in error.

You should not fail to verify the parameters just because the host is down. That's what the code originally did - and it's wrong. Ping is

As my understanding, this line is to detect if the target node name can be resolved. Here I just make a equal substitution.
No, it is NOT an equal substitution. It will fail if the node can't be resolved (which is good), and it will also fail if the node isn't up at the moment (which is bad).
This is just for testing if the configuration is correct. On other stonith devices it just tests to see if the stonith device is correctly configured, and the device (which is separate from the host) is working. Since we don't have a stonith device in this case, all we can do and all we SHOULD do is to fail if the configuration is bad.
It is NOT for doing a stonith or seeing if the host is up or anything equivalent. It is REQUIRED to succeed if the configuration is correct but the host is down.
You should know this, because your stonithd code is the code that's used to monitor it. If the monitor fails (which is what happens if you do this the way you did it), then the stonith device is treated as having failed. But, the host being down does not mean the stonith device has failed, and we shouldn't re-create the stonith device.
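To make the requirement concrete, here is a minimal sketch of a status check that fails only when a configured name cannot be resolved, and never contacts the target host. It is an illustration, not the code in ssh.in: the function names name_resolves and check_config are hypothetical, and using "getent hosts" (which on glibc systems resolves through the system's configured name services) is an assumption that was not proposed in this thread.

```shell
#!/bin/sh
# Hypothetical sketch: verify configuration only, never probe the host.

# Succeeds if the name can be resolved. Assumes a glibc-style getent;
# it sends no packets to the target host itself.
name_resolves() {
    getent hosts "$1" >/dev/null 2>&1
}

# Succeeds when every name in $hostlist resolves. A host that is
# resolvable but powered off still passes, as the status check requires.
check_config() {
    for h in $hostlist
    do
        if ! name_resolves "$h"
        then
            echo "cannot resolve host: $h" >&2
            return 1
        fi
    done
    return 0
}
```

With this shape, a node that is down but correctly configured does not cause the monitor to report the stonith device as failed.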
Anyway, I think it's better to try an ssh login, but it is a little difficult to control the timeout when the host is down.
BUT, this would make it worse yet. Ping will check if the
the wrong command to issue in this case. On the other hand, this is probably exactly what caused the problem in my site - since they aren't in DNS, but only in /etc/hosts.

Normally it depends on the configuration in /etc/host.conf, and with default config, using ping can handle the situation.

So, the right thing to do is write a function - maybe called checkhost() which looks in /etc/hosts, and if it's not there, then use host as it originally did.
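A rough sketch of what such a checkhost() function might look like follows. It is only an illustration of the idea described above, not code that was committed. The helper in_hosts_file and the HOSTS_FILE override are hypothetical names added here so the lookup can be exercised against a test file; the fallback reuses the original "host ... | grep "not found:"" test from revision 1.7.

```shell
#!/bin/sh
# Hypothetical sketch of checkhost(); names other than checkhost are assumptions.

# Succeeds if "$1" appears as a hostname or alias in the hosts file.
# HOSTS_FILE is an assumed override; it defaults to /etc/hosts.
in_hosts_file() {
    awk -v name="$1" '
        { sub(/#.*/, "") }    # strip trailing comments
        { for (i = 2; i <= NF; i++)    # field 1 is the address
              if (tolower($i) == tolower(name)) found = 1 }
        END { exit(found ? 0 : 1) }
    ' "${HOSTS_FILE:-/etc/hosts}"
}

# Succeeds if the name is known, whether or not the host is up.
checkhost() {
    # First look in the hosts file.
    in_hosts_file "$1" && return 0
    # Otherwise fall back to DNS, as the code originally did.
    if host "$1" 2>&1 | grep "not found:" >/dev/null
    then
        return 1
    fi
    return 0
}
```

The status loop could then call checkhost "$h" for each entry in $hostlist and fail only when it returns non-zero, so a node that is merely down would not fail the check.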
--
Alan Robertson <[EMAIL PROTECTED]>
"Openness is the foundation and preservative of friendship... Let me
claim from you at all times your undisguised opinions." - William
Wilberforce
_______________________________________________________
Linux-HA-Dev: [email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
