If the server loses power (i.e. the victim node has no power), then it is
effectively fenced and cannot cause I/O data corruption. More than one fencing
method will increase the effectiveness of cluster fencing, but only insofar as
each method actually prevents the errant machine from performing I/O to the
data. Sadly, whoever designed your cluster is clearly not experienced enough to
be designing HA / cluster solutions. You really do need to convince the sponsors
of your project that attempting to fence a server via ssh / iptables is not the
way to go and that a more robust solution is called for.
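To make the point about multiple fence methods concrete, here is a toy Python
model (not fenced's actual implementation) of trying fence methods in order.
The victim counts as fenced only when some method succeeds. The names `Victim`,
`FenceMethod`, `ssh_iptables`, `ilo` and `power_switch` are hypothetical, and
each method's success condition is a simplifying assumption based on the
failure modes described in this thread.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Victim:
    """Toy state of the node being fenced (hypothetical model)."""
    has_power: bool
    network_up: bool


@dataclass
class FenceMethod:
    """A named fence method; run() returns True only if the victim is cut off."""
    name: str
    run: Callable[[Victim], bool]


def ssh_iptables(victim: Victim) -> bool:
    # Needs the victim's own OS and NIC, so it fails exactly when the node
    # has died: dead network card, pulled power cord.
    return victim.has_power and victim.network_up


def ilo(victim: Victim) -> bool:
    # Out-of-band management board: independent of the victim's OS and NIC
    # (assumed to sit on its own network), but unreachable without power.
    if not victim.has_power:
        return False
    victim.has_power = False  # power the victim off
    return True


def power_switch(victim: Victim) -> bool:
    # Network power switch: cuts the outlet whatever state the victim is in.
    victim.has_power = False
    return True


def fence(victim: Victim, methods: List[FenceMethod]) -> Optional[str]:
    """Try each method in order; return the first that succeeds, or None."""
    for method in methods:
        if method.run(victim):
            return method.name
    return None


SSH = FenceMethod("ssh_iptables", ssh_iptables)
ILO = FenceMethod("ilo", ilo)
PDU = FenceMethod("power_switch", power_switch)

# Pulled power cord: neither ssh nor iLO can confirm the fence.
print(fence(Victim(has_power=False, network_up=False), [SSH, ILO]))       # None
# Adding an outlet-level power switch as another method covers this case.
print(fence(Victim(has_power=False, network_up=False), [SSH, ILO, PDU]))  # power_switch
```

A `None` result corresponds to what the original poster sees: no method can
confirm the fence, so the surviving node keeps retrying and never takes over
the service.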
  ----- Original Message ----- 
  From: Ian Hayes 
  To: linux clustering 
  Sent: Monday, April 13, 2009 5:19 PM
  Subject: Re: [Linux-cluster] Fenced failing continuously


  I realize that the ssh option is not optimal, but I'm stuck with the design 
requirements. I'm hoping I can get them changed.

  But this got me thinking... conventional fencing is not failsafe. I can
think of quite a number of less-than-optimal but entirely real-world situations
where a node can die and yet cannot be absolutely fenced off. iLO only works if
the victim node still has power. I've only been in one shop that had
APC-managed power, and they didn't even have that set up. Brocade fencing
doesn't always apply, especially if you're just doing a virtual IP. So having a
second fencing method as a backup may not always be feasible.

  So even with more traditional fences, this may not work unless I start 
modding fence scripts to return a success code even if they fail.


  On Fri, Apr 10, 2009 at 2:36 AM, Virginian <[email protected]> wrote:

    Hi Ian,

    I think there is a flaw in the design. For example, say the network card
fails on machine A. Machine B detects this and tries to fence machine A. The
problem with doing it via ssh to modify iptables is that there is no network
connectivity to machine A, so this mechanism will never work. What you need is
a solution that works independently of the OS, such as a power switch or a
remote management interface (IBM RSA II, HP iLO, etc.). With fencing, the
solution has to be absolute and ruthless: in this example, machine B needs to
be able to fence machine A absolutely every time there is a problem, and as
soon as there is a problem.

    Regards

    John


      ----- Original Message ----- 
      From: Ian Hayes 
      To: [email protected] 
      Sent: Friday, April 10, 2009 1:07 AM
      Subject: [Linux-cluster] Fenced failing continuously


      I've been testing a newly built 2-node cluster. The cluster resources are
a virtual IP and Squid, so if a node fails, the VIP should move to the
surviving node, which then starts Squid. I'm running a modified fencing agent
that will SSH into the failing node and firewall it off via iptables (not my
choice).
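For readers unfamiliar with this setup, a hedged Python sketch of what an
ssh/iptables agent like the poster's `fence_iptables` might look like. It
assumes, as with the cluster suite's stock agents, that fenced passes options
on stdin as `name=value` lines and treats exit status 0 as success. The option
names `ipaddr` and `login` and the iptables rule are hypothetical. The sketch
shows why the design is fragile: when the victim is unreachable, ssh fails
(e.g. "No route to host"), the agent exits non-zero, and the node is never
considered fenced.

```python
import subprocess
import sys
from typing import Dict, List


def parse_options(text: str) -> Dict[str, str]:
    """Parse name=value lines from stdin; skip blanks and '#' comments."""
    opts: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        opts[name.strip()] = value.strip()
    return opts


def build_ssh_command(opts: Dict[str, str]) -> List[str]:
    """Build the ssh command that firewalls the victim off."""
    target = f"{opts.get('login', 'root')}@{opts['ipaddr']}"
    # Hypothetical rule: drop the default openais/corosync totem ports so the
    # victim drops out of the cluster while the ssh session itself survives.
    remote = "iptables -I INPUT -p udp --dport 5404:5405 -j DROP"
    # BatchMode avoids password prompts; ConnectTimeout bounds each attempt.
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=10", target, remote]


def main() -> int:
    opts = parse_options(sys.stdin.read())
    if "ipaddr" not in opts:
        print("missing ipaddr option", file=sys.stderr)
        return 1
    result = subprocess.run(build_ssh_command(opts), capture_output=True, text=True)
    if result.returncode != 0:
        # A powered-off victim or dead NIC ends up here on every attempt.
        print(f"Could not disable {opts['ipaddr']}: {result.stderr.strip()}", file=sys.stderr)
        return 1
    return 0
```

To use it as an agent, add `sys.exit(main())` as the module's entry point.
ConnectTimeout only limits how long each attempt waits; if the victim is down,
every attempt still fails.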

      This all works fine for graceful shutdowns, but when I do something nasty
like pulling the power cord on the node that is currently running the service,
the surviving node never takes over the service and spends all its time trying
to fire off the fence agent, which obviously will not work because the server
is completely offline. The only way I can get the surviving node to take over
the VIP and start Squid is to run fence_ack_manual, which sort of defeats the
purpose of running a cluster in the first place. The logs are filled with:

      Apr 12 00:01:44 <hostname> fenced[3223]: fencing node "<otherhost>"
      <hostname> fenced[3223]: agent "fence_iptables" reports: Could not disable xx.xx.xx.xx on
      ssh: connect to host xx.xx.xx.xx port 22: No route to host

      Is this a misconfiguration, or is there an option I can include somewhere
to tell the nodes to give up after a certain number of tries?



--------------------------------------------------------------------------


      --
      Linux-cluster mailing list
      [email protected]
      https://www.redhat.com/mailman/listinfo/linux-cluster
