Re: [Linux-HA] Problem with connectivity loss

Chase Simms Mon, 08 Sep 2008 14:12:20 -0700

Thank you Laurent.  I had been fighting this for weeks.  I'm new to
heartbeat and was about to throw in the towel.  My config worked fine
once I blanked it and read it back in using cibadmin and left off the
notification module.  Having your config was very helpful since I'm
doing everything almost identically to how you are doing it.


Thanks,
Chase

>>> "Laurent Yin" <[EMAIL PROTECTED]> 9/4/2008 4:50 PM >>>
Hi Chase.

I use Ubuntu Server 8.04

Here's my ha.cf

############
###### HA.CF
############

#/etc/ha.d/ha.cf
#
bcast eth1

baud    19200
### UNCOMMENT AND PUT THE OTHER NODE'S IP HERE ###
#ucast eth0 PEER_IP
#####


debugfile /var/log/ha.debug
logfile    /var/log/ha.log
logfacility    local0
crm yes
#time between heart beats
keepalive    5

deadtime    15

warntime    6

initdead    20

#Name must be the one returned by 'uname -n'
node    machine1
node    machine2

### UNCOMMENT AND PUT THE GATEWAY IP TO DETECT CONNECTIVITY LOSS
#ping    GATEWAY_IP
#####

# to detect connectivity loss
respawn root /usr/lib/heartbeat/pingd -m 100 -d 5s -a pingd


# no auto failback. I'm not sure it does anything on v2, you'll have to
# set resource stickiness
auto_failback off

############
###### HA.CF
############


  And here are the files I use to configure my cib (I then use
"cibadmin -o resources -C -x file_name" to add them, or "-o constraints"
for the constraint files)

#################
###### DRBDRESOURCE
#################

<master_slave id="ms-drbd0">
  <meta_attributes id="ma-ms-drbd0">
     <attributes>
       <nvpair id="ma-ms-drbd0-1" name="clone_max" value="2"/>
       <nvpair id="ma-ms-drbd0-2" name="clone_node_max" value="1"/>
       <nvpair id="ma-ms-drbd0-3" name="master_max" value="1"/>
       <nvpair id="ma-ms-drbd0-4" name="master_node_max" value="1"/>
       <nvpair id="ma-ms-drbd0-5" name="notify" value="yes"/>
       <nvpair id="ma-ms-drbd0-6" name="globally_unique" value="false"/>
       <nvpair id="ma-ms-drbd0-7" name="target_role" value="#default"/>
    </attributes>
  </meta_attributes>
  <primitive id="drbd0" class="ocf" provider="heartbeat" type="drbd">
    <instance_attributes id="ia-drbd0">
      <attributes>
        <nvpair id="ia-drbd0-1" name="drbd_resource" value="mysql"/>
      </attributes>
    </instance_attributes>
     <operations>
       <op id="op-drbd0-1" name="monitor" interval="20s" timeout="10s" role="Master"/>
       <op id="op-drbd0-2" name="monitor" interval="45s" timeout="10s" role="Slave"/>
     </operations>
  </primitive>
</master_slave>

#################
###### DRBDRESOURCE
#################


#################
###### MYSQLGROUP
#################

<group id="mysqlgroup">
 <meta_attributes id="ma-mysqlgroup">
  <attributes>
   <nvpair name="resource_stickiness" id="ma-mysqlgroup-1" value="9999"/>
   <nvpair id="ma-mysqlgroup-2" name="target_role" value="started"/>
  </attributes>
 </meta_attributes>
 <primitive class="ocf" provider="heartbeat" type="Filesystem" id="fs0">
  <instance_attributes id="ia-fs0">
   <attributes>
    <nvpair id="ia-fs0-1" name="fstype" value="ext3"/>
    <nvpair id="ia-fs0-2" name="directory" value="/replicated"/>
    <nvpair id="ia-fs0-3" name="device" value="/dev/drbd0"/>
   </attributes>
  </instance_attributes>
  <operations>
   <op id="op-filesystem-monitor" interval="20s" name="monitor" timeout="10s"/>
  </operations>
 </primitive>
 <primitive class="lsb" type="mysql" id="mysqlserver">
  <operations>
   <op id="op-monitor-mysql" name="monitor" interval="10s" timeout="5s"/>
  </operations>
 </primitive>
 <primitive id="mysql-vip" class="ocf" type="IPaddr2" provider="heartbeat">
  <instance_attributes id="ia-mysql-vip">
   <attributes>
    <nvpair id="ia-mysql-vip-ip" name="ip" value="192.168.21.10"/>
    <nvpair id="ia-virtual-ip-2" name="broadcast" value="192.168.203.255"/>
    <nvpair id="ia-mysql-vip-nic" name="nic" value="eth0"/>
    <nvpair id="ia-mysql-vip-netmask" name="cidr_netmask" value="255.255.255.0"/>
   </attributes>
  </instance_attributes>
  <operations>
   <op id="op-monitor-vip" name="monitor" interval="10s" timeout="3s"/>
  </operations>
 </primitive>
 <primitive id="R_MailTo" class="ocf" type="MailTo" provider="heartbeat">
  <instance_attributes>
   <attributes>
    <nvpair id="44b0bd1a-3795-4a20-aaab-58df706bc39b" name="email" value="[EMAIL PROTECTED]"/>
    <nvpair id="4d763860-6a5d-425b-8a13-986f4ede82dc" name="subject" value="My_Mysql_Server"/>
   </attributes>
  </instance_attributes>
 </primitive>
</group>

#################
###### MYSQLGROUP
#################
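
The R_MailTo primitive in the group defines no operations. Andrew's
suggestion quoted further down in this thread (set on_fail=ignore for
the resource's stop action) could be sketched roughly as below for a
heartbeat v2 CIB. This is an untested sketch, not a known-good config:
the op id "op-stop-mailto" is a made-up name, and the fragment is meant
to replace the R_MailTo primitive inside the group.

```xml
<!-- Sketch only: R_MailTo with a stop operation whose failure is ignored.
     The op id "op-stop-mailto" is hypothetical. -->
<primitive id="R_MailTo" class="ocf" type="MailTo" provider="heartbeat">
 <instance_attributes>
  <attributes>
   <nvpair id="44b0bd1a-3795-4a20-aaab-58df706bc39b" name="email" value="[EMAIL PROTECTED]"/>
   <nvpair id="4d763860-6a5d-425b-8a13-986f4ede82dc" name="subject" value="My_Mysql_Server"/>
  </attributes>
 </instance_attributes>
 <operations>
  <!-- a failed stop (e.g. mail cannot be sent while the network is down)
       would then not block the rest of the group from stopping -->
  <op id="op-stop-mailto" name="stop" on_fail="ignore"/>
 </operations>
</primitive>
```

Ignoring a failed stop is only reasonable for a resource like MailTo
that holds no state; it would not be a safe choice for the filesystem
or the database.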


#################
###### COLOC_CONST
#################

<!-- Mount file system only on node which is master -->
<rsc_colocation id="start_mysql_on_drbd_master" to="ms-drbd0"
to_role="master" from="mysqlgroup" score="INFINITY"/>

#################
###### COLOC_CONST
#################


#################
###### ORDER_CONST
#################

<rsc_order id="drbd_before_mysql_group" from="mysqlgroup" action="start"
type="after" to="ms-drbd0" to_action="promote"/>

#################
###### ORDER_CONST
#################


#################
# CONNECTIVITY_CONST
#################

<rsc_location id="my_resource:connected" rsc="mysqlgroup">
  <rule id="my_resource:connected:rule" score="-INFINITY"
boolean_op="or">
    <expression id="my_resource:connected:expr:undefined"
      attribute="pingd" operation="not_defined"/>
    <expression id="my_resource:connected:expr:zero"
      attribute="pingd" operation="lte" value="0"/>
  </rule>
</rsc_location>

#################
# CONNECTIVITY_CONST
#################
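
For the second problem in the quoted thread (the group stops but DRBD
stays master on the disconnected node), Andrew's advice was to create a
similar pingd constraint for drbd. A minimal sketch of what that might
look like, assuming the heartbeat v2 rule syntax with a role attribute,
is below. All ids here are invented for illustration, and this fragment
is not part of the config reported as working above.

```xml
<!-- Sketch only: forbid the DRBD master role on nodes without connectivity.
     Ids are hypothetical; the two expressions mirror the mysqlgroup constraint. -->
<rsc_location id="ms-drbd0:connected" rsc="ms-drbd0">
  <rule id="ms-drbd0:connected:rule" role="master" score="-INFINITY"
        boolean_op="or">
    <!-- pingd attribute missing: pingd has not reported for this node -->
    <expression id="ms-drbd0:connected:expr:undefined"
      attribute="pingd" operation="not_defined"/>
    <!-- pingd attribute is 0 or less: no ping node is reachable -->
    <expression id="ms-drbd0:connected:expr:zero"
      attribute="pingd" operation="lte" value="0"/>
  </rule>
</rsc_location>
```

Like the other constraint files, this would be loaded with "cibadmin -o
constraints -C -x file_name". With role="master" the rule only keeps the
node from being promoted; the slave instance may still run there.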


Putting the ping GATEWAY_IP and respawn lines in your ha.cf and adding
the connectivity_const constraint to your cib.xml should do the trick.
I just followed the tutorial linked in my first post step by step and
it worked.
I used to ping www.google.com, but that produced warnings in ha.log or
ha.debug, so I switched to the gateway IP: it's the nodes' only way to
reach the outside world in any case.

I managed to get a failover time of 35 seconds when unplugging the
cable.

I don't know what version of heartbeat I'm using. I'll check that
tomorrow and tell you about it.

Laurent

On Thu, Sep 4, 2008 at 5:21 PM, Chase Simms <[EMAIL PROTECTED]> wrote:

> Laurent,
>
> Would you mind sharing your ha.cf and your cib.xml?  I've been fighting
> the same problem for weeks.  I was about to give up when I found your
> post.  Everything works for me except network failover.  I've tried
> using a constraint to run pingd with MySQL and used the clone method
> from the tutorials.  I would love to see a config I know works.  What
> flavor of Linux are you using?  I'm using CentOS and the heartbeat from
> their repositories.
>
> Thank you,
> Chase
>
> >>> "Laurent Yin" <[EMAIL PROTECTED]> 8/27/2008 10:22 AM >>>
> thanks!
> I will try the "on fail ignore". The other issue was not really an
> issue, because it finally worked fine when I decided to completely
> erase the CIB and reconfigure constraints and resources without adding
> the mail resource.
> Maybe the problem came from my using "crm_resource" to remove this one
> resource specifically, I don't know...
>
> I had "solved" the mail problem by changing the MailTo RA: launching
> the mail in another process so as not to wait for the timeout, and
> masking the error.
> The advantage is that I don't have to wait for the timeout (which was
> quite long, if I remember correctly) before it carries on with the
> other resources, so failover executes faster.
>
> The drawback is that I have to change the MailTo RA...
>
> Is there any way to emulate this behaviour by setting fail_ignore?
>
> On Mon, Aug 25, 2008 at 12:21 PM, Andrew Beekhof <[EMAIL PROTECTED]> wrote:
>
> > On Tue, Aug 12, 2008 at 12:30, Laurent Yin <[EMAIL PROTECTED]> wrote:
> > > Hello,
> > >
> > > I set up a DRBD-Mysql cluster with a master/slave DRBD set and a
> > > mysql resource group containing:
> > > -a Filesystem
> > > -a mysql (5.1)
> > > -a virtual IP address (IPaddr2)
> > > -a MailTo RA
> > >
> > > I have two constraints:
> > > - one colocation constraint which requires the DRBD master to be on
> > > the machine running mysqlgroup
> > > - one ordering constraint which requires mysqlgroup to be launched
> > > after DRBD
> > >
> > > It works fine and fails over smoothly on machine poweroff and the
> > > like.
> > >
> > > Now I would like it to be network-loss tolerant, e.g. if I unplug
> > > the network cable between the master node and the router, I want it
> > > to detect that connectivity is lost.
> > > For that purpose, I added two ping nodes to my ha.cf and a respawn
> > > with pingd.
> > > ## in HA.CF
> > > ping    www.google.com 
> > > ping    www.yahoo.com 
> > >
> > > respawn root /usr/lib/heartbeat/pingd -m 100 -d 5s -a pingd
> > > ## END OF in HA.CF
> > >
> > > I also added a constraint as done on the site
> > > http://www.linux-ha.org/pingd in the section "Only Run my_resource
> > > on Nodes With Access to at Least One Ping Node".
> > >
> > > ## CONSTRAINT ##
> > > <rsc_location id="my_resource:connected" rsc="mysqlgroup">
> > >  <rule id="my_resource:connected:rule" score="-INFINITY" boolean_op="or">
> > >    <expression id="my_resource:connected:expr:undefined"
> > >      attribute="pingd" operation="not_defined"/>
> > >    <expression id="my_resource:connected:expr:zero"
> > >      attribute="pingd" operation="lte" value="0"/>
> > >  </rule>
> > > </rsc_location>
> > > ## END OF CONSTRAINT ##
> > >
> > >
> > > I have two problems with this configuration.
> > > 1 ) When I unplug the network cable of the machine running mysql,
> > > after detecting that there is no connectivity, it tries to stop the
> > > group, beginning with my last resource, which is MailTo. But, as
> > > there is no connectivity, it fails to stop, and therefore the whole
> > > group remains unstopped. What can I do about this?
> >
> > fix the RA or set on_fail=ignore for the resource's stop action
> >
> > >
> > > 2 ) When I remove the MailTo RA (just for testing purposes, to see
> > > what happens, but this is not an acceptable solution), it manages
> > > to stop the mysqlgroup, but the group doesn't get started on the
> > > other node. I assume that this is because DRBD is still master on
> > > this node. How can I tell Heartbeat to switch master/slave in DRBD
> > > when connectivity is lost?
> > > Or is there another solution with constraints maybe?
> >
> > create a similar pingd constraint for drbd as you used for the group
> > _______________________________________________
> > Linux-HA mailing list
> > [email protected] 
> > http://lists.linux-ha.org/mailman/listinfo/linux-ha 
> > See also: http://linux-ha.org/ReportingProblems 
> >
>
>
>
> --
> This is the end ... beautiful friend ...
>
> This is the end .... my only friend, the end ...
>
>
> The information in this email is intended for the sole use of the
> addressees and may be confidential and subject to protection under the
> law. If you are not the intended recipient, you are hereby notified that
> any distribution or copying of this email is strictly prohibited. If you
> have received this message in error, please reply and delete your copy.
>
>



-- 
This is the end ... beautiful friend ...

This is the end .... my only friend, the end ...



