Hi Markus,

If I recall correctly, you need to reset the failed resource after you
manually clean it up. Your monitor failed and heartbeat couldn't restart
the resource, so heartbeat concluded that there is a problem with this
resource on this node that it cannot fix by itself. You have to fix it
manually and then reset the failed resource.

From the http://www.linux-ha.org/v2/AdminTools/crm_resource page:

12. Resetting a failed resource after having been manually cleaned up

crm_resource -C -H c001n02 -r my_first_ip
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Markus W.
> Sent: May 15, 2007 6:39 AM
> To: General Linux-HA mailing list
> Subject: Re: [Linux-HA] MySQL Master Master
>
> Starting from scratch:
>
> OS: Linux, RHEL 4, Kernel 2.6.9-55.EL
> HA: RPM installation from
> http://dev.centos.org/centos/4/testing/i386/RPMS/heartbeat-2.0.8-2.el4.centos
>
> Configuration, log and cibadmin files: see attachments
> - xx.xx.xx.xx: Ping IP
> - yy.yy.yy.yy: Cluster IP
> - zz.zz.zz.zz: Cluster Broadcast
>
> Problem:
> - Edit /etc/init.d/httpd on node1 to return 1 on startup (simulate an
>   error).
> - Stop httpd on node1 => heartbeat tries to restart httpd on node1
>   (fails) => httpd, or the group Web, fails over to node2.
> - Reset /etc/init.d/httpd on node1 to its normal behaviour.
> - Start httpd on node1.
> - Put node2 into standby mode => heartbeat won't fail back to node1.
>
> Best regards,
>
> Markus
>
> Dejan Muhamedagic wrote:
> > On Mon, May 14, 2007 at 11:01:16AM +0200, Markus W. wrote:
> >
> >> OK, I don't understand this at all. I have the same problem with Apache
> >> as with MySQL. If Apache runs on the first node and something goes
> >> wrong on that node, Apache switches to the second node - great. But
> >> Apache won't fail back to the first node if I repair the first node
> >> and switch the second node into standby mode - bad.
> >>
> >
> > With default settings, resources should move back to their preferred
> > node once that node is live again. You'd probably want to post the
> > logs and the configuration. See http://linux-ha.org/ReportingProblems
> >
> >
> >> Somewhere in the HA "universe" pages there was some information about
> >> this. But I don't understand why heartbeat doesn't try the first node
> >> just once more. If the first node is OK, why shouldn't heartbeat move
> >> the resource back to it? If the first node still isn't OK, I would
> >> understand heartbeat giving up on running the resource there.
> >>
> >> Thanks
> >>
> >> Benjamin Lawetz wrote:
> >>
> >>> I have a vague impression that you might run into problems with the
> >>> dummy mysql script. From memory (and one of the gurus here will
> >>> correct me if I'm wrong), heartbeat can call "status" on startup or
> >>> on certain other occasions.
> >>>
> >>> So having status return "All OK" when the resource agent should
> >>> not be running might cause unexpected behaviour. You might need to
> >>> implement dummy start, stop and status actions that just touch or
> >>> delete a fake pid file and return the status according to it.
> >>> You could then implement your check in the monitor part of the
> >>> script.
> >>>
> >>> But I may be wrong.
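A minimal bash sketch of the pid-file approach suggested above, as a possible replacement for the dummy script Markus posted (quoted below). Assumptions not taken from the thread: the pid file path `/var/run/mysql-ha-dummy.pid` and the function names `dummy_start`, `dummy_stop` and `dummy_status` are hypothetical. Start and stop only create or remove that file, so `status` reports "not running" (LSB code 3) until heartbeat has actually started the resource on that node; the real `status mysqld` check and the `attrd_updater` calls would go after the pid-file test.

```shell
#!/bin/bash
# Sketch of a dummy LSB script whose start/stop only manage a fake pid file.
# PIDFILE is a hypothetical path; it can be overridden from the environment.
PIDFILE="${PIDFILE:-/var/run/mysql-ha-dummy.pid}"

dummy_start() {
    # Record that heartbeat has started the resource on this node.
    touch "$PIDFILE"
}

dummy_stop() {
    # Stopping an already stopped resource must still succeed (exit 0).
    rm -f "$PIDFILE"
}

dummy_status() {
    # LSB status codes: 0 = running, 3 = not running.
    if [ ! -f "$PIDFILE" ]; then
        return 3
    fi
    # Started by heartbeat: the real "status mysqld" check and the
    # attrd_updater calls from the original script would go here.
    return 0
}

# Dispatch only when an action is given, so the functions above can also
# be sourced (for example, to test them) without printing the usage text.
if [ "$#" -gt 0 ]; then
    case "$1" in
        start)  dummy_start ;;
        stop)   dummy_stop ;;
        status) dummy_status ;;
        *)
            echo "Usage: $0 {start|stop|status}"
            exit 1
            ;;
    esac
    # The case statement's status is that of the action that ran.
    exit $?
fi
```

With this layout a probe on a node where heartbeat never started the resource gets exit code 3, which is what the cluster expects from a resource that should not be running there.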
> >>>
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: [EMAIL PROTECTED]
> >>>> [mailto:[EMAIL PROTECTED] On Behalf Of Markus W.
> >>>> Sent: May 11, 2007 7:17 AM
> >>>> To: General Linux-HA mailing list
> >>>> Subject: Re: [Linux-HA] MySQL Master Master
> >>>>
> >>>> Hi Benjamin,
> >>>>
> >>>> Wow! It rocks!! Thanks!!!
> >>>>
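> >>>> For reference, here is the dummy LSB mysql HA script:
> >>>>
> >>>> #!/bin/bash
> >>>> # Provides the "status" helper used below (RHEL init functions).
> >>>> . /etc/init.d/functions
> >>>>
> >>>> case "$1" in
> >>>>   start)
> >>>>     # Faked: mysqld itself is not managed by heartbeat.
> >>>>     exit 0
> >>>>     ;;
> >>>>   stop)
> >>>>     # Faked as well.
> >>>>     exit 0
> >>>>     ;;
> >>>>   status)
> >>>>     # Publish the mysqld state as a node attribute, then report it.
> >>>>     if status mysqld; then
> >>>>       /usr/sbin/attrd_updater -n mysql_running -d 3s -v 1
> >>>>       exit 0
> >>>>     else
> >>>>       /usr/sbin/attrd_updater -n mysql_running -d 3s -v 0
> >>>>       exit 3
> >>>>     fi
> >>>>     ;;
> >>>>   *)
> >>>>     echo $"Usage: $0 {start|stop|status} (start|stop faked)"
> >>>>     exit 1
> >>>>     ;;
> >>>> esac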
> >>>>
> >>>> --
> >>>> Markus
> >>>>
> >>>>
> >>>> Benjamin Lawetz wrote:
> >>>>
> >>>>
> >>>>> Hi Markus,
> >>>>>
> >>>>> I ran into the same problem. I didn't find any better way than to
> >>>>> modify the monitoring script of mysql and add, in the case of a
> >>>>> failure:
> >>>>>
> >>>>> /usr/sbin/attrd_updater -n mysql_running -d 5s -v 0
> >>>>>
> >>>>> And in the case of a success:
> >>>>>
> >>>>> /usr/sbin/attrd_updater -n mysql_running -d 5s -v 1
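The two attrd_updater calls above can be folded into one helper that runs whatever health check the monitor uses and publishes the outcome as `mysql_running`. This is a sketch under assumptions: the function name `publish_mysql_state` and the `ATTRD_UPDATER` override are hypothetical; only the `attrd_updater -n ... -d ... -v ...` invocation comes from this thread. Using the same attribute name for both outcomes matters, because the location constraint further down only tests `mysql_running`.

```shell
#!/bin/bash
# Path to attrd_updater as used in this thread; overridable for testing.
ATTRD_UPDATER="${ATTRD_UPDATER:-/usr/sbin/attrd_updater}"

# Hypothetical helper: run the health check given as arguments and publish
# its outcome as the mysql_running node attribute (1 = healthy, 0 = failed).
publish_mysql_state() {
    local rc
    "$@"
    rc=$?
    if [ "$rc" -eq 0 ]; then
        "$ATTRD_UPDATER" -n mysql_running -d 5s -v 1
    else
        "$ATTRD_UPDATER" -n mysql_running -d 5s -v 0
    fi
    # Return the check's own exit code so the monitor can still report it.
    return "$rc"
}
```

On RHEL, with /etc/init.d/functions sourced, a monitor could call it as `publish_mysql_state status mysqld`, which matches the check used in Markus's script.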
> >>>>>
> >>>>> Then run the monitor script as a clone:
> >>>>>
> >>>>> <clone id="mysql">
> >>>>>   <instance_attributes id="mysql">
> >>>>>     <attributes>
> >>>>>       <nvpair id="mysql-clone_node_max" name="clone_node_max"
> >>>>>           value="1"/>
> >>>>>     </attributes>
> >>>>>   </instance_attributes>
> >>>>>   <primitive id="mysql-child" provider="heartbeat" class="ocf"
> >>>>>       type="mysql">
> >>>>>     <operations>
> >>>>>       <op id="mysql-child-monitor" name="monitor" interval="20s"
> >>>>>           timeout="40s" prereq="nothing">
> >>>>>         <instance_attributes id="mysql-child-monitor-attr">
> >>>>>         </instance_attributes>
> >>>>>       </op>
> >>>>>       <op id="mysql-child-start" name="start" prereq="nothing"/>
> >>>>>     </operations>
> >>>>>   </primitive>
> >>>>> </clone>
> >>>>>
> >>>>> And then I added a constraint:
> >>>>>
> >>>>> <rsc_location rsc="group_1" id="cli-stop2-group_1">
> >>>>>   <rule score="-INFINITY" id="cli-stop2-rule-group_1">
> >>>>>     <expression operation="lte" value="0"
> >>>>>         id="cli-stop2-expr-group_1" attribute="mysql_running"/>
> >>>>>   </rule>
> >>>>> </rsc_location>
> >>>>>
> >>>>> This will run the monitor on every node and set the score of
> >>>>> group_1 to -INFINITY on the node where mysql fails.
> >>>>>
> >>>>> If mysql comes back online, though, "mysql_running" will be set to
> >>>>> "1", but I don't think that will trigger a recalculation of the
> >>>>> scores. I haven't figured out yet how to cause this.
> >>>>>
> >>>>>
> >>>>> Hope this helps
> >>>>>
> >>>>>
> >>>>>
> >>>> _______________________________________________
> >>>> Linux-HA mailing list
> >>>> [email protected]
> >>>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> >>>> See also: http://linux-ha.org/ReportingProblems
> >>>>
> >>>>
> >>>
> >>>
> >>>
> >>
> >
> >
>
>