Hi Markus,

        If I recall correctly, you need to reset the failed resource after
you clean it up manually. Your monitor failed and heartbeat couldn't
restart the resource, so it concluded that there is a problem with this
resource on this node that it cannot fix by itself. You have to fix it
manually and then reset the failed resource.

From the http://www.linux-ha.org/v2/AdminTools/crm_resource page:

12. Resetting a failed resource after having been manually cleaned up
        crm_resource -C -H c001n02 -r my_frist_ip

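The order of operations above could be wrapped in a small helper like the one below. This is only a sketch under a few assumptions: it runs on the repaired node itself (heartbeat node names normally match `uname -n`), the resource is backed by an LSB init script whose `status` action exits 0 while the service is running, and the function name `reset_local_failed_resource` is made up for this example. It refuses to clear the failure while the service is still running outside the cluster's control, then calls the same `crm_resource -C` command as the wiki page.

```shell
# Hypothetical helper (not part of heartbeat): clear a failed resource on
# the local node after the underlying problem has been fixed by hand.
# Assumes heartbeat's node name for this host is `uname -n`.
reset_local_failed_resource() {
    rsc="$1"            # resource id, e.g. my_frist_ip
    init_script="$2"    # LSB script behind the resource, e.g. /etc/init.d/httpd
    node="$(uname -n)"

    if [ -z "$rsc" ] || [ -z "$init_script" ]; then
        echo "usage: reset_local_failed_resource <resource> <init-script>" >&2
        return 2
    fi

    # LSB "status" exits 0 while the service is running. Assumption: after
    # a manual fix the service should be stopped, so heartbeat starts it.
    if "$init_script" status >/dev/null 2>&1; then
        echo "$init_script is still running on $node; stop it first" >&2
        return 1
    fi

    # Same command as on the crm_resource wiki page, for this node.
    crm_resource -C -H "$node" -r "$rsc"
}
```

For example, on node1: `reset_local_failed_resource <httpd-resource-id> /etc/init.d/httpd`, where the resource id is whatever your CIB calls the httpd primitive.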

> -----Original Message-----
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of Markus W.
> Sent: May 15, 2007 6:39 AM
> To: General Linux-HA mailing list
> Subject: Re: [Linux-HA] MySQL Master Master
> 
> Starting from scratch:
> 
> OS: Linux, RHEL 4, Kernel 2.6.9-55.EL
> HA: RPM installation from
> http://dev.centos.org/centos/4/testing/i386/RPMS/heartbeat-2.0.8-2.el4.centos
> 
> Configuration, log and cibadmin files: see attachments
>  - xx.xx.xx.xx: Ping IP
>  - yy.yy.yy.yy: Cluster IP
>  - zz.zz.zz.zz: Cluster Broadcast
> 
> Problem:
> 1. Edit /etc/init.d/httpd on node1 to return 1 on startup (simulate
>    an error).
> 2. Stop httpd on node1 => heartbeat tries to restart httpd on node1
>    (fails) => httpd (or rather the group Web) fails over to node2.
> 3. Reset /etc/init.d/httpd on node1 to its normal behaviour.
> 4. Start httpd on node1.
> 5. Set node2 to standby mode => heartbeat won't fail back to node1.
> 
> Best regards,
> 
> Markus
> 
> Dejan Muhamedagic wrote:
> > On Mon, May 14, 2007 at 11:01:16AM +0200, Markus W. wrote:
> >   
> >> OK, I don't understand this at all. I have the same problem with
> >> apache as with mysql. If apache runs on the first node and something
> >> goes wrong on that node, apache switches to the second node - great.
> >> Apache won't fail back to the first node if I repair the first node
> >> and switch the second node into standby mode - bad.
> >>     
> >
> > With default settings, resources should move back to their preferred
> > node, once that one is live again. You'd probably want to post the
> > logs and the configuration. See http://linux-ha.org/ReportingProblems
> >
> >   
> >> Somewhere on the HA "universe" page there was some information about
> >> this. But I don't understand why heartbeat doesn't try the first node
> >> just once more. If the first node is OK, why shouldn't heartbeat move
> >> the resource back to it? If the first node still isn't OK, I would
> >> understand heartbeat giving up on running the resource there.
> >>
> >> Thanks
> >>
> >> Benjamin Lawetz wrote:
> >>     
> >>> I have a vague impression that you might run into problems with the
> >>> dummy mysql script. From memory (and one of the gurus here will
> >>> correct me if I'm wrong), heartbeat can call "status" on startup and
> >>> on certain other occasions.
> >>>
> >>> So having status return "All OK" when the resource agent should not
> >>> be running might cause unexpected behaviour. You might need to
> >>> implement a dummy start, stop and status that just touch or delete a
> >>> fake pid file and return the status accordingly. You could then
> >>> implement your function in the monitor part of the script.
> >>>
> >>> But I may be wrong.
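Benjamin's fake-pid-file idea could look roughly like this. It is a sketch, not a tested resource agent: it follows OCF conventions (a separate `monitor` action, exit 7 = OCF_NOT_RUNNING, exit 3 = OCF_ERR_UNIMPLEMENTED) rather than the status-only LSB convention of the script quoted below, it omits the `meta-data` and `validate-all` actions a real OCF agent also needs, and `STATE_FILE`, `ATTRD_UPDATER`, `mysqld_is_up` and `mysql_ha_dummy` are names made up for the example. `start` and `stop` only create or remove the state file, so `status` reports "running" only after the cluster has started the resource; the real mysqld check lives in `monitor` and is reported solely through the `mysql_running` attribute.

```shell
# Sketch of the "fake pid file" approach (hypothetical names throughout).
STATE_FILE="${STATE_FILE:-/var/run/mysql-ha-dummy.state}"
ATTRD_UPDATER="${ATTRD_UPDATER:-/usr/sbin/attrd_updater}"

# Real mysqld health check; on RHEL, `status mysqld` from
# /etc/init.d/functions would work as well.
mysqld_is_up() {
    pidof mysqld >/dev/null 2>&1
}

mysql_ha_dummy() {
    case "$1" in
    start)
        # Fake start: only remember that the cluster started us.
        touch "$STATE_FILE" || return 1      # OCF_ERR_GENERIC
        ;;
    stop)
        # Fake stop: forget it again.
        rm -f "$STATE_FILE"
        ;;
    status)
        # "Running" only if the cluster started us, whatever mysqld does.
        [ -f "$STATE_FILE" ] || return 7     # OCF_NOT_RUNNING
        ;;
    monitor)
        [ -f "$STATE_FILE" ] || return 7     # OCF_NOT_RUNNING
        # The real check only updates the node attribute; the dummy itself
        # keeps reporting success so the clone instance is not restarted.
        if mysqld_is_up; then
            "$ATTRD_UPDATER" -n mysql_running -d 3s -v 1
        else
            "$ATTRD_UPDATER" -n mysql_running -d 3s -v 0
        fi
        ;;
    *)
        echo "Usage: $0 {start|stop|status|monitor}" >&2
        return 3                             # OCF_ERR_UNIMPLEMENTED
        ;;
    esac
    return 0                                 # OCF_SUCCESS
}
```

An agent built from this would end with `mysql_ha_dummy "$1"; exit $?`. Keeping `monitor` at exit 0 while mysqld is down is a deliberate choice in this sketch: the clone stays up and only the attribute (and thus a location rule on `mysql_running`) reacts; returning an error instead would make heartbeat try to recover the clone instance itself.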
> >>>
> >>>  
> >>>       
> >>>> -----Original Message-----
> >>>> From: [EMAIL PROTECTED]
> >>>> [mailto:[EMAIL PROTECTED] On Behalf Of Markus W.
> >>>> Sent: May 11, 2007 7:17 AM
> >>>> To: General Linux-HA mailing list
> >>>> Subject: Re: [Linux-HA] MySQL Master Master
> >>>>
> >>>> Hi Benjamin,
> >>>>
> >>>> Wow! It rocks!! Thanks!!!
> >>>>
> >>>> For reference, here is the dummy LSB mysql HA script:
> >>>>
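> >>>> #!/bin/bash
> >>>> # Dummy LSB script: start/stop are faked, status checks the real
> >>>> # mysqld and publishes the result as the mysql_running attribute.
> >>>>
> >>>> # RHEL: defines the "status" helper used below
> >>>> . /etc/init.d/functions
> >>>>
> >>>> case "$1" in
> >>>>  start)
> >>>>    exit 0                # faked: this script never starts mysqld
> >>>>    ;;
> >>>>  stop)
> >>>>    exit 0                # faked: this script never stops mysqld
> >>>>    ;;
> >>>>  status)
> >>>>    status mysqld
> >>>>    if [ $? -eq 0 ]; then
> >>>>        # mysqld is up: mark this node as usable
> >>>>        /usr/sbin/attrd_updater -n mysql_running -d 3s -v 1
> >>>>        exit 0
> >>>>    else
> >>>>        # mysqld is down: report 0 so a location rule can ban this node
> >>>>        /usr/sbin/attrd_updater -n mysql_running -d 3s -v 0
> >>>>        exit 3            # LSB: program is not running
> >>>>    fi
> >>>>    ;;
> >>>>  *)
> >>>>    echo $"Usage: $0 {start|stop|status} (start|stop faked)"
> >>>>    exit 1
> >>>>    ;;
> >>>> esac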
> >>>>
> >>>> --
> >>>> Markus
> >>>>
> >>>>
> >>>> Benjamin Lawetz wrote:
> >>>>    
> >>>>         
> >>>>> Hi Markus,
> >>>>>
> >>>>>         I ran into the same problem. I didn't find any better way
> >>>>> than to modify the mysql monitoring script and add, in the case of
> >>>>> a failure:
> >>>>>
> >>>>> /usr/sbin/attrd_updater -n mysql_running -d 5s -v 0
> >>>>>
> >>>>> And in the case of a success:
> >>>>>
> >>>>> /usr/sbin/attrd_updater -n mysql_running -d 5s -v 1
> >>>>>
> >>>>> Then run the monitor script as a clone:
> >>>>>
> >>>>>       <clone id="mysql">
> >>>>>         <instance_attributes id="mysql">
> >>>>>           <attributes>
> >>>>>             <nvpair id="mysql-clone_node_max" name="clone_node_max"
> >>>>>                     value="1"/>
> >>>>>           </attributes>
> >>>>>         </instance_attributes>
> >>>>>         <primitive id="mysql-child" provider="heartbeat" class="ocf"
> >>>>>                    type="mysql">
> >>>>>           <operations>
> >>>>>             <op id="mysql-child-monitor" name="monitor" interval="20s"
> >>>>>                 timeout="40s" prereq="nothing">
> >>>>>               <instance_attributes id="mysql-child-monitor-attr">
> >>>>>               </instance_attributes>
> >>>>>             </op>
> >>>>>             <op id="mysql-child-start" name="start" prereq="nothing"/>
> >>>>>           </operations>
> >>>>>         </primitive>
> >>>>>       </clone>
> >>>>>
> >>>>> And then had a constraint:
> >>>>>
> >>>>>       <rsc_location rsc="group_1" id="cli-stop2-group_1">
> >>>>>         <rule score="-INFINITY" id="cli-stop2-rule-group_1">
> >>>>>           <expression operation="lte" value="0"
> >>>>>                       id="cli-stop2-expr-group_1"
> >>>>>                       attribute="mysql_running"/>
> >>>>>         </rule>
> >>>>>       </rsc_location>
> >>>>>
> >>>>> This will run the monitor on every node and set the score to
> >>>>> -INFINITY for the node where mysql fails.
> >>>>>
> >>>>> If mysql comes back online, though, "mysql_running" will be set to
> >>>>> "1", but I don't think that triggers a recalculation of the scores.
> >>>>> I haven't figured out yet how to cause this.
> >>>>>
> >>>>>
> >>>>> Hope this helps
> >>>>>  
> >>>>>      
> >>>>>           
> >>>> _______________________________________________
> >>>> Linux-HA mailing list
> >>>> [email protected]
> >>>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> >>>> See also: http://linux-ha.org/ReportingProblems
> >>>>    
> >>>>         
> >>>
> >>>  
> >>>       
> >>     
> >
> >   
> 
> 

