OK, good, I am starting to understand, but I don't know anything about this 
STONITH stuff (seriously, I've been learning HA boot-camp style for a couple of 
days). Let me look into that.

Additionally, what would I need to do to make the cluster fail over if eth0 
is detected as down on one server? Is that another resource? 


 -------------- Original message ----------------------
From: Dominik Klein <[EMAIL PROTECTED]>
> Nick Duda wrote:
> > I rename the restart script for squid. 
> 
> Your OCF Script or your /etc/init.d script?
> 
> > My current setup (based on 
> > examples on the web) shows that if squid fails on the currently running 
> > server, it will try to restart itself. If the restart fails, it will fail over. 
> > So basically I am trying to create a test case where, if the squid 
> > startup script in /etc/init.d got deleted 
> 
> Ah, your /etc/init.d script.
> 
> Okay, look at your OCF script, what it does when /etc/init.d/squid is 
> not there.
> 
> -----------
> INIT_SCRIPT=/etc/init.d/squid
> 
> case  "$1" in
>         start)
>                 ${INIT_SCRIPT} start > /dev/null 2>&1 && exit || exit 1
>         ;;
> 
>         stop)
>                 ${INIT_SCRIPT} stop > /dev/null 2>&1 && exit || exit 1
>         ;;
> 
>         status)
>                 ${INIT_SCRIPT} status > /dev/null 2>&1 && exit || exit 1
>         ;;
> 
>         monitor)
>                 # Check if resource is stopped
>                 ${INIT_SCRIPT} status > /dev/null 2>&1 || exit 7
> 
>                 # Otherwise check services (XXX: Maybe loosen retry / timeout)
>                 wget -o /dev/null -O /dev/null -T 1 -t 1 http://localhost:3128/ && exit || exit 1
>         ;;
> 
>         meta-data)
> --------------
> 
> So for the next monitor operation, it will execute
> "${INIT_SCRIPT} status > /dev/null 2>&1 || exit 7"
> 
> This will probably return 7 (the OCF "not running" code), because the 
> status call fails when the init script is missing. So the cluster thinks 
> your resource is stopped. As it was running before (I guess?), the 
> cluster will now try to stop and start it.
> 
> Stop calls
> "${INIT_SCRIPT} stop > /dev/null 2>&1 && exit || exit 1"
> 
> With the init script missing, this will return 1. So the stop operation 
> failed.
> 
> With stonith, your node would be rebooted now. I don't see a stonith 
> device, so the resource goes "unmanaged".
> 
> I think what you see is intended.
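
A minimal sketch of the reasoning above, assuming the wrapper behaves exactly as quoted. The path /nonexistent/init.d/squid is a made-up stand-in for the renamed /etc/init.d/squid, and agent_action is a hypothetical helper that copies only the stop and monitor branches (the wget check is left out, since it is never reached when the status call fails):

```shell
#!/bin/sh
# Hypothetical path standing in for a renamed or deleted /etc/init.d/squid.
INIT_SCRIPT=/nonexistent/init.d/squid

# agent_action mimics the stop and monitor branches of the quoted wrapper.
# The body runs in a subshell so "exit" ends only the simulated action.
agent_action() (
    case "$1" in
        stop)
            "$INIT_SCRIPT" stop > /dev/null 2>&1 && exit || exit 1
            ;;
        monitor)
            # A missing init script makes the status call fail, so this exits 7.
            "$INIT_SCRIPT" status > /dev/null 2>&1 || exit 7
            exit 0
            ;;
    esac
)

monitor_rc=0
agent_action monitor || monitor_rc=$?

stop_rc=0
agent_action stop || stop_rc=$?

echo "monitor rc=$monitor_rc"
echo "stop rc=$stop_rc"
```

Under these assumptions the monitor action reports 7 (not running) and the stop action reports 1, which matches the failed squid_stop_0 with rc=1 in the crm_mon output further down.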
> 
> Regards
> Dominik
> 
> > and squid crashed, it should 
> > fail over to the other box... it's not.
> > 
> > Dominik Klein wrote:
> >> Nick Duda wrote:
> >>> (sorry for the long email, but all my configs are here to view)
> >>>
> >>> I posted before about HA with 2 squid servers. It's just about done, 
> >>> but I'm stumbling on something. Every time I manually cause something to 
> >>> happen in hopes of seeing it fail over, it doesn't. For example, I get 
> >>> crm_mon to show everything as I want it, and when I kill squid (and 
> >>> prevent the xml from restarting it) it just goes into a failed 
> >>> state...more below. Anyone see anything wrong with my configs?
> >>>
> >>> Server #1
> >>> Hostname: ha-1
> >>> eth0 - lan (192.168.95.1)
> >>> eth1 - xover to eth1 on other server
> >>>
> >>> Server #2
> >>> Hostname: ha-2
> >>> eth0 - lan (192.168.95.2)
> >>> eth1 - xover to eth1 on other server
> >>>
> >>> ha.cf on each server:
> >>>
> >>> bcast eth1
> >>> mcast eth0 239.0.0.2 694 1 0
> >>> node ha-1 ha-2
> >>> crm on
> >>>
> >>> Not using haresources because of crm
> >>>
> >>> Here is the output from crm_mon:
> >>>
> >>> ============
> >>> Last updated: Mon Apr 21 15:44:53 2008
> >>> Current DC: ha-1 (2422b230-22f2-451b-aa95-0b783eccab8d)
> >>> 2 Nodes configured.
> >>> 1 Resources configured.
> >>> ============
> >>>
> >>> Node: ha-1 (2422b230-22f2-451b-aa95-0b783eccab8d): online
> >>> Node: ha-2 (1691d699-2a81-4545-8242-b00862431514): online
> >>>
> >>> Resource Group: squid-cluster
> >>>    ip0 (heartbeat::ocf:IPaddr2):       Started ha-1
> >>>    squid       (heartbeat::ocf:squid): Started ha-1
> >>>
> >>> If squid stops on the current heartbeat server, ha-1, it will restart 
> >>> within 60sec...so the scripting is working. If I stop the squid 
> >>> process and rename /etc/init.d/squid to something else, the 
> >>> script won't be able to execute the squid start and should fail over to 
> >>> ha-2, but it doesn't; instead this appears (on both ha-1 and ha-2):
> >>
> >> What exactly do you "rename" and how? It's likely the cluster is 
> >> behaving sanely and you're just creating a test case you don't understand.
> >>
> >> Regards
> >> Dominik
> >>
> >>> ============
> >>> Last updated: Mon Apr 21 15:47:49 2008
> >>> Current DC: ha-1 (2422b230-22f2-451b-aa95-0b783eccab8d)
> >>> 2 Nodes configured.
> >>> 1 Resources configured.
> >>> ============
> >>>
> >>> Node: ha-1 (2422b230-22f2-451b-aa95-0b783eccab8d): online
> >>> Node: ha-2 (1691d699-2a81-4545-8242-b00862431514): online
> >>>
> >>> Resource Group: squid-cluster
> >>>    ip0 (heartbeat::ocf:IPaddr2):       Started ha-1
> >>>    squid       (heartbeat::ocf:squid): Started ha-1 (unmanaged) FAILED
> >>>
> >>> Failed actions:
> >>>    squid_stop_0 (node=ha-1, call=74, rc=1): Error
> >> _______________________________________________
> >> Linux-HA mailing list
> >> [email protected]
> >> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> >> See also: http://linux-ha.org/ReportingProblems
> >>
> > 
> > 
> 
> 
> -- 
> 
> IN-telegence GmbH & Co. KG
> Oskar-Jäger-Str. 125
> 50825 Köln
> 
> Registergericht Köln - HRA 14064, USt-ID Nr. DE 194 156 373
> ph Gesellschafter: komware Unternehmensverwaltungsgesellschaft mbH,
> Registergericht Köln - HRB 38396
> Geschäftsführende Gesellschafter: Christian Plätke und Holger Jansen
