OK, good, I'm starting to understand, but I don't know anything about this
STONITH stuff (seriously, I've been learning HA boot-camp style for a couple
of days). Let me look into that.
Additionally, what would I need to do to allow the cluster to fail over if eth0
is detected to have gone down on one server? Is that another resource?
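
On the eth0 question: one common approach in Heartbeat 2.x is not a separate resource for the NIC itself, but connectivity monitoring with a ping node plus the pingd daemon, which publishes a node attribute the CRM can use to move the group. A rough sketch of the ha.cf additions, assuming 192.168.95.254 is a pingable gateway on the eth0 LAN and the usual pingd path (both are placeholders, adjust to your setup):

```text
# ha.cf: declare a ping node on the eth0 LAN (assumed gateway address)
ping 192.168.95.254
# run pingd so each node gets a "pingd" attribute
# (-m: score multiplier per reachable ping node, -d: dampening delay)
respawn root /usr/lib/heartbeat/pingd -m 100 -d 5s
```

You then add a location constraint in the CIB that forbids (or penalizes) running the squid-cluster group on a node whose pingd attribute is undefined or 0, so losing eth0 connectivity triggers a failover.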
-------------- Original message ----------------------
From: Dominik Klein <[EMAIL PROTECTED]>
> Nick Duda wrote:
> > I rename the restart script for squid.
>
> Your OCF Script or your /etc/init.d script?
>
> > My current setup (based on
> > examples on the web) is such that if squid fails on the currently running
> > server, it will try to restart itself. If the restart fails, it will fail over.
> > So basically I am trying to make a test-case scenario: if the squid
> > startup script in /etc/init.d gets deleted
>
> Ah, your /etc/init.d script.
>
> Okay, look at your OCF script, what it does when /etc/init.d/squid is
> not there.
>
> -----------
> INIT_SCRIPT=/etc/init.d/squid
>
> case "$1" in
> start)
> ${INIT_SCRIPT} start > /dev/null 2>&1 && exit || exit 1
> ;;
>
> stop)
> ${INIT_SCRIPT} stop > /dev/null 2>&1 && exit || exit 1
> ;;
>
> status)
> ${INIT_SCRIPT} status > /dev/null 2>&1 && exit || exit 1
> ;;
>
> monitor)
> # Check if resource is stopped
> ${INIT_SCRIPT} status > /dev/null 2>&1 || exit 7
>
> # Otherwise check the service (XXX: maybe loosen retry / timeout)
> wget -o /dev/null -O /dev/null -T 1 -t 1 http://localhost:3128/ && exit || exit 1
> ;;
>
> meta-data)
> --------------
>
> So for the next monitor operation, it will exec
> "${INIT_SCRIPT} status > /dev/null 2>&1 || exit 7"
>
> This will probably return 7, so the cluster thinks your resource is
> stopped. As it was running before (I guess?), the cluster will now try
> to stop and start it.
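
For reference, the OCF return codes at play in that recovery cycle (the values are fixed by the OCF resource agent API):

```shell
# OCF exit codes relevant to this thread (per the OCF RA API spec)
OCF_SUCCESS=0        # operation completed successfully
OCF_ERR_GENERIC=1    # generic error -- what the broken stop below returns
OCF_NOT_RUNNING=7    # resource is cleanly stopped -- what monitor returns here
# monitor returning 7 while the resource is supposed to be running makes
# the CRM schedule a recovery: stop on this node, then start somewhere.
```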
>
> Stop calls
> "stop > /dev/null 2>&1 && exit || exit 1"
>
> This will return 1. So the stop operation failed.
>
> With STONITH, your node would be rebooted now. I don't see a STONITH
> device configured, so the resource goes "unmanaged" instead.
>
> I think what you see is intended.
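
One way to avoid the unmanaged state in this particular test would be to make the stop action idempotent, so it exits 0 when squid is already gone instead of failing just because /etc/init.d/squid is missing. A sketch (not part of Dominik's script; the process name "squid" and the fallback via pkill are assumptions):

```shell
#!/bin/sh
INIT_SCRIPT=/etc/init.d/squid

# Idempotent stop: OCF requires "stop" to exit 0 if the resource is
# already stopped, even when the init script is unusable.
squid_stop() {
    # Nothing to do if no squid process exists
    pgrep -x squid > /dev/null 2>&1 || return 0

    # Prefer the init script when it is present
    [ -x "${INIT_SCRIPT}" ] && ${INIT_SCRIPT} stop > /dev/null 2>&1

    # Fall back to signalling the daemon directly
    pgrep -x squid > /dev/null 2>&1 && pkill -x squid
    sleep 1

    # Succeed only if the process is really gone
    pgrep -x squid > /dev/null 2>&1 && return 1 || return 0
}
```

With a stop like this, deleting the init script still lets the cluster complete the stop and start squid on the other node, rather than going unmanaged.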
>
> Regards
> Dominik
>
> > and squid crashed, it should
> > fail over to the other box... it's not.
> >
> > Dominik Klein wrote:
> >> Nick Duda wrote:
> >>> (sorry for the long email, but all my configs are here to view)
> >>>
> >>> I posted before about HA with 2 squid servers. It's just about done,
> >>> but I'm stumbling on something. Every time I manually cause something to
> >>> happen in hopes of seeing it fail over, it doesn't. For example, I get
> >>> crm_mon to show everything as I want it, and when I kill squid (and
> >>> prevent the XML from restarting it) it just goes into a failed
> >>> state... more below. Does anyone see anything wrong with my configs?
> >>>
> >>> Server #1
> >>> Hostname: ha-1
> >>> eth0 - lan (192.168.95.1)
> >>> eth1 - xover to eth1 on other server
> >>>
> >>> Server #2
> >>> Hostname: ha-2
> >>> eth0 - lan (192.168.95.2)
> >>> eth1 - xover to eth1 on other server
> >>>
> >>> ha.cf on each server:
> >>>
> >>> bcast eth1
> >>> mcast eth0 239.0.0.2 694 1 0
> >>> node ha-1 ha-2
> >>> crm on
> >>>
> >>> Not using haresources because of crm
> >>>
> >>> Here is the output from crm_mon:
> >>>
> >>> ============
> >>> Last updated: Mon Apr 21 15:44:53 2008
> >>> Current DC: ha-1 (2422b230-22f2-451b-aa95-0b783eccab8d)
> >>> 2 Nodes configured.
> >>> 1 Resources configured.
> >>> ============
> >>>
> >>> Node: ha-1 (2422b230-22f2-451b-aa95-0b783eccab8d): online
> >>> Node: ha-2 (1691d699-2a81-4545-8242-b00862431514): online
> >>>
> >>> Resource Group: squid-cluster
> >>> ip0 (heartbeat::ocf:IPaddr2): Started ha-1
> >>> squid (heartbeat::ocf:squid): Started ha-1
> >>>
> >>> If squid stops on the current heartbeat server, ha-1, it will restart
> >>> within 60 sec, so the scripting is working. If I stop the squid
> >>> process and rename /etc/init.d/squid to something else, the
> >>> script won't be able to execute the squid start, and it should fail over to
> >>> ha-2, but it doesn't; instead this appears (on both ha-1 and ha-2):
> >>
> >> What exactly do you "rename", and how? It's likely the cluster is
> >> behaving sanely and you're just creating a test case you don't understand.
> >>
> >> Regards
> >> Dominik
> >>
> >>> ============
> >>> Last updated: Mon Apr 21 15:47:49 2008
> >>> Current DC: ha-1 (2422b230-22f2-451b-aa95-0b783eccab8d)
> >>> 2 Nodes configured.
> >>> 1 Resources configured.
> >>> ============
> >>>
> >>> Node: ha-1 (2422b230-22f2-451b-aa95-0b783eccab8d): online
> >>> Node: ha-2 (1691d699-2a81-4545-8242-b00862431514): online
> >>>
> >>> Resource Group: squid-cluster
> >>> ip0 (heartbeat::ocf:IPaddr2): Started ha-1
> >>> squid (heartbeat::ocf:squid): Started ha-1 (unmanaged) FAILED
> >>>
> >>> Failed actions:
> >>> squid_stop_0 (node=ha-1, call=74, rc=1): Error
> >> _______________________________________________
> >> Linux-HA mailing list
> >> [email protected]
> >> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> >> See also: http://linux-ha.org/ReportingProblems
> >>
> >
> >
>
>