Thanks, Andreas. That is what I suspected too. Once stonith was disabled, the
cluster started.
I have not tried setting quorum yet. I will try that next.
Now I have another problem: Apache does not start, although the virtual IP
address is bound to the NIC.
[r...@usnbrl51 ~]# crm configure show
node $id="01428973-0d27-48ee-9142-2da9cb5c1e4b" usnbrl50.liz.com
node $id="0988f7f3-c858-4c12-af3b-b6a8bf83a0ae" usnbrl51.liz.com
primitive ClusterIP ocf:heartbeat:IPaddr2 \
params ip="156.146.22.48" cidr_netmask="32" \
op monitor interval="30s"
primitive WebSite ocf:heartbeat:apache \
params configfile="/etc/httpd/conf/httpd.conf" \
op monitor interval="1min"
location prefer-usnbrl50 WebSite 50: usnbrl50
colocation website-with-ip inf: WebSite ClusterIP
order apache-after-ip inf: ClusterIP WebSite
property $id="cib-bootstrap-options" \
dc-version="1.0.6-fdba003eafa6af1b8d81b017aa535a949606ca0d" \
cluster-infrastructure="Heartbeat" \
no-quorum-policy="ignore" \
stonith-enabled="false"
rsc_defaults $id="rsc-options" \
resource-stickiness="100"
[r...@usnbrl51 ~]#
Last updated: Thu Feb 18 17:28:14 2010
Stack: Heartbeat
Current DC: usnbrl51.liz.com (0988f7f3-c858-4c12-af3b-b6a8bf83a0ae) - partition
WITHOUT quorum
Version: 1.0.6-fdba003eafa6af1b8d81b017aa535a949606ca0d
2 Nodes configured, unknown expected votes
2 Resources configured.
Online: [ usnbrl51.liz.com usnbrl50.liz.com ]
ClusterIP (ocf::heartbeat:IPaddr2): Started usnbrl50.liz.com
WebSite_start_0 (node=usnbrl51.liz.com, call=6, rc=1, status=complete):
unknown error
WebSite_start_0 (node=usnbrl50.liz.com, call=6, rc=1, status=complete):
unknown error
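Each failed start listed above stays recorded against the resource. The log further down shows fail-count-WebSite being set to INFINITY, which keeps the cluster from retrying WebSite on that node until the failure is cleared. A minimal sketch of clearing it once the underlying problem is fixed, assuming the crm shell is on the PATH; clear_failures is a hypothetical helper name that only wraps "crm resource cleanup":

```shell
#!/bin/sh
# Clear the recorded failures for one resource so the cluster retries it.
# clear_failures is a hypothetical helper name; it wraps the crm shell
# subcommand "crm resource cleanup <resource>".
clear_failures() {
    rsc=$1
    if ! command -v crm >/dev/null 2>&1; then
        echo "crm shell not found; run this on a cluster node" >&2
        return 127
    fi
    crm resource cleanup "$rsc"
}
```

On a cluster node this would be called as "clear_failures WebSite". Clearing the failure does not fix whatever makes the start fail; it only lets the cluster try again.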
Apache's error log shows "caught SIGTERM, shutting down".
/var/log/messages does not say why Apache can't start either. I can start
Apache manually with no problem.
Feb 18 16:42:51 usnbrl50 crmd: [3610]: info: process_lrm_event: LRM operation
ClusterIP_start_0 (call=4, rc=0, cib-update=15, confirmed=true) ok
Feb 18 16:42:53 usnbrl50 crmd: [3610]: info: do_lrm_rsc_op: Performing
key=9:4:0:ea164eb4-9fea-4d79-83f6-0ad29f8521a5 op=ClusterIP_monitor_30000 )
Feb 18 16:42:53 usnbrl50 lrmd: [3607]: info: rsc:ClusterIP:5: monitor
Feb 18 16:42:53 usnbrl50 crmd: [3610]: info: do_lrm_rsc_op: Performing
key=10:4:0:ea164eb4-9fea-4d79-83f6-0ad29f8521a5 op=WebSite_start_0 )
Feb 18 16:42:53 usnbrl50 lrmd: [3607]: info: rsc:WebSite:6: start
Feb 18 16:42:53 usnbrl50 crmd: [3610]: info: process_lrm_event: LRM operation
ClusterIP_monitor_30000 (call=5, rc=0, cib-update=16, confirmed=false) ok
Feb 18 16:42:53 usnbrl50 apache[3754]: [3818]: INFO: apache not running
Feb 18 16:42:53 usnbrl50 apache[3754]: [3820]: INFO: waiting for apache
/etc/httpd/conf/httpd.conf to come up
Feb 18 16:42:54 usnbrl50 crmd: [3610]: info: process_lrm_event: LRM operation
WebSite_start_0 (call=6, rc=1, cib-update=17, confirmed=true) unknown error
Feb 18 16:42:55 usnbrl50 attrd: [3609]: info: attrd_ha_callback: Update relayed
from usnbrl51.liz.com
Feb 18 16:42:55 usnbrl50 attrd: [3609]: info: attrd_trigger_update: Sending
flush op to all hosts for: fail-count-WebSite (INFINITY)
Feb 18 16:42:55 usnbrl50 crmd: [3610]: info: do_lrm_rsc_op: Performing
key=2:5:0:ea164eb4-9fea-4d79-83f6-0ad29f8521a5 op=WebSite_stop_0 )
Feb 18 16:42:55 usnbrl50 lrmd: [3607]: info: rsc:WebSite:7: stop
Feb 18 16:42:55 usnbrl50 attrd: [3609]: info: attrd_perform_update: Sent update
16: fail-count-WebSite=INFINITY
Feb 18 16:42:55 usnbrl50 attrd: [3609]: info: attrd_ha_callback: Update relayed
from usnbrl51.liz.com
Feb 18 16:42:55 usnbrl50 attrd: [3609]: info: attrd_trigger_update: Sending
flush op to all hosts for: last-failure-WebSite (1266529375)
Feb 18 16:42:55 usnbrl50 attrd: [3609]: info: attrd_perform_update: Sent update
19: last-failure-WebSite=1266529375
Feb 18 16:42:55 usnbrl50 lrmd: [3607]: info: RA output: (ClusterIP:start:stderr)
ARPING 192.168.9.101 from 192.168.9.101 eth0 Sent 5 probes (5 broadcast(s))
Received 0 response(s)
Feb 18 16:42:56 usnbrl50 lrmd: [3607]: info: RA output: (WebSite:stop:stderr)
/usr/lib/ocf/resource.d//heartbeat/apache: line 437: kill: (3816) - No such
process
Feb 18 16:42:56 usnbrl50 apache[3842]: [3876]: INFO: Killing apache PID 3816
Feb 18 16:42:56 usnbrl50 apache[3842]: [3878]: INFO: apache stopped.
Feb 18 16:42:56 usnbrl50 crmd: [3610]: info: process_lrm_event: LRM operation
WebSite_stop_0 (call=7, rc=0, cib-update=18, confirmed=true) ok
Feb 18 16:44:08 usnbrl50 pengine: [4004]: info: crm_log_init: Changed active
directory to /usr/local/var/lib/heartbeat/cores/root
Feb 18 16:44:08 usnbrl50 pengine: [4004]: info: Invoked:
/usr/local/lib64/heartbeat/pengine metadata
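The log above shows WebSite_start_0 returning rc=1 about a second after the agent began waiting for Apache, without giving a reason. One way to get more detail is to run the resource agent script by hand with the same parameter and look at its output and exit code. A minimal sketch, assuming the agent sits at the path that appears in the log (/usr/lib/ocf/resource.d/heartbeat/apache) and that OCF_ROOT is /usr/lib/ocf; run_ra is a hypothetical helper name:

```shell
#!/bin/sh
# Run one action of an OCF resource agent by hand and report its exit code.
# Usage: run_ra <agent-path> <action> [name=value ...]
# Each name=value pair is exported as OCF_RESKEY_name=value, which is how
# resource parameters such as configfile are passed to OCF agents.
run_ra() {
    agent=$1; action=$2; shift 2
    if [ ! -x "$agent" ]; then
        echo "agent not found or not executable: $agent" >&2
        return 5    # OCF_ERR_INSTALLED
    fi
    (
        # Subshell: the exported variables do not leak into the caller.
        OCF_ROOT=${OCF_ROOT:-/usr/lib/ocf}   # assumed default location
        export OCF_ROOT
        for kv in "$@"; do
            export "OCF_RESKEY_$kv"
        done
        "$agent" "$action"
    )
    rc=$?
    echo "$action rc=$rc"
    return "$rc"
}
```

For the WebSite resource this would be called as "run_ra /usr/lib/ocf/resource.d/heartbeat/apache start configfile=/etc/httpd/conf/httpd.conf". An exit code of 0 means success and 1 is the generic error reported in the log. Starting Apache through the init script does not go through the agent, so it can succeed while the agent's start still fails.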
Thanks.
Ryan
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Andreas Kurz
Sent: Thursday, February 18, 2010 7:23 AM
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] Setup cluster
On Thursday 18 February 2010 12:55:50 Dejan Muhamedagic wrote:
> Hi,
>
> On Wed, Feb 17, 2010 at 12:15:38PM -0500, Ruiyuan Jiang wrote:
> > Hi,
> >
> > I am trying to set up my first cluster on Red Hat Enterprise Linux 5.4.
> > Currently there are no disks for either of my hosts yet. The version that I
> > have is heartbeat 3.0.2.
> >
> > I ran the command:
> >
> > # crm configure primitive ClusterIP ocf:heartbeat:IPaddr2 \
> > params ip=192.168.9.101 cidr_netmask=32 \
> > op monitor interval=30s
> >
> > # crm_mon
> > Last updated: Wed Feb 17 12:07:22 2010
> > Stack: Heartbeat
> > Current DC: usnbrl51.liz.com (0988f7f3-c858-4c12-af3b-b6a8bf83a0ae) -
> > partition WITHOUT quorum Version:
> > 1.0.6-fdba003eafa6af1b8d81b017aa535a949606ca0d
> > 2 Nodes configured, unknown expected votes
> > 1 Resources configured.
> > Online: [ usnbrl51.liz.com usnbrl50.liz.com ]
> >
> >
> > From the output, I am ignoring "unknown expected votes". Is it safe
> > to ignore "partition WITHOUT quorum" for now? Also, I don't see
> > my cluster IP.
>
> Don't think it matters, but you could set it anyway:
>
> property expected-quorum-votes="2"
>
> > # crm configure show
> > node $id="01428973-0d27-48ee-9142-2da9cb5c1e4b" usnbrl50.liz.com
> > node $id="0988f7f3-c858-4c12-af3b-b6a8bf83a0ae" usnbrl51.liz.com
> > primitive ClusterIP ocf:heartbeat:IPaddr2 \
> > params ip="192.168.9.101" cidr_netmask="32" \
> > op monitor interval="30s"
> > property $id="cib-bootstrap-options" \
> > dc-version="1.0.6-fdba003eafa6af1b8d81b017aa535a949606ca0d" \
> > cluster-infrastructure="Heartbeat" \
> > no-quorum-policy="ignore"
> > #
> >
> > The above command shows that the cluster has its virtual IP.
>
> The resource is defined, but it's not running. Looks like the CRM
> didn't try to start it. The logs should show why.
There are no stonith resources, but stonith is enabled (the default), so
resource management is disabled. Set the stonith-enabled property to false and
the resource should start.
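As a sketch of how that property could be set from the command line, assuming the crm shell is on the PATH; set_cluster_property is a hypothetical wrapper around "crm configure property name=value":

```shell
#!/bin/sh
# Set a cluster-wide property through the crm shell.
# set_cluster_property is a hypothetical helper name; the underlying call
# is "crm configure property <name>=<value>".
set_cluster_property() {
    name=$1; value=$2
    if ! command -v crm >/dev/null 2>&1; then
        echo "crm shell not found; run this on a cluster node" >&2
        return 127
    fi
    crm configure property "$name=$value"
}

# Example, left commented out so that sourcing this file changes nothing:
# set_cluster_property stonith-enabled false
```

Afterwards "crm configure show" should list stonith-enabled="false" under the cib-bootstrap-options property set.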
Regards,
Andreas
>
> Thanks,
>
> Dejan
>
> > What did I do wrong here? Thanks.
> >
> > Ryan
> >
> >
> >
> > This message (including any attachments) is intended
> > solely for the specific individual(s) or entity(ies) named
> > above, and may contain legally privileged and
> > confidential information. If you are not the intended
> > recipient, please notify the sender immediately by
> > replying to this message and then delete it.
> > Any disclosure, copying, or distribution of this message,
> > or the taking of any action based on it, by other than the
> > intended recipient, is strictly prohibited.
> >
> > _______________________________________________
> > Linux-HA mailing list
> > [email protected]
> > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> > See also: http://linux-ha.org/ReportingProblems