On Fri, Feb 19, 2010 at 4:58 PM, Ruiyuan Jiang <[email protected]> wrote:
> Hi, Andreas
>
> I am not sure what you mean to monitor it. Can you give me more detailed info?
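
To make that concrete: the cluster calls the OCF resource agent with an action name and a few environment variables, so starting Apache by hand some other way does not necessarily exercise the same code path. Below is a rough sketch of calling the agent by hand. The agent path and configfile are the ones that appear in the logs quoted further down. The helper names ocf_rc_name and run_apache_ra and the CONFIGFILE variable are made up for this example, and the return-code names are the standard OCF ones as far as I know.

```shell
#!/bin/sh
# Map an OCF exit code to its symbolic name (standard OCF return codes).
ocf_rc_name() {
    case "$1" in
        0) echo OCF_SUCCESS ;;
        1) echo OCF_ERR_GENERIC ;;        # reported as "unknown error" in the status output
        2) echo OCF_ERR_ARGS ;;
        3) echo OCF_ERR_UNIMPLEMENTED ;;
        4) echo OCF_ERR_PERM ;;
        5) echo OCF_ERR_INSTALLED ;;
        6) echo OCF_ERR_CONFIGURED ;;
        7) echo OCF_NOT_RUNNING ;;
        *) echo "UNKNOWN($1)" ;;
    esac
}

# Run one action of the apache agent by hand, roughly the way lrmd would.
# Assumptions: OCF_ROOT defaults to /usr/lib/ocf and the config file is the
# one from the quoted configuration; CONFIGFILE is a hypothetical override.
run_apache_ra() {
    action="$1"
    root="${OCF_ROOT:-/usr/lib/ocf}"
    agent="$root/resource.d/heartbeat/apache"
    if [ ! -x "$agent" ]; then
        echo "agent not found: $agent" >&2
        return 5    # same value as OCF_ERR_INSTALLED
    fi
    OCF_ROOT="$root" \
    OCF_RESKEY_configfile="${CONFIGFILE:-/etc/httpd/conf/httpd.conf}" \
        "$agent" "$action"
}

# Example (as root, while the cluster is not managing the resource):
#   run_apache_ra start;   echo "start:   $(ocf_rc_name $?)"
#   run_apache_ra monitor; echo "monitor: $(ocf_rc_name $?)"
```

If monitor comes back with something other than OCF_SUCCESS while Apache is running, the status page the agent polls is a likely suspect. I believe the agent's statusurl parameter controls that, but check the agent's metadata to be sure.
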

In the apache OCF script, there is a function called monitor. Check what it does, see if that works for you.

> I did reboot both nodes and then I found Apache did not start. After reboot,
> I did manually start Apache because Apache did not start automatically and it
> started and I could access the web site. Thanks.
>
> Ryan
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Andrew Beekhof
> Sent: Friday, February 19, 2010 8:18 AM
> To: General Linux-HA mailing list
> Subject: Re: [Linux-HA] Setup cluster
>
> On Thu, Feb 18, 2010 at 11:51 PM, Ruiyuan Jiang <[email protected]> wrote:
>> Thanks, Andreas. That is what I suspected too. Once stonith disabled, the
>> cluster starts.
>> I have not tried to set quorum yet. I will try next.
>> Now I have another problem. Apache does not start but virtual IP address
>> bonded to the NIC.
>>
>> [r...@usnbrl51 ~]# crm configure show
>> node $id="01428973-0d27-48ee-9142-2da9cb5c1e4b" usnbrl50.liz.com
>> node $id="0988f7f3-c858-4c12-af3b-b6a8bf83a0ae" usnbrl51.liz.com
>> primitive ClusterIP ocf:heartbeat:IPaddr2 \
>>         params ip="156.146.22.48" cidr_netmask="32" \
>>         op monitor interval="30s"
>> primitive WebSite ocf:heartbeat:apache \
>>         params configfile="/etc/httpd/conf/httpd.conf" \
>>         op monitor interval="1min"
>> location prefer-usnbrl50 WebSite 50: usnbrl50
>> colocation website-with-ip inf: WebSite ClusterIP
>> order apache-after-ip inf: ClusterIP WebSite
>> property $id="cib-bootstrap-options" \
>>         dc-version="1.0.6-fdba003eafa6af1b8d81b017aa535a949606ca0d" \
>>         cluster-infrastructure="Heartbeat" \
>>         no-quorum-policy="ignore" \
>>         stonith-enabled="false"
>> rsc_defaults $id="rsc-options" \
>>         resource-stickiness="100"
>> [r...@usnbrl51 ~]#
>>
>> Last updated: Thu Feb 18 17:28:14 2010
>> Stack: Heartbeat
>> Current DC: usnbrl51.liz.com (0988f7f3-c858-4c12-af3b-b6a8bf83a0ae) - partition WITHOUT quorum
>> Version: 1.0.6-fdba003eafa6af1b8d81b017aa535a949606ca0d
>> 2 Nodes configured, unknown expected votes
>> 2 Resources configured.
>> Online: [ usnbrl51.liz.com usnbrl50.liz.com ]
>>
>> ClusterIP (ocf::heartbeat:IPaddr2): Started usnbrl50.liz.com
>> WebSite_start_0 (node=usnbrl51.liz.com, call=6, rc=1, status=complete): unknown error
>> WebSite_start_0 (node=usnbrl50.liz.com, call=6, rc=1, status=complete): unknown error
>>
>> In the Apache's error log, it shows "caught SIGTERM, shuting down".
>>
>> On the /var/log/messages, it does not say why Apache can't start also. I can
>> manually start Apache no problem.
>
> But can you monitor it.
> There's some option that need to be passed to apache to enable
> monitoring, I'm guessing its not set
>
>>
>> Feb 18 16:42:51 usnbrl50 crmd: [3610]: info: process_lrm_event: LRM operation ClusterIP_start_0 (call=4, rc=0, cib-update=15, confirmed=true) ok
>> Feb 18 16:42:53 usnbrl50 crmd: [3610]: info: do_lrm_rsc_op: Performing key=9:4:0:ea164eb4-9fea-4d79-83f6-0ad29f8521a5 op=ClusterIP_monitor_30000 )
>> Feb 18 16:42:53 usnbrl50 lrmd: [3607]: info: rsc:ClusterIP:5: monitor
>> Feb 18 16:42:53 usnbrl50 crmd: [3610]: info: do_lrm_rsc_op: Performing key=10:4:0:ea164eb4-9fea-4d79-83f6-0ad29f8521a5 op=WebSite_start_0 )
>> Feb 18 16:42:53 usnbrl50 lrmd: [3607]: info: rsc:WebSite:6: start
>> Feb 18 16:42:53 usnbrl50 crmd: [3610]: info: process_lrm_event: LRM operation ClusterIP_monitor_30000 (call=5, rc=0, cib-update=16, confirmed=false) ok
>> Feb 18 16:42:53 usnbrl50 apache[3754]: [3818]: INFO: apache not running
>> Feb 18 16:42:53 usnbrl50 apache[3754]: [3820]: INFO: waiting for apache /etc/httpd/conf/httpd.conf to come up
>> Feb 18 16:42:54 usnbrl50 crmd: [3610]: info: process_lrm_event: LRM operation WebSite_start_0 (call=6, rc=1, cib-update=17, confirmed=true) unknown error
>> Feb 18 16:42:55 usnbrl50 attrd: [3609]: info: attrd_ha_callback: Update relayed from usnbrl51.liz.com
>> Feb 18 16:42:55 usnbrl50 attrd: [3609]: info: attrd_trigger_update: Sending flush op to all hosts for: fail-count-WebSite (INFINITY)
>> Feb 18 16:42:55 usnbrl50 crmd: [3610]: info: do_lrm_rsc_op: Performing key=2:5:0:ea164eb4-9fea-4d79-83f6-0ad29f8521a5 op=WebSite_stop_0 )
>> Feb 18 16:42:55 usnbrl50 lrmd: [3607]: info: rsc:WebSite:7: stop
>> Feb 18 16:42:55 usnbrl50 attrd: [3609]: info: attrd_perform_update: Sent update 16: fail-count-WebSite=INFINITY
>> Feb 18 16:42:55 usnbrl50 attrd: [3609]: info: attrd_ha_callback: Update relayed from usnbrl51.liz.com
>> Feb 18 16:42:55 usnbrl50 attrd: [3609]: info: attrd_trigger_update: Sending flush op to all hosts for: last-failure-WebSite (1266529375)
>> Feb 18 16:42:55 usnbrl50 attrd: [3609]: info: attrd_perform_update: Sent update 19: last-failure-WebSite=1266529375
>> Feb 18 16:42:55 usnbrl50 lrmd: [3607]: info: RA output: (ClusterIP:start:stderr)
>> ARPING 192.168.9.101 from 192.168.9.101 eth0 Sent 5 probes (5 broadcast(s))
>> Received 0 response(s)
>> Feb 18 16:42:56 usnbrl50 lrmd: [3607]: info: RA output: (WebSite:stop:stderr) /usr/lib/ocf/resource.d//heartbeat/apache: line 437: kill: (3816) - No such process
>> Feb 18 16:42:56 usnbrl50 apache[3842]: [3876]: INFO: Killing apache PID 3816
>> Feb 18 16:42:56 usnbrl50 apache[3842]: [3878]: INFO: apache stopped.
>> Feb 18 16:42:56 usnbrl50 crmd: [3610]: info: process_lrm_event: LRM operation WebSite_stop_0 (call=7, rc=0, cib-update=18, confirmed=true) ok
>> Feb 18 16:44:08 usnbrl50 pengine: [4004]: info: crm_log_init: Changed active directory to /usr/local/var/lib/heartbeat/cores/root
>> Feb 18 16:44:08 usnbrl50 pengine: [4004]: info: Invoked: /usr/local/lib64/heartbeat/pengine metadata
>>
>> Thanks.
>> Ryan
>>
>
> This message (including any attachments) is intended solely for the specific
> individual(s) or entity(ies) named above, and may contain legally privileged and
> confidential information.
> If you are not the intended recipient, please notify the sender immediately by
> replying to this message and then delete it. Any disclosure, copying, or
> distribution of this message, or the taking of any action based on it, by other
> than the intended recipient, is strictly prohibited.
>
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
