Ciao,

On Tue, Apr 14, 2009 at 01:51:25PM +0200, Cristina Bulfon wrote:
> Ciao,
>
> I don't think so: it works in V1 style; the behavior changes with V2 style.
> Attached you will find a small ha-log file (zip format).

The monitor operation on afs returns 7 (not running) even though the
preceding start operation succeeded:

crmd[19180]: 2009/04/14_13:38:55 info: process_lrm_event: LRM operation afs_6_start_0 (call=18, rc=0) complete
crmd[19180]: 2009/04/14_13:38:56 info: do_lrm_rsc_op: Performing op=afs_6_monitor_120000 key=17:2:0:cc5851a8-04dd-45a6-8700-954bea0f2c78)
crmd[19180]: 2009/04/14_13:38:56 info: process_lrm_event: LRM operation afs_6_monitor_120000 (call=19, rc=7) complete

You have to take a look at the afs script and see what's going on.

Thanks,

Dejan

>
> I don't know if the output of "ciblint" could help
>
> [r...@afsitfs3 crm]# ciblint -L
> ERROR: <nvpair name="short-resource-names"...>: [short-resource-names] is not a legal name for the <crm_config> section
> ERROR: <nvpair name="transition-idle-timeout"...>: [transition-idle-timeout] is not a legal name for the <crm_config> section
> WARNING: STONITH disabled <nvpair name="stonith-enabled" value="false">. STONITH is STRONGLY recommended.
> WARNING: No STONITH resources configured. STONITH is not available.
> INFO: See http://linux-ha.org/ciblint/stonith for more information on this topic.
> INFO: See http://linux-ha.org/ciblint/crm_config#stonith-enabled for more information on this topic.
> WARNING: resource afs_6 has failcount 2 on node afsitfs3.roma1.infn.it
> INFO: Resource Filesystem_4 running on node afsitfs3.roma1.infn.it
> INFO: Resource Filesystem_2 running on node afsitfs3.roma1.infn.it
> INFO: Resource drbddisk_1 running on node afsitfs3.roma1.infn.it
> INFO: Resource drbddisk_3 running on node afsitfs3.roma1.infn.it
> WARNING: Resource afs_6 not running anywhere.
> INFO: Resource IPaddr_141_108_26_31 running on node afsitfs3.roma1.infn.it
>
> Thanks
>
> cristina
>
> On Apr 14, 2009, at 1:00 PM, Dejan Muhamedagic wrote:
>
>> Hi,
>>
>> On Tue, Apr 14, 2009 at 10:56:23AM +0200, Cristina Bulfon wrote:
>>> Ciao,
>>>
>>> thanks for the answer ...
>>> Dejan has already pointed out the problem with the IP to me.
>>> That IP is the alias IP for the AFS server. I was also using it with
>>> IPaddr2 because at the beginning, while I was configuring AFS, I had
>>> a problem with network communication and I thought of redirecting the
>>> traffic to that IP. I've solved that problem, but I forgot to delete
>>> the entry in the haresources file because that configuration works
>>> fine with V1...
>>>
>>> Anyway, I corrected the haresources file as follows
>>>
>>> afsitfs3.roma1.infn.it \
>>> drbddisk::afs_fs Filesystem::/dev/drbd1::/vicepa/::xfs \
>>> drbddisk::afs_sw Filesystem::/dev/drbd2::/usr/afs::ext3 \
>>> 141.108.26.31 afs
>>>
>>> and created the cib.xml. I no longer get the error, but AFS
>>> starts/stops continuously
>>
>> Probably an afs issue. What do you see in the logs?
>>
>> Dejan
>>
>>> cristina
>>>
>>> On Apr 14, 2009, at 10:38 AM, Andrew Beekhof wrote:
>>>
>>>> On Fri, Apr 10, 2009 at 12:25, Cristina Bulfon
>>>> <[email protected]> wrote:
>>>>> Dejan,
>>>>>
>>>>> I've followed your advice and moved to V2; first the software was
>>>>> updated to version 2.1.4.
>>>>> I just modified the following files
>>>>>
>>>>> - ha.cf, added the line
>>>>> crm yes
>>>>>
>>>>> - cib.xml has been produced using the python script and my haresources
>>>>>
>>>>> afsitfs3.roma1.infn.it IPaddr2::141.108.26.31/24/eth0:0
>>>>> afsitfs3.roma1.infn.it drbddisk::afs_fs Filesystem::/dev/drbd1::/vicepa::xfs
>>>>> afsitfs3.roma1.infn.it drbddisk::afs_sw Filesystem::/dev/drbd2::/usr/afs::ext3
>>>>> afsitfs3.roma1.infn.it 141.108.26.31 afs
>>>>>
>>>>>
>>>>> With this kind of configuration I get a lot of errors and the AFS
>>>>> resource doesn't work
>>>>
>>>> Looks to me like the ip address is the one that doesn't work. Did you
>>>> actually read the output you pasted below?
>>>>
>>>> You might want to double check the nic and netmask attributes, they're
>>>> probably swapped around.
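
A quick sanity check for the suspected nic/netmask mix-up could look like
the sketch below. It is only an illustration: it assumes the IPaddr2
resource in cib.xml carries attributes named `nic` and `cidr_netmask`
(verify the names against your own cib.xml), and the helper names are made
up for this example.

```python
import re

# Hypothetical helpers: decide whether the nic and netmask values of an
# IPaddr2 resource look like they ended up in each other's place.
IFACE_RE = re.compile(r"^[A-Za-z][A-Za-z0-9]*(:\d+)?$")  # e.g. eth0 or eth0:0


def is_netmask(value: str) -> bool:
    """True for a CIDR prefix length between 0 and 32."""
    return value.isdigit() and 0 <= int(value) <= 32


def is_iface(value: str) -> bool:
    """True for something shaped like an interface name."""
    return bool(IFACE_RE.match(value))


def looks_swapped(attrs: dict) -> bool:
    """attrs maps attribute names (assumed: 'nic', 'cidr_netmask') to values."""
    nic = attrs.get("nic", "")
    mask = attrs.get("cidr_netmask", "")
    return is_netmask(nic) and is_iface(mask)


# Values as they would look if the conversion put them the wrong way round.
print(looks_swapped({"ip": "141.108.26.31", "nic": "24", "cidr_netmask": "eth0:0"}))
# The intended assignment.
print(looks_swapped({"ip": "141.108.26.31", "nic": "eth0:0", "cidr_netmask": "24"}))
```

The first call reports a swap and the second does not; anything it flags
still needs to be confirmed by reading the generated cib.xml by hand.
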
>>>>
>>>>>
>>>>> - crm_verify -L -x /var/lib/heartbeat/crm/cib.xml
>>>>>
>>>>> crm_verify[30489]: 2009/04/10_12:20:01 ERROR: unpack_rsc_op: Hard error: IPaddr2_1_monitor_0 failed with rc=2.
>>>>> crm_verify[30489]: 2009/04/10_12:20:01 ERROR: unpack_rsc_op: Preventing IPaddr2_1 from re-starting on afsitfs4.roma1.infn.it
>>>>> crm_verify[30489]: 2009/04/10_12:20:01 ERROR: unpack_rsc_op: Hard error: IPaddr2_1_monitor_0 failed with rc=2.
>>>>> crm_verify[30489]: 2009/04/10_12:20:01 ERROR: unpack_rsc_op: Preventing IPaddr2_1 from re-starting on afsitfs3.roma1.infn.it
>>>>>
>>>>> I've attached cib.xml, ha-log and ha.cf
>>>>>
>>>>> Thanks for helping me
>>>>>
>>>>> cristina
>>>>>
>>>>> On Apr 8, 2009, at 5:50 PM, Cristina Bulfon wrote:
>>>>>
>>>>>> Dejan,
>>>>>>
>>>>>> thanks so much for the explanation :-)
>>>>>>
>>>>>> c.
>>>>>>
>>>>>> On Apr 8, 2009, at 5:46 PM, Dejan Muhamedagic wrote:
>>>>>>
>>>>>>> Ciao,
>>>>>>>
>>>>>>> On Wed, Apr 08, 2009 at 04:17:45PM +0200, Cristina Bulfon wrote:
>>>>>>>>
>>>>>>>> Ciao Dejan,
>>>>>>>>
>>>>>>>> thanks for the answer.
>>>>>>>> Do you mean that I have to use heartbeat V2 plus CRM, and that
>>>>>>>> there is a way to check the HBA without using hbaping?
>>>>>>>
>>>>>>> Unlike Heartbeat v1, CRM/v2 can monitor resources. I suppose that
>>>>>>> in your case, a failing HBA would cause drbd or Filesystem
>>>>>>> monitor action to fail, which would result in either a failover
>>>>>>> or restart, depending on the configuration.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Dejan
>>>>>>>
>>>>>>>> Just to be sure I have understood correctly.
>>>>>>>> I am a newbie to heartbeat V2
>>>>>>>>
>>>>>>>> thanks
>>>>>>>>
>>>>>>>> cristina
>>>>>>>>
>>>>>>>> On Mar 31, 2009, at 2:00 PM, Dejan Muhamedagic wrote:
>>>>>>>>
>>>>>>>>> Ciao,
>>>>>>>>>
>>>>>>>>> On Tue, Mar 31, 2009 at 01:48:47PM +0200, Cristina Bulfon wrote:
>>>>>>>>>>
>>>>>>>>>> Ciao,
>>>>>>>>>>
>>>>>>>>>> in our heartbeat cluster we simulated an HBA failure by
>>>>>>>>>> unplugging the fiber from the HBA on the primary node. The
>>>>>>>>>> resources didn't switch to the secondary node, and the log file
>>>>>>>>>> on the primary node reported the following messages:
>>>>>>>>>>
>>>>>>>>>> Feb 19 14:33:33 afsitfs3 kernel: qla2xxx 0000:0a:01.0: LOOP DOWN detected (2 e678 16ed).
>>>>>>>>>> Feb 19 14:33:38 afsitfs3 kernel: qla2xxx 0000:0a:01.1: LOOP DOWN detected (2 8633 16fc).
>>>>>>>>>> Feb 19 14:33:46 afsitfs3 kernel: qla2x00: FAILOVER device 2 from 200500a0b832d169 -> 200400a0b832d16a - LUN 10, reason=0x2
>>>>>>>>>> Feb 19 14:33:46 afsitfs3 kernel: qla2x00: FROM HBA 0 to HBA 1
>>>>>>>>>> Feb 19 14:33:52 afsitfs3 kernel: qla2x00: FAILOVER device 2 from 200400a0b832d16a -> 200500a0b832d16a - LUN 10, reason=0x2
>>>>>>>>>> Feb 19 14:33:52 afsitfs3 kernel: qla2x00: FROM HBA 1 to HBA 1
>>>>>>>>>> Feb 19 14:33:55 afsitfs3 kernel: qla2x00: FAILOVER device 2 from 200500a0b832d16a -> 200400a0b832d169 - LUN 10, reason=0x2
>>>>>>>>>> Feb 19 14:33:55 afsitfs3 kernel: qla2x00: FROM HBA 1 to HBA 0
>>>>>>>>>> Feb 19 14:33:58 afsitfs3 kernel: qla2x00: FAILOVER device 2 from 200400a0b832d169 -> 200500a0b832d169 - LUN 10, reason=0x2
>>>>>>>>>> Feb 19 14:33:58 afsitfs3 kernel: qla2x00: FROM HBA 0 to HBA 0
>>>>>>>>>> Feb 19 14:34:01 afsitfs3 kernel: qla2x00: FAILOVER device 2 from 200500a0b832d169 -> 200400a0b832d16a - LUN 10, reason=0x2
>>>>>>>>>>
>>>>>>>>>> In some way I expected this kind of messages, but I do not
>>>>>>>>>> understand why the secondary node doesn't take control of the
>>>>>>>>>> resources.
>>>>>>>>>>
>>>>>>>>>> In the ha.cf there is nothing related to the HBA, and the
>>>>>>>>>> haresources file is
>>>>>>>>>>
>>>>>>>>>> afsitfs3.roma1.infn.it IPaddr2::Y.Y.Y.Y/24/eth0:0
>>>>>>>>>> afsitfs3.roma1.infn.it drbddisk::r0 Filesystem::/dev/drbd1::/vicepa::xfs
>>>>>>>>>> afsitfs3.roma1.infn.it drbddisk::r1 Filesystem::/dev/drbd2::/usr/afs::ext3
>>>>>>>>>> afsitfs3.roma1.infn.it Y.Y.Y.Y afs
>>>>>>>>>
>>>>>>>>> There's no resource monitoring with v1. For that you have to go
>>>>>>>>> with v2/Pacemaker (aka CRM).
>>>>>>>>>
>>>>>>>>>> I also tried to use hbaping, compiling hbaapi_src_2.2, but
>>>>>>>>>> without success: I got problems during compilation and I didn't
>>>>>>>>>> understand whether I have to use libHBAAPI.so from hbaapi or
>>>>>>>>>> from the HBA vendor.
>>>>>>>>>
>>>>>>>>> That could work with ipfail, perhaps.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Dejan
>>>>>>>>>
>>>>>>>>>> Our FC controller is
>>>>>>>>>> Logic PCI to Fibre Channel Host Adapter for QLA2342:
>>>>>>>>>> Firmware version 3.03.25 IPX, Driver version 8.02.14.01-fo
>>>>>>>>>>
>>>>>>>>>> Thanks in advance
>>>>>>>>>>
>>>>>>>>>> cristina
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> Linux-HA mailing list
>>>>>>>>>> [email protected]
>>>>>>>>>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
>>>>>>>>>> See also: http://linux-ha.org/ReportingProblems
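
The start/monitor mismatch in the crmd excerpt at the top of this message
can also be spotted mechanically when the ha-log is long. The sketch below
is only an illustration: it assumes the `process_lrm_event` line format
seen in that excerpt, and `find_failed_monitors` is a made-up name, not
part of heartbeat.

```python
import re

# Matches crmd lines such as:
#   ... process_lrm_event: LRM operation afs_6_start_0 (call=18, rc=0) complete
# The pattern is derived only from the excerpt quoted in this thread.
LRM_RE = re.compile(
    r"LRM operation (?P<rsc>\S+?)_(?P<op>start|stop|monitor)_(?P<interval>\d+) "
    r"\(call=(?P<call>\d+), rc=(?P<rc>\d+)\)"
)


def find_failed_monitors(lines):
    """Return (resource, rc) for each monitor that fails after a successful
    start of the same resource, with no stop in between."""
    started = set()
    suspicious = []
    for line in lines:
        m = LRM_RE.search(line)
        if not m:
            continue
        rsc, op, rc = m.group("rsc"), m.group("op"), int(m.group("rc"))
        if op == "start" and rc == 0:
            started.add(rsc)
        elif op == "monitor" and rc != 0 and rsc in started:
            suspicious.append((rsc, rc))
            started.discard(rsc)
        elif op == "stop":
            started.discard(rsc)
    return suspicious


log = [
    "crmd[19180]: 2009/04/14_13:38:55 info: process_lrm_event: LRM operation afs_6_start_0 (call=18, rc=0) complete",
    "crmd[19180]: 2009/04/14_13:38:56 info: process_lrm_event: LRM operation afs_6_monitor_120000 (call=19, rc=7) complete",
]
print(find_failed_monitors(log))
```

A resource flagged this way started cleanly but was reported as not running
by the very next monitor, which points at the status handling in the
resource script (here the afs script) rather than at the cluster
configuration.
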
