Ciao,

On Tue, Apr 14, 2009 at 01:51:25PM +0200, Cristina Bulfon wrote:
> Ciao,
>
> I don't think so. It works in V1 style; the behavior changes with V2 style.
> Attached you will find a small ha-log file (zip format).

The monitor operation on afs returns rc=7 (not running) even though
the preceding start operation succeeded (rc=0):

crmd[19180]: 2009/04/14_13:38:55 info: process_lrm_event: LRM operation afs_6_start_0 (call=18, rc=0) complete
crmd[19180]: 2009/04/14_13:38:56 info: do_lrm_rsc_op: Performing op=afs_6_monitor_120000 key=17:2:0:cc5851a8-04dd-45a6-8700-954bea0f2c78)
crmd[19180]: 2009/04/14_13:38:56 info: process_lrm_event: LRM operation afs_6_monitor_120000 (call=19, rc=7) complete

You have to take a look at the afs script and see what's going
on.
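
If it helps when going through a longer ha-log, here is a minimal
sketch (not part of heartbeat) that pulls the LRM results out of the
log and names the return codes. The rc meanings are the standard OCF
exit codes; the function name lrm_results and the SAMPLE text are just
made up for illustration:

```python
import re

# Standard OCF resource agent exit codes.
OCF_RC = {
    0: "success",
    1: "generic error",
    2: "invalid arguments",
    3: "unimplemented feature",
    4: "insufficient privileges",
    5: "not installed",
    6: "not configured",
    7: "not running",
    8: "running (master)",
    9: "failed (master)",
}

# Matches crmd "process_lrm_event" results such as
# "LRM operation afs_6_start_0 (call=18, rc=0) complete".
LRM_EVENT = re.compile(
    r"LRM operation\s+(?P<op>\S+)\s+\(call=(?P<call>\d+),\s*rc=(?P<rc>\d+)\)"
)


def lrm_results(log_text):
    """Return (operation, call id, rc, meaning) for each LRM result found."""
    # Mail clients wrap long log lines, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    results = []
    for m in LRM_EVENT.finditer(flat):
        rc = int(m.group("rc"))
        results.append(
            (m.group("op"), int(m.group("call")), rc, OCF_RC.get(rc, "unknown"))
        )
    return results


# Hypothetical input: the two result lines from this mail, wrapped as posted.
SAMPLE = """
crmd[19180]: 2009/04/14_13:38:55 info: process_lrm_event: LRM operation
afs_6_start_0 (call=18, rc=0) complete
crmd[19180]: 2009/04/14_13:38:56 info: process_lrm_event: LRM operation
afs_6_monitor_120000 (call=19, rc=7) complete
"""

for op, call, rc, meaning in lrm_results(SAMPLE):
    print(f"{op}: call={call} rc={rc} ({meaning})")
```

A start that returns 0 followed right away by a monitor that returns 7
usually points at how the script reports its status, which is why the
afs script is the place to look.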

Thanks,

Dejan


> I don't know if the output of "ciblint" could help
>
> [r...@afsitfs3 crm]# ciblint -L
> ERROR: <nvpair name="short-resource-names"...>: [short-resource-names] is not a legal name for the <crm_config> section
> ERROR: <nvpair name="transition-idle-timeout"...>: [transition-idle-timeout] is not a legal name for the <crm_config> section
> WARNING: STONITH disabled <nvpair name="stonith-enabled" value="false">.  STONITH is STRONGLY recommended.
> WARNING: No STONITH resources configured.  STONITH is not available.
> INFO: See http://linux-ha.org/ciblint/stonith for more information on this topic.
> INFO: See http://linux-ha.org/ciblint/crm_config#stonith-enabled for more information on this topic.
> WARNING: resource afs_6 has failcount 2 on node afsitfs3.roma1.infn.it
> INFO: Resource Filesystem_4 running on node afsitfs3.roma1.infn.it
> INFO: Resource Filesystem_2 running on node afsitfs3.roma1.infn.it
> INFO: Resource drbddisk_1 running on node afsitfs3.roma1.infn.it
> INFO: Resource drbddisk_3 running on node afsitfs3.roma1.infn.it
> WARNING: Resource afs_6 not running anywhere.
> INFO: Resource IPaddr_141_108_26_31 running on node afsitfs3.roma1.infn.it
>
> Thanks
>
> cristina
>
> On Apr 14, 2009, at 1:00 PM, Dejan Muhamedagic wrote:
>
>> Hi,
>>
>> On Tue, Apr 14, 2009 at 10:56:23AM +0200, Cristina Bulfon wrote:
>>> Ciao,
>>>
>>> thanks for the answer. Dejan has already pointed out the problem with the
>>> IP to me.
>>> That IP is the alias IP for the AFS server. I was also using it with
>>> IPaddr2 because at the beginning, while I was configuring AFS, I had a
>>> problem with network communication and thought of redirecting the traffic
>>> to that IP. I've solved that problem, but I forgot to delete the entry in
>>> the haresources file because that configuration works fine with V1.
>>>
>>> Anyway, I corrected the haresources file as follows:
>>>
>>> afsitfs3.roma1.infn.it \
>>>        drbddisk::afs_fs Filesystem::/dev/drbd1::/vicepa/::xfs \
>>>        drbddisk::afs_sw Filesystem::/dev/drbd2::/usr/afs::ext3 \
>>>        141.108.26.31 afs
>>>
>>> and created the cib.xml. I no longer get the error, but the AFS resource
>>> starts and stops continuously.
>>
>> Probably an afs issue. What do you see in the logs?
>>
>> Dejan
>>
>>> cristina
>>>
>>> On Apr 14, 2009, at 10:38 AM, Andrew Beekhof wrote:
>>>
>>>> On Fri, Apr 10, 2009 at 12:25, Cristina Bulfon
>>>> <[email protected]> wrote:
>>>>> Dejan,
>>>>>
>>>>> I've followed your advice and moved to V2. First, the software was
>>>>> updated to version 2.1.4.
>>>>> I just modified the following files
>>>>>
>>>>> - ha.cf, added the line
>>>>>        crm yes
>>>>>
>>>>> - cib.xml has been produced using the python script and my haresources
>>>>>
>>>>>       afsitfs3.roma1.infn.it IPaddr2::141.108.26.31/24/eth0:0
>>>>>       afsitfs3.roma1.infn.it drbddisk::afs_fs Filesystem::/dev/drbd1::/vicepa::xfs
>>>>>       afsitfs3.roma1.infn.it drbddisk::afs_sw Filesystem::/dev/drbd2::/usr/afs::ext3
>>>>>       afsitfs3.roma1.infn.it 141.108.26.31 afs
>>>>>
>>>>>
>>>>> With this configuration I get a lot of errors and the AFS resource
>>>>> doesn't work.
>>>>
>>>> Looks to me like the ip address is the one that doesn't work.  Did you
>>>> actually read the output you pasted below?
>>>>
>>>> You might want to double check the nic and netmask attributes, they're
>>>> probably swapped around.
>>>>
>>>>>
>>>>> - crm_verify -L  -x /var/lib/heartbeat/crm/cib.xml
>>>>>
>>>>> crm_verify[30489]: 2009/04/10_12:20:01 ERROR: unpack_rsc_op: Hard error: IPaddr2_1_monitor_0 failed with rc=2.
>>>>> crm_verify[30489]: 2009/04/10_12:20:01 ERROR: unpack_rsc_op:   Preventing IPaddr2_1 from re-starting on afsitfs4.roma1.infn.it
>>>>> crm_verify[30489]: 2009/04/10_12:20:01 ERROR: unpack_rsc_op: Hard error: IPaddr2_1_monitor_0 failed with rc=2.
>>>>> crm_verify[30489]: 2009/04/10_12:20:01 ERROR: unpack_rsc_op:   Preventing IPaddr2_1 from re-starting on afsitfs3.roma1.infn.it
>>>>>
>>>>> I've attached cib.xml, ha-log and ha.cf.
>>>>>
>>>>> Thanks for helping me
>>>>>
>>>>> cristina
>>>>>
>>>>> On Apr 8, 2009, at 5:50 PM, Cristina Bulfon wrote:
>>>>>
>>>>>> Dejan,
>>>>>>
>>>>>> thanks so much for the explanation :-)
>>>>>>
>>>>>> c.
>>>>>>
>>>>>> On Apr 8, 2009, at 5:46 PM, Dejan Muhamedagic wrote:
>>>>>>
>>>>>>> Ciao,
>>>>>>>
>>>>>>> On Wed, Apr 08, 2009 at 04:17:45PM +0200, Cristina Bulfon wrote:
>>>>>>>>
>>>>>>>> Ciao Dejan,
>>>>>>>>
>>>>>>>> thanks for the answer.
>>>>>>>> Do you mean that I have to use heartbeat V2 plus CRM, and that there
>>>>>>>> is a way to check the HBA without using hbaping?
>>>>>>>
>>>>>>> Unlike Heartbeat v1, CRM/v2 can monitor resources. I suppose that
>>>>>>> in your case, a failing HBA would cause drbd or Filesystem
>>>>>>> monitor action to fail, which would result in either a failover
>>>>>>> or restart, depending on the configuration.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Dejan
>>>>>>>
>>>>>>>> I just want to be sure that I have understood correctly. I am a
>>>>>>>> newbie with heartbeat V2.
>>>>>>>>
>>>>>>>> thanks
>>>>>>>>
>>>>>>>> cristina
>>>>>>>>
>>>>>>>> On Mar 31, 2009, at 2:00 PM, Dejan Muhamedagic wrote:
>>>>>>>>
>>>>>>>>> Ciao,
>>>>>>>>>
>>>>>>>>> On Tue, Mar 31, 2009 at 01:48:47PM +0200, Cristina Bulfon wrote:
>>>>>>>>>>
>>>>>>>>>> Ciao,
>>>>>>>>>>
>>>>>>>>>> in our heartbeat cluster we simulated an HBA failure by unplugging
>>>>>>>>>> the fiber from the HBA on the primary node. The resources didn't
>>>>>>>>>> switch to the secondary node, and the log file on the primary node
>>>>>>>>>> reported the following messages:
>>>>>>>>>>
>>>>>>>>>> Feb 19 14:33:33 afsitfs3 kernel: qla2xxx 0000:0a:01.0: LOOP DOWN detected (2 e678 16ed).
>>>>>>>>>> Feb 19 14:33:38 afsitfs3 kernel: qla2xxx 0000:0a:01.1: LOOP DOWN detected (2 8633 16fc).
>>>>>>>>>> Feb 19 14:33:46 afsitfs3 kernel: qla2x00: FAILOVER device 2 from 200500a0b832d169 -> 200400a0b832d16a - LUN 10, reason=0x2
>>>>>>>>>> Feb 19 14:33:46 afsitfs3 kernel: qla2x00: FROM HBA 0 to HBA 1
>>>>>>>>>> Feb 19 14:33:52 afsitfs3 kernel: qla2x00: FAILOVER device 2 from 200400a0b832d16a -> 200500a0b832d16a - LUN 10, reason=0x2
>>>>>>>>>> Feb 19 14:33:52 afsitfs3 kernel: qla2x00: FROM HBA 1 to HBA 1
>>>>>>>>>> Feb 19 14:33:55 afsitfs3 kernel: qla2x00: FAILOVER device 2 from 200500a0b832d16a -> 200400a0b832d169 - LUN 10, reason=0x2
>>>>>>>>>> Feb 19 14:33:55 afsitfs3 kernel: qla2x00: FROM HBA 1 to HBA 0
>>>>>>>>>> Feb 19 14:33:58 afsitfs3 kernel: qla2x00: FAILOVER device 2 from 200400a0b832d169 -> 200500a0b832d169 - LUN 10, reason=0x2
>>>>>>>>>> Feb 19 14:33:58 afsitfs3 kernel: qla2x00: FROM HBA 0 to HBA 0
>>>>>>>>>> Feb 19 14:34:01 afsitfs3 kernel: qla2x00: FAILOVER device 2 from 200500a0b832d169 -> 200400a0b832d16a - LUN 10, reason=0x2
>>>>>>>>>>
>>>>>>>>>> I more or less expected this kind of message, but I do not
>>>>>>>>>> understand why the secondary node doesn't take control of the
>>>>>>>>>> resources.
>>>>>>>>>>
>>>>>>>>>> There is nothing related to the HBA in ha.cf, and the haresources
>>>>>>>>>> file is:
>>>>>>>>>>
>>>>>>>>>> afsitfs3.roma1.infn.it  IPaddr2::Y.Y.Y.Y/24/eth0:0
>>>>>>>>>> afsitfs3.roma1.infn.it  drbddisk::r0 Filesystem::/dev/drbd1::/vicepa::xfs
>>>>>>>>>> afsitfs3.roma1.infn.it  drbddisk::r1 Filesystem::/dev/drbd2::/usr/afs::ext3
>>>>>>>>>> afsitfs3.roma1.infn.it         Y.Y.Y.Y   afs
>>>>>>>>>
>>>>>>>>> There's no resource monitoring with v1. For that you have to go
>>>>>>>>> with v2/Pacemaker (aka CRM).
>>>>>>>>>
>>>>>>>>>> I also tried to use hbaping, compiling hbaapi_src_2.2, but without
>>>>>>>>>> success. I got problems during compilation, and I didn't understand
>>>>>>>>>> whether I have to use libHBAAPI.so from hbaapi or from the HBA vendor.
>>>>>>>>>
>>>>>>>>> That could work with ipfail, perhaps.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Dejan
>>>>>>>>>
>>>>>>>>>> Our FC controller is
>>>>>>>>>>               Logic PCI to Fibre Channel Host Adapter for QLA2342:
>>>>>>>>>>       Firmware version 3.03.25 IPX, Driver version 8.02.14.01-fo
>>>>>>>>>>
>>>>>>>>>> Thanks in advance
>>>>>>>>>>
>>>>>>>>>> cristina
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> Linux-HA mailing list
>>>>>>>>>> [email protected]
>>>>>>>>>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
>>>>>>>>>> See also: http://linux-ha.org/ReportingProblems
