Hi,

On Thu, Dec 20, 2007 at 10:47:02PM +0900, Trent Lloyd wrote:
> Hi Dejan,
>
> On 20/12/2007, at 9:54 PM, Dejan Muhamedagic wrote:
>
>> Hi,
>>
>> On Thu, Dec 20, 2007 at 08:15:12PM +0900, Trent Lloyd wrote:
>>> Hi Dejan,
>>>
>>> On 20/12/2007, at 7:50 PM, Dejan Muhamedagic wrote:
>>>
>>>> Hi,
>>>>
>>>> On Thu, Dec 20, 2007 at 05:48:10PM +0900, Trent Lloyd wrote:
>>>>> Hi All,
>>>>>
>>>>> I have recently set up a 2-node iSCSI fail-over array backed onto
>>>>> shared SAS MD3000 storage.
>>>>
>>>> How is this thing connected: is it iSCSI or SAS?
>>>
>>> Sorry, that wasn't clear - to the nodes running heartbeat they are
>>> connected via SAS - they then serve them up via iSCSI.
>>
>> OK.
>>
>>>>> I have everything (including RDAC) working fine on my Debian Etch
>>>>> nodes - however, I am curious whether it is possible to get heartbeat
>>>>> to demote itself if it loses access to the disks - I am not sure if
>>>>> I am missing something, but it seems that if the disks start failing
>>>>> on a node there's no mechanism to cause it to fail over.
>>>>
>>>> The kernel should take care of that. If the computer hangs or
>>>> crashes, there won't be heartbeat and, after a successful fencing
>>>> operation (you do have a stonith device, right?), a failover will
>>>> occur. You can also configure a watchdog. Or did I misunderstand
>>>> your question?
>>>
>>> I would expect that if a single disk array disappears, the machine
>>> shouldn't hang - only processes that were depending on it would hang.
>>> The same disk array does not contain the root filesystem or anything
>>> like that - only the data partition.
>>
>> I guess that depends on the kind of error. At any rate, the processes
>> which run on top of this disk will fail in some way. If you have them
>> in heartbeat as resources and define a monitor operation, then you
>> should be OK.
>>
>>>>> Is there anything to do this currently? I can't see anything. I
>>>>> figure it would be possible to write a plugin to monitor the
>>>>> dm-multipath stuff - is this a reasonable approach?
>>>>
>>>> It's been a long time since I used that. How can one monitor
>>>> dm-multipath? Isn't it fault tolerant?
>>>
>>> It is, but I'm talking about a situation where for some reason both
>>> paths are lost. I know this seems kinda paranoid, but it just seemed
>>> like a reasonable thing to do to me.
>>>
>>> Example output:
>>> filer2:~# multipath -ll
>>> mpath0 (360019b9000b6b68e00001c2a46e8e656) dm-0 DELL ,MD3000
>>> [size=1.9T][features=0][hwhandler=1 rdac]
>>> \_ round-robin 0 [prio=3][enabled]
>>>  \_ 2:0:0:0 sdd 8:48 [active][ready]
>>> \_ round-robin 0 [prio=0][enabled]
>>>  \_ 1:0:0:0 sdb 8:16 [active][ghost]
>>>
>>> So we could parse the output, or write some API that makes the same
>>> call this makes, to make sure that mpath0 has at least one active
>>> working path.
>>
>> Yes, it would be possible to do a monitor-only resource agent, which
>> would otherwise behave like a dummy resource (see Dummy :)
>> I just wonder how different that output can look and which
>> information is important. A more elegant way would be to implement a
>> ping-like monitor as a Heartbeat plugin. There are already hbaping
>> (for f/c) and ping (for IP).
>
> It's worth noting I am using heartbeat in v1 mode rather than CRM mode.
This is a Heartbeat feature and it will work in both v1 and v2.

> But alas I can still do something like this I think.. I will look
> into it.

Contributions are welcome :)

Thanks,

Dejan

> Thanks,
> Trent

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
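A rough sketch of the monitor-only check discussed in the thread could look
like the shell fragment below. The map name (mpath0) and the
"[active][ready]" strings are assumptions taken from the example output
quoted above, not a definitive implementation; a real agent would wrap this
in the usual start/stop/monitor/meta-data actions, much like the Dummy
agent mentioned by Dejan.

#!/bin/sh
# Sketch only: checks that a multipath map still has at least one usable
# path and returns OCF-style exit codes, so it could serve as the monitor
# action of a Dummy-like, monitor-only resource agent.
# Assumptions: the map is called mpath0 and "multipath -ll" marks healthy
# paths as [active][ready], as in the example output quoted above.

MPATH_MAP="${1:-mpath0}"

OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_NOT_RUNNING=7

out=$(multipath -ll "$MPATH_MAP" 2>/dev/null) || exit $OCF_ERR_GENERIC

# No output at all usually means the map has disappeared entirely.
[ -n "$out" ] || exit $OCF_NOT_RUNNING

# Count paths that are both active and ready; ghost paths don't count.
ready=$(echo "$out" | grep -c '\[active\]\[ready\]')

if [ "$ready" -ge 1 ]; then
    exit $OCF_SUCCESS
else
    exit $OCF_NOT_RUNNING
fi

In CRM (v2) mode such a check would run as the resource's monitor
operation; a v1 (haresources) setup would need to drive it from something
external, since v1 does not monitor resources periodically by itself.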
