Re: [Linux-HA] Detecting drive failure and demoting

Dejan Muhamedagic Thu, 20 Dec 2007 04:55:45 -0800

Hi,

On Thu, Dec 20, 2007 at 08:15:12PM +0900, Trent Lloyd wrote:
> Hi Dejan,
> 
> On 20/12/2007, at 7:50 PM, Dejan Muhamedagic wrote:
> 
> >Hi,
> >
> >On Thu, Dec 20, 2007 at 05:48:10PM +0900, Trent Lloyd wrote:
> >>Hi All,
> >>
> >>I have recently set up a 2-node iSCSI fail-over array backed onto
> >>shared SAS MD3000 storage.
> >
> >How is this thing connected: is it iSCSI or SAS?
> 
> Sorry that wasn't clear - to the nodes running heartbeat they are  
> connected via SAS - they then serve them up via iSCSI.


OK.

> >
> >
> >>I have everything (including RDAC) working fine on my Debian Etch
> >>nodes - however I am curious if it is possible to get heartbeat to
> >>demote itself if it loses access to the disks - I am not sure if I am
> >>missing something but it seems if the disks start failing on a node
> >>there's no mechanism to cause it to failover.
> >
> >The kernel should take care of that. If the computer hangs or
> >crashes, there won't be heartbeat and, after a successful fencing
> >operation (you do have a stonith device, right?), a failover will
> >occur. You can also configure a watchdog. Or did I misunderstand
> >your question?
> 
> I would expect that if a single disk array disappears - the machine  
> shouldn't hang - only processes that were depending on those would  
> hang.  The same disk array does not contain the root array or anything  
> like that - only the data partition.

I guess that depends on the kind of error. At any rate, the
processes which run on top of this disk will fail in some way.
If you have them configured in Heartbeat as resources and define
a monitor operation for each, then you should be OK.

> >>Is there anything to do this currently? I can't see anything.  I
> >>figure it would be possible to write a plugin to monitor the
> >>dm-multipath stuff - is this a reasonable approach?
> >
> >It's been a long time since I used that. How can one monitor
> >dm-multipath? Isn't it fault tolerant?
> 
> It is, but I'm talking about a situation where for some reason both
> paths are lost.  I know this seems kinda paranoid but it just seemed
> like a reasonable thing to do to me.
> 
> Example output:
> filer2:~# multipath -ll
> mpath0 (360019b9000b6b68e00001c2a46e8e656) dm-0 DELL    ,MD3000
> [size=1.9T][features=0][hwhandler=1 rdac]
> \_ round-robin 0 [prio=3][enabled]
>  \_ 2:0:0:0  sdd 8:48  [active][ready]
> \_ round-robin 0 [prio=0][enabled]
>  \_ 1:0:0:0  sdb 8:16  [active][ghost]
> 
> So we could parse this output, or write something against an API that
> makes the same call this tool makes, to make sure that mpath0 has at
> least 1 active working path.

Yes, it would be possible to write a monitor-only resource agent
which would otherwise behave like a dummy resource (see the Dummy
resource agent :). I just wonder how much that output can vary
and which information in it is important. A more elegant way
would be to implement a ping-like monitor as a Heartbeat plugin.
There are already hbaping (for FC) and ping (for IP).
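
To make the monitor-only idea a bit more concrete, here is a rough
sketch of the check such an agent might run. It is not an existing
Heartbeat agent or plugin. The names usable_paths and monitor are
made up for illustration, the regular expression only assumes the
output layout quoted above, and treating [active][ghost] (the
passive RDAC path) as not usable is my assumption too.

```python
import re
import subprocess

# Return codes in the style of OCF resource agents.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1

# Output in the layout quoted in this thread; newer multipath-tools
# versions may print a different layout.
SAMPLE = r"""mpath0 (360019b9000b6b68e00001c2a46e8e656) dm-0 DELL    ,MD3000
[size=1.9T][features=0][hwhandler=1 rdac]
\_ round-robin 0 [prio=3][enabled]
 \_ 2:0:0:0  sdd 8:48  [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 1:0:0:0  sdb 8:16  [active][ghost]
"""

# Matches a path line such as: \_ 2:0:0:0  sdd 8:48  [active][ready]
PATH_LINE = re.compile(
    r"\\_\s+\d+:\d+:\d+:\d+\s+(\S+)\s+\d+:\d+\s+\[(\w+)\]\[(\w+)\]"
)


def usable_paths(output, map_name):
    """Return the device names of [active][ready] paths under map_name."""
    in_map = False
    found = []
    for line in output.splitlines():
        m = PATH_LINE.search(line)
        if m:
            if in_map and m.group(2) == "active" and m.group(3) == "ready":
                found.append(m.group(1))
        elif line and not line[0].isspace() and not line.startswith(("[", "\\_")):
            # A map header line, e.g. "mpath0 (3600...) dm-0 DELL ,MD3000".
            in_map = line.split()[0] == map_name
    return found


def monitor(map_name, output=None):
    """Return OCF_SUCCESS if map_name has at least one usable path."""
    if output is None:
        # Assumption: multipath is in PATH and the caller may run it.
        output = subprocess.run(
            ["multipath", "-ll"], capture_output=True, text=True, timeout=30
        ).stdout
    return OCF_SUCCESS if usable_paths(output, map_name) else OCF_ERR_GENERIC
```

A wrapper script would exit with the value of monitor("mpath0").
Whether a ghost path should keep the resource healthy, and which
other states multipath can print, is exactly the open question
above, so the matching would need checking against real failures.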

Thanks,

Dejan

> Regards,
> Trent
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems

-- 
Dejan
