Re: [Linux-HA] File-system resources still running on unplugged fibre channel

Tony Gan Mon, 22 Mar 2010 17:20:47 -0700

Hi Tim
Thanks for your advice.

I have now modified the Filesystem resource in my CIB config so that it
looks like this:
params device="/dev/sdd1" directory="/temp/tmp1" fstype="ext3" \
        op monitor interval="3s" timeout="10s" on_fail="fence" depth="20"


The file system is started on node2. Then I physically unplug the only Fibre
Channel cable on this node.
My expectation is that once the file system fails, this node will get
STONITHed, because I unplugged the FC cable on node2.

However, in the output of crm_mon -1 the Filesystem resource still shows as
started on node2:
    fs_res_sdd1      (ocf::heartbeat:Filesystem):    Started node2

It looks like the OCF monitoring script for the file system is not running at
all at my assigned interval (every 3 seconds). I did not find any errors
logged in ha-log; the file system simply stays mounted.
But /var/log/messages is full of I/O errors for my mounted volume.
Do you have any ideas?
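
For reference, the direct-mode read you described can also be tried by hand,
outside the cluster. This is only a rough sketch, assuming GNU dd (for
iflag=direct) and the coreutils timeout command are installed; the function
name check_storage_path is made up for illustration and is not part of
heartbeat or pacemaker. The 16 blocks and the 10 second limit just mirror the
numbers mentioned in this thread.

```shell
#!/bin/sh
# Hypothetical helper, not part of heartbeat or pacemaker.
# Reads a few blocks from a device with O_DIRECT so that the page cache
# cannot satisfy the read. Assumes GNU dd and coreutils timeout.
check_storage_path() {
    dev="$1"
    # Read 16 blocks of 4096 bytes and give up after 10 seconds.
    # The function's exit status is the status of timeout/dd:
    # 0 on a successful read, non-zero on an I/O error or timeout.
    timeout 10 dd if="$dev" of=/dev/null bs=4096 count=16 iflag=direct \
        >/dev/null 2>&1
}
```

It could be called as: check_storage_path /dev/sdd1 || echo "direct read failed"
A non-zero exit status would mean the read failed or timed out. Note that a
process blocked in the kernel on I/O may not react to the timeout signal
right away, so this is a diagnostic aid and not a replacement for the monitor
op.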

Thanks


On Thu, Mar 18, 2010 at 2:51 AM, Tim Serong <[email protected]> wrote:

> On 3/17/2010 at 10:20 AM, Tony Gan <[email protected]> wrote:
> > Hi,
> > I'm using heartbeat-3.0.0-33.2 and pacemaker-1.0.5-4.6 to create a
> > two-node cluster. Both nodes are connected to a shared storage device
> > over Fibre Channel through an FC switch. I am going to use the shared
> > storage as my file system resource in the cluster, and I can mount the
> > file system successfully on both nodes.
> >
> > Now I am trying to trigger a fail-over by unplugging the FC cable from
> > my active node.
> > My expectation is that the file-system resource should fail, and once
> > its fail count is reached it should fail over and let my passive node
> > take the resource.
> >
> > However, it looks like the Filesystem OCF script, which is located in
> > /usr/lib/ocf/resource.d/heartbeat/Filesystem, does not handle this kind
> > of situation.
> > After I unplugged my FC cable, all file-system resources were still
> > started and running fine. There are no additional logs in ha-log or
> > ha-debug.
> >
> > I can only find entries in the system message log, which I believe are
> > kernel error messages about an I/O error on the file system (device sde
> > is my shared storage):
> > Mar 15 22:17:24 node2 kernel: end_request: I/O error, dev sde, sector 12727
> > Mar 15 22:17:24 node2 kernel: end_request: I/O error, dev sde, sector 12743
> >
> > My question is: is there a way I can monitor the connectivity of my
> > shared storage through heartbeat?
> > I'm not familiar with storage networks. What is the way to check the
> > connectivity? I was wondering whether I can do this in a way similar to
> > pingd.
>
> The only way to be sure you've still got physical connectivity is to
> actually read and/or write data from/to the underlying block device, in
> direct mode, so that whatever you're reading won't be provided from some
> cache.  This will necessarily have some performance impact during any
> monitor op (in particular, if your filesystem is otherwise heavily loaded).
>
> Anyway...  Have a look at setting monitor depth=10 or depth=20 for your
> filesystem resource.  The default monitor op just checks if the filesystem
> is mounted.  Depth=10 will try to read 16 blocks off the target device,
> which will either fail or timeout if you're disconnected.  Depth=20 will
> actually try to write then read a status file with each monitor op.
>
> HTH,
>
> Tim
>
>
> --
> Tim Serong <[email protected]>
> Senior Clustering Engineer, OPS Engineering, Novell Inc.
>
>
>
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
>
