It went pretty well now. Thanks a lot for your help. I have set the interval to
10s and the timeout to 20s now.
It may also be worth mentioning that in resource-agents 1.0-32.2, the depth
level of the OCF Filesystem agent was not implemented.
It is implemented in the latest 1.0.2 resource agents, so it might be a legacy
version problem.
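
For reference, the relevant part of my resource definition now looks roughly
like this (a sketch in crm shell syntax, reusing the device and directory
names from my earlier mail below):

    primitive fs_res_sdd1 ocf:heartbeat:Filesystem \
            params device="/dev/sdd1" directory="/temp/tmp1" fstype="ext3" \
            op monitor interval="10s" timeout="20s" on_fail="fence" \
               OCF_CHECK_LEVEL="20"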

Regards

On Mon, Mar 22, 2010 at 10:20 PM, Tim Serong <[email protected]> wrote:

> On 3/23/2010 at 11:20 AM, Tony Gan <[email protected]> wrote:
> > Hi Tim
> > Thanks for your advice.
> >
> > Now I have modified it and have something like this for my Filesystem
> > resource in the CIB config:
> > params device="/dev/sdd1" directory="/temp/tmp1" fstype="ext3" \
> >         op monitor interval="3s" timeout="10s" on_fail="fence" depth="20"
> >
> > The file system is started on node2. Then I physically unplug the only
> > Fibre Channel cable on this node.
> > My expectation is that once the file system fails, this node will get
> > STONITHed, because I unplugged the FC cable on node2.
> >
> > However
> > on outputs of crm_mon -1
> > I still get this Filesystem started on node2:
> >     fs_res_sdd1      (ocf::heartbeat:Filesystem):    Started node2
> >
> > It looks like the OCF monitoring script for the file system is not running
> > at all at my assigned interval (every 3 seconds). And I did not find any
> > error log in ha-log; the file system is just mounted.
> > But /var/log/messages is full of I/O errors for my mounted volume.
> > Do you have any ideas?
>
> Sorry, my bad, that should have been:
>
>  op monitor ... OCF_CHECK_LEVEL="20"
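>
> i.e., keeping the rest of your existing monitor line:
>
>  op monitor interval="3s" timeout="10s" on_fail="fence" OCF_CHECK_LEVEL="20"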
>
> The Filesystem RA doesn't log anything for successful monitor ops, so you
> won't actually see any noise in the logs during normal operation.
>
> Once it's working to your satisfaction, you may also want to experiment with
> putting the filesystem under heavy read/write load - it's possible that if
> the filesystem is under severe enough load, 3 seconds may be too frequent a
> monitor interval, and/or 10 seconds too short a timeout, because the I/O
> will block due to the heavy load.  The last thing you want is fencing due to
> a monitor op timeout when nothing is actually broken, but the only way to be
> sure is to test.
>
> Regards,
>
> Tim
>
> >
> > Thanks
> >
> >
> > On Thu, Mar 18, 2010 at 2:51 AM, Tim Serong <[email protected]> wrote:
> >
> > > On 3/17/2010 at 10:20 AM, Tony Gan <[email protected]> wrote:
> > > > Hi,
> > > > I'm using heartbeat-3.0.0-33.2 and pacemaker-1.0.5-4.6 to create a
> > > > two-node cluster. Both nodes are connected to a shared storage device
> > > > via Fibre Channel through an FC switch. I am going to use the shared
> > > > storage as my file system resource in the cluster; I can mount the
> > > > file system successfully on both nodes.
> > > >
> > > > Now I am trying to trigger a fail-over after I unplug my FC cable
> > > > from my active node.
> > > > My expectation is that the file system resource should fail, and after
> > > > reaching the failure count it should fail over and let my passive node
> > > > take the resource.
> > > >
> > > > However,
> > > > it looks like the Filesystem OCF script does not handle this kind of
> > > > situation. It is located in
> > > > /usr/lib/ocf/resource.d/heartbeat/Filesystem
> > > > After I unplugged my FC cable, all file system resources were still
> > > > started and running fine. There were no additional logs in ha-log or
> > > > ha-debug.
> > > >
> > > > I can only find entries in the system message log, which I believe are
> > > > kernel errors about I/O failures on the file system (device sde is my
> > > > shared storage):
> > > > Mar 15 22:17:24 node2 kernel: end_request: I/O error, dev sde, sector 12727
> > > > Mar 15 22:17:24 node2 kernel: end_request: I/O error, dev sde, sector 12743
> > > >
> > > > My question is, is there a way I can monitor the connectivity of my
> > > > shared storage through heartbeat?
> > > > I'm not familiar with storage networking; what is the way to check the
> > > > connectivity? I was thinking I could do this in a way similar to
> > > > pingd.
> > >
> > > The only way to be sure you've still got physical connectivity is to
> > > actually read and/or write data from/to the underlying block device, in
> > > direct mode, so that whatever you're reading won't be provided from some
> > > cache.  This will necessarily have some performance impact during any
> > > monitor op (in particular, if your filesystem is otherwise heavily
> > > loaded).
> > >
> > > Anyway...  Have a look at setting monitor depth=10 or depth=20 for your
> > > filesystem resource.  The default monitor op just checks if the
> > > filesystem is mounted.  depth=10 will try to read 16 blocks off the
> > > target device, which will either fail or time out if you're
> > > disconnected.  depth=20 will actually try to write then read a status
> > > file with each monitor op.
> > >
> > > HTH,
> > >
> > > Tim
> > >
> > >
> > > --
> > > Tim Serong <[email protected]>
> > > Senior Clustering Engineer, OPS Engineering, Novell Inc.
> > >
> > >
> > >
> > > _______________________________________________
> > > Linux-HA mailing list
> > > [email protected]
> > > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> > > See also: http://linux-ha.org/ReportingProblems
> > >
> >
>
>
