On 3/17/2010 at 10:20 AM, Tony Gan <[email protected]> wrote:
> Hi,
> I'm using heartbeat-3.0.0-33.2 and pacemaker-1.0.5-4.6 to create a two-node
> cluster. Both nodes are connected to a shared storage device over Fibre
> Channel through an FC switch. I am going to use the shared storage as the
> file system resource in the cluster; I can mount the file system successfully
> on both nodes.
>
> Now I am trying to trigger a fail-over by unplugging the FC cable from my
> active node.
> My expectation is that the file system resource should fail and, after
> reaching the fail count, fail over so that my passive node takes the resource.
>
> However, it looks like the Filesystem OCF script, located in
> /usr/lib/ocf/resource.d/heartbeat/Filesystem, does not handle this kind of
> situation.
> After I unplugged my FC cable, all file system resources were still started
> and running fine. There are no additional logs in ha-log or ha-debug.
>
> I can only find entries in the system message log, which I believe are kernel
> errors about an I/O error on the file system (device sde is my shared storage):
> Mar 15 22:17:24 node2 kernel: end_request: I/O error, dev sde, sector 12727
> Mar 15 22:17:24 node2 kernel: end_request: I/O error, dev sde, sector 12743
>
> My question is, is there a way I can monitor the connectivity of my shared
> storage through heartbeat?
> I'm not familiar with storage networks; what's the way to check the
> connectivity? I was thinking I could do this in a way similar to pingd.

The only way to be sure you still have physical connectivity is to actually
read and/or write data from/to the underlying block device in direct mode, so
that whatever you read can't be served from a cache. This will necessarily
have some performance impact during each monitor op (in particular if your
filesystem is otherwise heavily loaded).

Anyway... Have a look at setting monitor depth=10 or depth=20 (the monitor
op's OCF_CHECK_LEVEL) for your filesystem resource. The default monitor op
just checks whether the filesystem is mounted. Depth=10 will try to read 16
blocks off the target device, which will either fail or time out if you're
disconnected. Depth=20 will actually try to write and then read back a status
file with each monitor op.

HTH,

Tim

--
Tim Serong <[email protected]>
Senior Clustering Engineer, OPS Engineering, Novell Inc.
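
To illustrate what a direct-mode read check might look like, here is a minimal
Python sketch. It is not the Filesystem agent's actual code; the function name
can_read_device, the 4096-byte block size and the direct switch are all
assumptions made for this example. It assumes Linux (for O_DIRECT) and a
device whose alignment requirements are satisfied by a page-aligned buffer.

```python
import mmap
import os


def can_read_device(path, blocks=16, block_size=4096, direct=True):
    """Return True if `blocks` full blocks can be read from the start of `path`.

    Hypothetical helper for illustration only; not part of heartbeat/pacemaker.
    """
    flags = os.O_RDONLY
    if direct:
        # O_DIRECT (Linux-only) asks the kernel to bypass the page cache,
        # so the read has to reach the device itself.
        flags |= os.O_DIRECT
    try:
        fd = os.open(path, flags)
    except OSError:
        return False
    try:
        # An anonymous mmap is page-aligned, which O_DIRECT reads require.
        buf = mmap.mmap(-1, block_size)
        try:
            for _ in range(blocks):
                # A short read or an I/O error counts as a failed check.
                if os.readv(fd, [buf]) != block_size:
                    return False
        finally:
            buf.close()
        return True
    except OSError:
        return False
    finally:
        os.close(fd)
```

For the setup described above, this would be called as
can_read_device("/dev/sde"). Note that the sketch has no timeout of its own: a
read against a disconnected path may block rather than fail, so something
outside it (in the cluster, the monitor op's timeout) still has to bound how
long the check may take.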
