Hi,

On Sun, Jul 05, 2015 at 09:13:56PM +0500, Muhammad Sharfuddin wrote:
> SLES 11 SP3 + online updates (pacemaker-1.1.11-0.8.11.70,
> openais-1.1.4-5.22.1.7)
> 
> It's a dual-primary DRBD cluster that mounts a file system resource
> on both cluster nodes simultaneously (the file system type is ocfs2).
> 
> Whenever either node goes down, the file system (/sharedata)
> becomes inaccessible for exactly 35 seconds on the other
> (surviving/online) node, and then becomes available again on that
> node.
> 
> Please help me understand why the surviving node is unable to
> access the file system resource (/sharedata) for 35 seconds, and
> how I can fix the cluster so that the file system remains
> accessible on the surviving node without any interruption or delay
> (about 35 seconds in my case).
> 
> By inaccessible, I mean that running "ls -l /sharedata" and
> "df /sharedata" on the online node produces no output and does not
> return the prompt for exactly 35 seconds after the other node goes
> offline.
> 
> E.g. "node1" went offline at around 01:37:15, and the /sharedata
> file system was then inaccessible between 01:37:35 and 01:38:18 on
> the online node, i.e. "node2".

Until the failed node has been fenced, you won't be able to use the
ocfs2 filesystem on the surviving node. In this case, the fencing
operation takes 40 seconds:

> [...]
> Jul  5 01:37:35 node2 sbd: [6197]: info: Writing reset to node slot node1
> Jul  5 01:37:35 node2 sbd: [6197]: info: Messaging delay: 40
> Jul  5 01:38:15 node2 sbd: [6197]: info: reset successfully delivered to node1
> Jul  5 01:38:15 node2 sbd: [6196]: info: Message successfully delivered.
> [...]

You may want to reduce that sbd timeout (msgwait, which is logged
above as "Messaging delay").
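
To check how long fencing actually takes after changing the timeout,
the delay can be read straight from the sbd log lines. The following
is only a rough sketch: the helper names (parse_ts, fencing_seconds)
are made up for illustration, the log text is copied from the excerpt
above, and the year is an assumption taken from the mail date, since
syslog timestamps do not carry one.

```python
from datetime import datetime

# Log lines copied from the excerpt above (wrapped line rejoined).
LOG = """\
Jul  5 01:37:35 node2 sbd: [6197]: info: Writing reset to node slot node1
Jul  5 01:37:35 node2 sbd: [6197]: info: Messaging delay: 40
Jul  5 01:38:15 node2 sbd: [6197]: info: reset successfully delivered to node1
"""


def parse_ts(line, year=2015):
    """Parse the leading syslog timestamp; the year is an assumption."""
    month, day, clock = line.split()[:3]
    return datetime.strptime(f"{year} {month} {day} {clock}",
                             "%Y %b %d %H:%M:%S")


def fencing_seconds(log):
    """Seconds between sbd writing the reset and confirming delivery."""
    start = end = None
    for line in log.splitlines():
        if "Writing reset to node slot" in line:
            start = parse_ts(line)
        elif "reset successfully delivered" in line:
            end = parse_ts(line)
    if start is None or end is None:
        raise ValueError("log does not contain a complete fencing operation")
    return int((end - start).total_seconds())


print(fencing_seconds(LOG))
```

For the excerpt above this gives 40, which matches the "Messaging
delay" value that sbd reports.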

Thanks,

Dejan
_______________________________________________
Linux-HA mailing list is closing down.
Please subscribe to us...@clusterlabs.org instead.
http://clusterlabs.org/mailman/listinfo/users
_______________________________________________
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
