What is fencing?
Fencing is the act of forcefully removing a node from a cluster. A node with
OCFS2 mounted will fence itself when it realizes that it doesn't have quorum in
a degraded cluster. It does this so that other nodes won't get stuck trying to
access its resources. Currently, OCFS2 will panic the machine when it realizes
it has to fence itself off from the cluster. As described above, it does this
when it sees more nodes heartbeating than it has connectivity to, and thus
fails the quorum test.
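A quick way to confirm that the O2CB cluster is up and to see which nodes are
heartbeating on a device is shown below; this assumes the o2cb init script and
the mounted.ocfs2 utility from ocfs2-tools are installed (output format varies
by version):
# service o2cb status
# mounted.ocfs2 -f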
Due to user reports of nodes hanging during fencing, OCFS2 1.2.5 no longer uses
"panic" for fencing. Instead, by default, it uses "machine restart". This
should not only prevent nodes from hanging during fencing but also allow nodes
to restart and rejoin the cluster quickly. While this change is internal in
nature, we are documenting it to make users aware that they will no longer see
the familiar panic stack trace during fencing. Instead they will see the
message "*** ocfs2 is very sorry to be fencing this system by restarting ***",
and even that will probably appear only among the messages captured on the
netdump/netconsole server.
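If no netdump/netconsole server is set up yet, a minimal netconsole
configuration for capturing such messages might look like the following; the
interface name eth0, the port 6666, and the receiver address 192.168.0.10 are
assumptions to be replaced with values for the local network (the kernel
netconsole module takes src-port@src-ip/dev,tgt-port@tgt-ip/tgt-mac, with the
target MAC defaulting to broadcast when omitted):
# modprobe netconsole netconsole=@/eth0,6666@192.168.0.10/
On the receiving host, any UDP listener on that port (a syslog daemon, or for
a quick test something like "nc -u -l 6666", flag syntax varying by netcat
flavor) can capture the console output, including the fencing message above.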
If perchance the user wishes to use panic to fence (maybe to see the familiar
oops stack trace, or on the advice of customer support to diagnose frequent
reboots), one can do so by issuing the following command after the O2CB cluster
is online.
# echo 1 > /proc/fs/ocfs2_nodemanager/fence_method
Please note that this change is local to a node.
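To check which method is currently in effect, or to revert to the default, one
can read and write the same file; note that "0" restoring the default
machine-restart behavior is an assumption inferred from the "echo 1" toggle
above and may differ by release:
# cat /proc/fs/ocfs2_nodemanager/fence_method
# echo 0 > /proc/fs/ocfs2_nodemanager/fence_method   (assumption: 0 = restart)
Like the panic setting itself, these commands affect only the local node.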
At 2011-04-10 22:29:01, "Meisam Mohammadkhani" <[email protected]> wrote:
Hi All,
I'm new to GFS. I'm looking for a solution for our enterprise application,
which is responsible for saving (and manipulating) historical data from
industrial devices. Currently, we have two stations that work as hot-redundant
peers of each other. Our challenge is handling failures. For now, our
application handles a fault by itself, by synchronizing the files that changed
during the fault. It runs on two totally independent machines (one as the
redundant), so each one has its own disk.
We are looking for something like a "highly available, transparent file
system" that makes a fault transparent to the application, so that in case of
a fault the redundant machine can still access the files even when the master
machine is down (via replication or some such thing).
Is there a fail-over feature in GFS that satisfies our requirement? Actually,
my question is: can GFS help us in our case?
Regards