Greetings,
I am trying to set up a cluster with (for now) two nodes. The only
reason is the semantic guarantees GFS gives when accessing shared
files; I am not interested in fault tolerance, performance or anything
else. Unfortunately, I keep running into all sorts of problems, for
example:
- After a few hours of intensive workload, the cluster sometimes
simply stops. All file system calls block, but things like cman_tool
status or group_tool status insist everything is all right. A soft
reboot is not possible because various services wait indefinitely, and
after power cycling fsck finds inconsistencies on the file system.
- Sometimes, when trying to execute a binary on the file system, I get
execvp returning permission denied when it should not, but when I try
again, everything is all right. I sometimes even observe this when
trying to start a script on the file system, as if the interpreter of
the script (which is on a different file system altogether) had the
wrong permissions. Again, simply trying one more time makes everything
work.
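The retry behaviour in the second item can be captured with a small wrapper. This is only a sketch for illustration, not something taken from the setup above: the use of Python, the name run_with_retry and its parameters are all assumptions. It counts how often the exec fails with permission denied before an attempt succeeds.

```python
import subprocess
import time


def run_with_retry(argv, attempts=2, delay=0.1, runner=subprocess.run):
    """Run argv, retrying when the exec itself fails with EACCES.

    Returns (result, failures), where failures is the number of
    PermissionError exceptions seen before the successful attempt.
    The runner parameter is a hypothetical hook so the logic can be
    exercised without a real binary.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    failures = 0
    for attempt in range(attempts):
        try:
            return runner(argv), failures
        except PermissionError:
            failures += 1
            # Give up once the last allowed attempt has failed.
            if attempt + 1 == attempts:
                raise
            time.sleep(delay)
```

Logging the returned failure count for every invocation would at least show how often the spurious error occurs and whether it correlates with load.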
The config of the cluster seems relatively simple:
- i686 single CPU node
  - file system device accessible over iSCSI
  - cluster subnet (unfortunately) connected over OpenVPN
- x86_64 eight CPU virtual node
  - file system device provided by host which uses iSCSI
- both nodes resolve into the same subnet using /etc/hosts
- nothing except a single GFS2 file system is mounted
- fencing uses fence_manual
- both nodes run Fedora 8
The config is attached, not that there is anything unusual in it.
As an absolute novice, I am probably making some glaringly obvious silly
mistake which results in the very weird behavior described above, but
try as I might, I do not see anything that could cause this.
Thanks for any advice, Petr
<?xml version="1.0" ?>
<cluster config_version="1" name="monoton">
  <fence_daemon post_fail_delay="-1" post_join_delay="-1"/>
  <clusternodes>
    <clusternode name="delta.dsrb" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device name="Fencer" nodename="delta.dsrb"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="ichi.dsrb" nodeid="101" votes="1">
      <fence>
        <method name="1">
          <device name="Fencer" nodename="ichi.dsrb"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <cman expected_votes="1" two_node="1"/>
  <fencedevices>
    <fencedevice agent="fence_manual" name="Fencer"/>
  </fencedevices>
  <rm>
    <failoverdomains/>
    <resources/>
  </rm>
</cluster>
--
Linux-cluster mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/linux-cluster