[CentOS] problem with gfs_controld

2010-09-15 Thread sandra-llistes
Hi,

We have two nodes with centos 5.5 x64 and cluster+gfs offering samba and
NFS services.
Recently one node displayed the following messages in log files:

Sep 13 08:19:07 NODE1 gfs_controld[3101]: cpg_mcast_joined error 2
handle 2846d7ad MSG_PLOCK
Sep 13 08:19:07 NODE1 gfs_controld[3101]: send plock message error -1
Sep 13 08:19:11 NODE1 gfs_controld[3101]: cpg_mcast_joined error 2
handle 2846d7ad MSG_PLOCK
Sep 13 08:19:11 NODE1 gfs_controld[3101]: send plock message error -1
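
When this happens, the daemons' view of the cluster can be checked with
something like the following (a sketch using the stock CentOS 5 cluster
tools, not output we captured):

  cman_tool status        # quorum and membership as cman sees them
  group_tool ls           # fence, dlm and gfs groups and their states
  group_tool dump gfs     # gfs_controld debug buffer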

When this happens, access to the Samba services on the other node begins
to freeze, and this error appears:

Sep 13 08:08:22 NODE2 kernel: INFO: task smbd:23084 blocked for more than 120 seconds.
Sep 13 08:08:22 NODE2 kernel: echo 0 > /proc/sys/kernel/hung_task_timeout_secs disables this message.
Sep 13 08:08:22 NODE2 kernel: smbd  D 810001576420 0 23084   6602 23307 19791 (NOTLB)
Sep 13 08:08:22 NODE2 kernel:  81003e187e08 0086 81003e187e24 0092
Sep 13 08:08:22 NODE2 kernel:  810005dbdc38 000a 81003f4f77a0 80309b60
Sep 13 08:08:22 NODE2 kernel:  62f1773ef4c3 624f 81003f4f7988 8008c597
Sep 13 08:08:22 NODE2 kernel: Call Trace:
Sep 13 08:08:22 NODE2 kernel:  [8875cb7d] :dlm:dlm_posix_lock+0x172/0x210
Sep 13 08:08:22 NODE2 kernel:  [800a1ba4] autoremove_wake_function+0x0/0x2e
Sep 13 08:08:22 NODE2 kernel:  [8882a5b9] :gfs:gfs_lock+0x9c/0xa8
Sep 13 08:08:22 NODE2 kernel:  [8003a142] fcntl_setlk+0x11e/0x273
Sep 13 08:08:22 NODE2 kernel:  [800b878c] audit_syscall_entry+0x180/0x1b3
Sep 13 08:08:22 NODE2 kernel:  [8002e7da] sys_fcntl+0x269/0x2dc
Sep 13 08:08:22 NODE2 kernel:  [8005e28d] tracesys+0xd5/0xe0
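
The trace shows smbd blocked in dlm_posix_lock, i.e. waiting for a POSIX
lock (plock) reply that never arrives after the send errors above. If it
helps, pending plocks for a mount group can be dumped with something along
these lines (subcommand quoted from memory, worth verifying against the
group_tool man page):

  group_tool dump plocks web    # pending POSIX locks for the "web" filesystem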

The configuration of the cluster is the following:

<?xml version="1.0"?>
<cluster alias="lcfib" config_version="60" name="lcfib">
    <quorumd device="/dev/gfs-webn/quorum" interval="1" label="quorum" min_score="1" tko="10" votes="2">
        <heuristic interval="10" program="/bin/ping -t1 -c1 numIP.1" score="1" tko="5"/>
    </quorumd>
    <fence_daemon post_fail_delay="0" post_join_delay="3"/>
    <clusternodes>
        <clusternode name="NODE2.fib.upc.es" nodeid="1" votes="1">
            <fence>
                <method name="1">
                    <device lanplus="1" name="NODE2SP"/>
                </method>
            </fence>
        </clusternode>
        <clusternode name="NODE1.fib.upc.es" nodeid="2" votes="1">
            <fence>
                <method name="1">
                    <device lanplus="1" name="NODE1SP"/>
                </method>
            </fence>
        </clusternode>
    </clusternodes>
    <cman broadcast="yes" expected_votes="4" two_node="0"/>
    <fencedevices>
        <fencedevice agent="fence_ipmilan" auth="md5" ipaddr="192.168.13.77" login="" name="NODE2SP" passwd="5jSTv3Mb"/>
        <fencedevice agent="fence_ipmilan" auth="md5" ipaddr="192.168.13.78" login="" name="NODE1SP" passwd="5jSTv3Mb"/>
    </fencedevices>
    <rm>
        <failoverdomains>
            <failoverdomain name="NODE1-NODE2" ordered="1" restricted="1">
                <failoverdomainnode name="NODE2.fib.upc.es" priority="2"/>
                <failoverdomainnode name="NODE1.fib.upc.es" priority="1"/>
            </failoverdomain>
            <failoverdomain name="NODE2-NODE1" ordered="1" restricted="1">
                <failoverdomainnode name="NODE2.fib.upc.es" priority="1"/>
                <failoverdomainnode name="NODE1.fib.upc.es" priority="2"/>
            </failoverdomain>
        </failoverdomains>
        <resources>
            <script file="/etc/init.d/fibsmb1" name="fibsmb1"/>
            <script file="/etc/init.d/fibsmb2" name="fibsmb2"/>
            <clusterfs device="/dev/gfs-webn/gfs-webn" force_unmount="0" fsid="14417" fstype="gfs" mountpoint="/web" name="web" options=""/>
            <clusterfs device="/dev/gfs-perfils/gfs-assig" force_unmount="0" fsid="21646" fstype="gfs" mountpoint="/assig" name="assig" options=""/>
            <smb name="FIBSMB1" workgroup="FIBSMB"/>
            <smb name="FIBSMB2" workgroup="FIBSMB"/>
            <ip address="numIP.111/24" monitor_link="1"/>
            <ip address="numIP.110/24" monitor_link="1"/>
            <ip address="numIP.112/24" monitor_link="1"/>
        </resources>
        <service autostart="1" domain="NODE2-NODE1" name="samba" recovery="disable">
            <clusterfs ref="web"/>
            <ip ref="numIP.110/24"/>
            <ip ref="numIP.112/24"/>
            <clusterfs ref="assig"/>
            <script ref="fibsmb2"/>
            <smb ref="FIBSMB2"/>
        </service>
        <service 
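
For completeness, this is roughly how a cluster.conf change gets validated
and pushed to the other node (stock cman/ccs tools; xmllint and the version
number are only illustrative):

  xmllint --noout /etc/cluster/cluster.conf    # check the XML is well-formed
  ccs_tool update /etc/cluster/cluster.conf    # propagate the new file
  cman_tool version -r 61                      # activate config_version 61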

Re: [CentOS] mounting gfs partition hangs; Solution

2010-03-31 Thread sandra-llistes
It seems the whole problem was caused by SELinux.
I booted all the nodes with the kernel option selinux=0, and now I can
mount GFS partitions without problems.
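
For anyone hitting the same thing, the boot parameter can be replaced with
the standard SELinux knobs (a sketch; permissive mode may already be enough
to test):

  getenforce      # Enforcing / Permissive / Disabled
  setenforce 0    # permissive until next reboot
  # to make it permanent, edit /etc/selinux/config and set:
  #   SELINUX=disabled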
Thanks,

Sandra

sandra-llistes wrote:
 Hi,
 
 I'm not a GFS expert either :-)
 In fact, I erased all the configuration of the cluster and tried to set it
 up again from scratch with the luci/ricci configuration tools (perhaps I
 did something wrong last time)
 [...]



Re: [CentOS] mounting gfs partition hangs

2010-03-26 Thread sandra-llistes
Hi,

I'm not a GFS expert either :-)
In fact, I erased all the configuration of the cluster and tried to set it
up again from scratch with the luci/ricci configuration tools (perhaps I
did something wrong last time).

mount -vv gives the following information:
[r...@node1 gfs]# mount -t gfs -vv /dev/home2/home2 /home2
/sbin/mount.gfs: mount /dev/mapper/home2-home2 /home2
/sbin/mount.gfs: parse_opts: opts = rw
/sbin/mount.gfs:   clear flag 1 for rw, flags = 0
/sbin/mount.gfs: parse_opts: flags = 0
/sbin/mount.gfs: parse_opts: extra = 
/sbin/mount.gfs: parse_opts: hostdata = 
/sbin/mount.gfs: parse_opts: lockproto = 
/sbin/mount.gfs: parse_opts: locktable = 
/sbin/mount.gfs: message to gfs_controld: asking to join mountgroup:
/sbin/mount.gfs: write join /home2 gfs lock_dlm gfs-test:gfs-data rw
/dev/mapper/home2-home2
...

And it hangs at that point (the same happens on the other node).

I tried turning off the local firewalls on the nodes; they could reach
each other with pings without any problem. Also, there are no other
firewalls between them.
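
In case it turns out to be a firewall issue for someone else, these are
roughly the openings the CentOS 5 cluster stack needs on both nodes (port
numbers quoted from memory, so please verify them):

  iptables -A INPUT -p udp --dport 5404:5405 -j ACCEPT    # openais/cman (totem)
  iptables -A INPUT -p tcp --dport 21064 -j ACCEPT        # dlm
  iptables -A INPUT -p tcp --dport 50006 -j ACCEPT        # ccsd
  iptables -A INPUT -p udp --dport 50007 -j ACCEPT        # ccsd
  iptables -A INPUT -p tcp --dport 50008:50009 -j ACCEPT  # ccsd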

The new configuration is simpler:
[r...@node1 gfs]# more /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster alias="gfs-test" config_version="6" name="gfs-test">
    <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
    <clusternodes>
        <clusternode name="node1.fib.upc.es" nodeid="1" votes="1">
            <fence>
                <method name="1">
                    <device name="test" nodename="node1.fib.upc.es"/>
                </method>
            </fence>
        </clusternode>
        <clusternode name="node2.fib.upc.es" nodeid="2" votes="1">
            <fence>
                <method name="1">
                    <device name="test" nodename="node2.fib.upc.es"/>
                    <device name="test" nodename="node2.fib.upc.es"/>
                </method>
            </fence>
        </clusternode>
    </clusternodes>
    <cman expected_votes="1" two_node="1"/>
    <fencedevices>
        <fencedevice agent="fence_manual" name="test"/>
    </fencedevices>
    <rm>
        <failoverdomains/>
        <resources>
            <clusterfs device="/dev/home2/home2" force_unmount="0" fsid="3280" fstype="gfs" mountpoint="/home2" name="home" self_fence="0"/>
        </resources>
    </rm>
</cluster>

Finally, I reformatted /dev/home2/home2 with the following command, which
gave no errors but didn't change the final result:
gfs_mkfs -O -j 3 -p lock_dlm -t gfs-test:gfs-data /dev/home2/home2
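
To confirm the superblock really carries that lock table, something like
this can be used (gfs_tool ships with the GFS userland; just reading the
values back should be safe):

  gfs_tool sb /dev/home2/home2 proto    # expect lock_dlm
  gfs_tool sb /dev/home2/home2 table    # expect gfs-test:gfs-data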

Thanks,

Sandra

PS: I append an strace, but I can't see anything useful in it.
[r...@node1 gfs]# strace mount /home2
execve("/bin/mount", ["mount", "/home2"], [/* 17 vars */]) = 0
brk(0) = 0x9874000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=26154, ...}) = 0
mmap2(NULL, 26154, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7f51000
close(3) = 0
open("/lib/libblkid.so.1", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\300 O\0004\0\0\0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=38620, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7f5
mmap2(0x4f, 4, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x4f
mmap2(0x4f9000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x8) = 0x4f9000
close(3) = 0
open("/lib/libuuid.so.1", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\340\316V\0004\0\0\0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=15704, ...}) = 0
mmap2(0x56c000, 12792, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x56c000
mmap2(0x56f000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x3) = 0x56f000
close(3) = 0
open("/lib/libselinux.so.1", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\\245T\0004\0\0\0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=93508, ...}) = 0
mmap2(0x547000, 97112, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x547000
mmap2(0x55d000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x15) = 0x55d000
close(3) = 0
open("/lib/libc.so.6", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\340_8\0004\0\0\0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=1611564, ...}) = 0
mmap2(0x37, 1328580, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x37
mmap2(0x4af000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x13f) = 0x4af000
mmap2(0x4b2000, 9668, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, 

[CentOS] mounting gfs partition hangs

2010-03-24 Thread sandra-llistes
Hi,

I have configured two machines for testing GFS filesystems. They are
attached to an iSCSI device, and the CentOS versions are:
CentOS release 5.4 (Final)
Linux node1.fib.upc.es 2.6.18-164.el5 #1 SMP Thu Sep 3 03:33:56 EDT 2009
i686 i686 i386 GNU/Linux

The problem is that if I try to mount a GFS partition, it hangs.

[r...@node2 ~]#  cman_tool status
Version: 6.2.0
Config Version: 29
Cluster Name: gfs-test
Cluster Id: 25790
Cluster Member: Yes
Cluster Generation: 4156
Membership state: Cluster-Member
Nodes: 2
Expected votes: 2
Quorum device votes: 2
Total votes: 4
Quorum: 3
Active subsystems: 9
Flags:
Ports Bound: 0
Node name: node2.fib.upc.es
Node ID: 2
Multicast addresses: 239.192.100.35
Node addresses: 147.83.41.130

[r...@node2 ~]# cman_tool nodes
Node  Sts   Inc   Joined   Name
   0   M  0   2010-03-24 14:46:22  /dev/web/web
   1   M   4156   2010-03-24 17:08:36  node1.fib.upc.es
   2   M   4132   2010-03-24 14:46:09  node2.fib.upc.es

[r...@node2 ~]# group_tool
hangs...

[r...@node1 ~]# mount -t gfs /dev/home2/home2 /home2
hangs...

If I cancel the command, I can return to the terminal, and I don't see
anything in the log files.
The resource /dev/home2/home2 is accessible from both nodes, and if I
mount /home2 with lock_nolock there is no problem.
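
(The lock_nolock test was a mount override along these lines; it bypasses
cluster locking, so it is only safe from one node at a time:)

  mount -t gfs -o lockproto=lock_nolock /dev/home2/home2 /home2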

cluster.conf is:
<?xml version="1.0"?>
<cluster alias="gfs-test" config_version="29" name="gfs-test">
    <quorumd device="/dev/web/web" interval="1" min_score="1" tko="10" votes="2">
        <heuristic interval="10" program="/bin/ping -t1 -c1 147.83.41.1" score="1"/>
    </quorumd>
    <fence_daemon post_fail_delay="0" post_join_delay="3"/>
    <clusternodes>
        <clusternode name="node1.fib.upc.es" nodeid="1" votes="1">
            <fence>
                <method name="1">
                    <device name="gfs-test" nodename="node1.fib.upc.es"/>
                </method>
            </fence>
        </clusternode>
        <clusternode name="node2.fib.upc.es" nodeid="2" votes="1">
            <fence>
                <method name="1">
                    <device name="gfs-test" nodename="node2.fib.upc.es"/>
                </method>
            </fence>
        </clusternode>
    </clusternodes>
    <cman/>
    <fencedevices>
        <fencedevice agent="fence_manual" name="gfs-test"/>
    </fencedevices>
    <rm>
        <resources>
            <clusterfs device="/dev/home2/home2" force_unmount="0" fsid="1605" fstype="gfs" mountpoint="/home2" name="home alumnes" options=""/>
        </resources>
    </rm>
</cluster>

Any help would be welcome.
Thanks,

Sandra