[CentOS] problem with gfs_controld
Hi,

We have two nodes with CentOS 5.5 x64 and cluster+GFS offering Samba and NFS services. Recently one node displayed the following messages in its log files:

Sep 13 08:19:07 NODE1 gfs_controld[3101]: cpg_mcast_joined error 2 handle 2846d7ad MSG_PLOCK
Sep 13 08:19:07 NODE1 gfs_controld[3101]: send plock message error -1
Sep 13 08:19:11 NODE1 gfs_controld[3101]: cpg_mcast_joined error 2 handle 2846d7ad MSG_PLOCK
Sep 13 08:19:11 NODE1 gfs_controld[3101]: send plock message error -1

When this happens, access to the Samba services on the other node begins to freeze, and this error appears:

Sep 13 08:08:22 NODE2 kernel: INFO: task smbd:23084 blocked for more than 120 seconds.
Sep 13 08:08:22 NODE2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 13 08:08:22 NODE2 kernel: smbd D 810001576420 0 23084 6602 23307 19791 (NOTLB)
Sep 13 08:08:22 NODE2 kernel: 81003e187e08 0086 81003e187e24 0092
Sep 13 08:08:22 NODE2 kernel: 810005dbdc38 000a 81003f4f77a0 80309b60
Sep 13 08:08:22 NODE2 kernel: 62f1773ef4c3 624f 81003f4f7988 8008c597
Sep 13 08:08:22 NODE2 kernel: Call Trace:
Sep 13 08:08:22 NODE2 kernel: [8875cb7d] :dlm:dlm_posix_lock+0x172/0x210
Sep 13 08:08:22 NODE2 kernel: [800a1ba4] autoremove_wake_function+0x0/0x2e
Sep 13 08:08:22 NODE2 kernel: [8882a5b9] :gfs:gfs_lock+0x9c/0xa8
Sep 13 08:08:22 NODE2 kernel: [8003a142] fcntl_setlk+0x11e/0x273
Sep 13 08:08:22 NODE2 kernel: [800b878c] audit_syscall_entry+0x180/0x1b3
Sep 13 08:08:22 NODE2 kernel: [8002e7da] sys_fcntl+0x269/0x2dc
Sep 13 08:08:22 NODE2 kernel: [8005e28d] tracesys+0xd5/0xe0

The configuration of the cluster is the following:

<?xml version="1.0"?>
<cluster alias="lcfib" config_version="60" name="lcfib">
  <quorumd device="/dev/gfs-webn/quorum" interval="1" label="quorum" min_score="1" tko="10" votes="2">
    <heuristic interval="10" program="/bin/ping -t1 -c1 numIP.1" score="1" tko="5"/>
  </quorumd>
  <fence_daemon post_fail_delay="0" post_join_delay="3"/>
  <clusternodes>
    <clusternode name="NODE2.fib.upc.es" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device lanplus="1" name="NODE2SP"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="NODE1.fib.upc.es" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device lanplus="1" name="NODE1SP"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <cman broadcast="yes" expected_votes="4" two_node="0"/>
  <fencedevices>
    <fencedevice agent="fence_ipmilan" auth="md5" ipaddr="192.168.13.77" login="" name="NODE2SP" passwd="5jSTv3Mb"/>
    <fencedevice agent="fence_ipmilan" auth="md5" ipaddr="192.168.13.78" login="" name="NODE1SP" passwd="5jSTv3Mb"/>
  </fencedevices>
  <rm>
    <failoverdomains>
      <failoverdomain name="NODE1-NODE2" ordered="1" restricted="1">
        <failoverdomainnode name="NODE2.fib.upc.es" priority="2"/>
        <failoverdomainnode name="NODE1.fib.upc.es" priority="1"/>
      </failoverdomain>
      <failoverdomain name="NODE2-NODE1" ordered="1" restricted="1">
        <failoverdomainnode name="NODE2.fib.upc.es" priority="1"/>
        <failoverdomainnode name="NODE1.fib.upc.es" priority="2"/>
      </failoverdomain>
    </failoverdomains>
    <resources>
      <script file="/etc/init.d/fibsmb1" name="fibsmb1"/>
      <script file="/etc/init.d/fibsmb2" name="fibsmb2"/>
      <clusterfs device="/dev/gfs-webn/gfs-webn" force_unmount="0" fsid="14417" fstype="gfs" mountpoint="/web" name="web" options=""/>
      <clusterfs device="/dev/gfs-perfils/gfs-assig" force_unmount="0" fsid="21646" fstype="gfs" mountpoint="/assig" name="assig" options=""/>
      <smb name="FIBSMB1" workgroup="FIBSMB"/>
      <smb name="FIBSMB2" workgroup="FIBSMB"/>
      <ip address="numIP.111/24" monitor_link="1"/>
      <ip address="numIP.110/24" monitor_link="1"/>
      <ip address="numIP.112/24" monitor_link="1"/>
    </resources>
    <service autostart="1" domain="NODE2-NODE1" name="samba" recovery="disable">
      <clusterfs ref="web"/>
      <ip ref="numIP.110/24"/>
      <ip ref="numIP.112/24"/>
      <clusterfs ref="assig"/>
      <script ref="fibsmb2"/>
      <smb ref="FIBSMB2"/>
    </service>
    <service
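If I read the openais error numbering right, cpg_mcast_joined error 2 appears to be ERR_LIBRARY, i.e. gfs_controld has lost its IPC connection to the openais daemon, so it can no longer multicast plock (fcntl lock) traffic to the other node; that would be consistent with smbd on NODE2 blocking inside dlm_posix_lock. A quick way to see how often the failures happen is to count them per hour in the log. This is only a sketch; the inlined sample lines stand in for /var/log/messages:

```shell
# Sketch: count gfs_controld plock send failures per hour.
# In production you would feed in /var/log/messages; sample lines are inlined here.
cat <<'EOF' | awk '/send plock message error/ {split($3, t, ":"); n[t[1]]++}
                   END {for (h in n) print h ":00", n[h]}'
Sep 13 08:19:07 NODE1 gfs_controld[3101]: send plock message error -1
Sep 13 08:19:11 NODE1 gfs_controld[3101]: send plock message error -1
Sep 13 09:02:33 NODE1 gfs_controld[3101]: send plock message error -1
EOF
```

A sustained burst of these usually points at the openais/cman layer (network, firewall, or a crashed aisexec) rather than at GFS itself.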
Re: [CentOS] mounting gfs partition hangs; Solution
It seems that the whole problem was caused by SELinux. I booted all the nodes with the kernel option selinux=0 and now I can mount GFS partitions without problems.

Thanks,
Sandra

sandra-llistes wrote:
> Hi,
> I'm not a GFS expert either :-) In fact, I erased all the configuration of
> the cluster and tried to set it up again from scratch with the luci/ricci
> configuration tools (perhaps I did something wrong last time).
> [...]

___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos
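For anyone hitting the same thing: selinux=0 on the kernel command line has to be repeated for every kernel entry in grub.conf. The persistent equivalent on RHEL/CentOS 5 is /etc/selinux/config; a sketch of the stock file with SELinux turned off (assuming no local customizations):

```
# /etc/selinux/config
# SELINUX= can be: enforcing | permissive | disabled
SELINUX=disabled
# SELINUXTYPE= type of policy in use
SELINUXTYPE=targeted
```

Setting SELINUX=permissive instead and watching /var/log/audit/audit.log would also show which accesses were being denied, which is gentler than disabling it outright.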
Re: [CentOS] mounting gfs partition hangs
Hi,

I'm not a GFS expert either :-) In fact, I erased all the configuration of the cluster and tried to set it up again from scratch with the luci/ricci configuration tools (perhaps I did something wrong last time).

mount -vv gives the following information:

[r...@node1 gfs]# mount -t gfs -vv /dev/home2/home2 /home2
/sbin/mount.gfs: mount /dev/mapper/home2-home2 /home2
/sbin/mount.gfs: parse_opts: opts = rw
/sbin/mount.gfs: clear flag 1 for rw, flags = 0
/sbin/mount.gfs: parse_opts: flags = 0
/sbin/mount.gfs: parse_opts: extra =
/sbin/mount.gfs: parse_opts: hostdata =
/sbin/mount.gfs: parse_opts: lockproto =
/sbin/mount.gfs: parse_opts: locktable =
/sbin/mount.gfs: message to gfs_controld: asking to join mountgroup:
/sbin/mount.gfs: write join /home2 gfs lock_dlm gfs-test:gfs-data rw /dev/mapper/home2-home2

...and it hangs at that point (the same happens on the other node).

I tried turning off the local firewalls on the nodes, and they reached each other without problems with pings. Also, there are no other firewalls between them. The new configuration is simpler:

[r...@node1 gfs]# more /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster alias="gfs-test" config_version="6" name="gfs-test">
  <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
  <clusternodes>
    <clusternode name="node1.fib.upc.es" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device name="test" nodename="node1.fib.upc.es"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node2.fib.upc.es" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device name="test" nodename="node2.fib.upc.es"/>
          <device name="test" nodename="node2.fib.upc.es"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <cman expected_votes="1" two_node="1"/>
  <fencedevices>
    <fencedevice agent="fence_manual" name="test"/>
  </fencedevices>
  <rm>
    <failoverdomains/>
    <resources>
      <clusterfs device="/dev/home2/home2" force_unmount="0" fsid="3280" fstype="gfs" mountpoint="/home2" name="home" self_fence="0"/>
    </resources>
  </rm>
</cluster>

Finally, I reformatted /dev/home2/home2 with the following command, which gave no errors but didn't change the final result:

gfs_mkfs -O -j 3 -p lock_dlm -t gfs-test:gfs-data /dev/home2/home2

Thanks,
Sandra

PD: I append an strace but I can't see anything useful.
[r...@node1 gfs]# strace mount /home2
execve("/bin/mount", ["mount", "/home2"], [/* 17 vars */]) = 0
brk(0) = 0x9874000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=26154, ...}) = 0
mmap2(NULL, 26154, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7f51000
close(3) = 0
open("/lib/libblkid.so.1", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\300 O\0004\0\0\0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=38620, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7f5
mmap2(0x4f, 4, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x4f
mmap2(0x4f9000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x8) = 0x4f9000
close(3) = 0
open("/lib/libuuid.so.1", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\340\316V\0004\0\0\0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=15704, ...}) = 0
mmap2(0x56c000, 12792, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x56c000
mmap2(0x56f000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x3) = 0x56f000
close(3) = 0
open("/lib/libselinux.so.1", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\\245T\0004\0\0\0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=93508, ...}) = 0
mmap2(0x547000, 97112, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x547000
mmap2(0x55d000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x15) = 0x55d000
close(3) = 0
open("/lib/libc.so.6", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\340_8\0004\0\0\0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=1611564, ...}) = 0
mmap2(0x37, 1328580, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x37
mmap2(0x4af000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x13f) = 0x4af000
mmap2(0x4b2000, 9668, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS,
[CentOS] mounting gfs partition hangs
Hi,

I have configured two machines for testing GFS filesystems. They are attached to an iSCSI device, and the CentOS versions are:

CentOS release 5.4 (Final)
Linux node1.fib.upc.es 2.6.18-164.el5 #1 SMP Thu Sep 3 03:33:56 EDT 2009 i686 i686 i386 GNU/Linux

The problem is that if I try to mount a gfs partition, it hangs.

[r...@node2 ~]# cman_tool status
Version: 6.2.0
Config Version: 29
Cluster Name: gfs-test
Cluster Id: 25790
Cluster Member: Yes
Cluster Generation: 4156
Membership state: Cluster-Member
Nodes: 2
Expected votes: 2
Quorum device votes: 2
Total votes: 4
Quorum: 3
Active subsystems: 9
Flags:
Ports Bound: 0
Node name: node2.fib.upc.es
Node ID: 2
Multicast addresses: 239.192.100.35
Node addresses: 147.83.41.130

[r...@node2 ~]# cman_tool nodes
Node  Sts  Inc   Joined               Name
   0  M    0     2010-03-24 14:46:22  /dev/web/web
   1  M    4156  2010-03-24 17:08:36  node1.fib.upc.es
   2  M    4132  2010-03-24 14:46:09  node2.fib.upc.es

[r...@node2 ~]# group_tool
hangs...

[r...@node1 ~]# mount -t gfs /dev/home2/home2 /home2
hangs...

If I cancel the command I can return to the terminal, and I don't see anything in the log files. The resource /dev/home2/home2 is accessible from both nodes, and if I mount /home2 with lock_nolock there is no problem.

cluster.conf is:

<?xml version="1.0"?>
<cluster alias="gfs-test" config_version="29" name="gfs-test">
  <quorumd device="/dev/web/web" interval="1" min_score="1" tko="10" votes="2">
    <heuristic interval="10" program="/bin/ping -t1 -c1 147.83.41.1" score="1"/>
  </quorumd>
  <fence_daemon post_fail_delay="0" post_join_delay="3"/>
  <clusternodes>
    <clusternode name="node1.fib.upc.es" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device name="gfs-test" nodename="node1.fib.upc.es"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node2.fib.upc.es" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device name="gfs-test" nodename="node2.fib.upc.es"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <cman/>
  <fencedevices>
    <fencedevice agent="fence_manual" name="gfs-test"/>
  </fencedevices>
  <rm>
    <resources>
      <clusterfs device="/dev/home2/home2" force_unmount="0" fsid="1605" fstype="gfs" mountpoint="/home2" name="home alumnes" options=""/>
    </resources>
  </rm>
</cluster>

Any help will be welcomed.

Thanks,
Sandra
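Since the mount works with lock_nolock, the filesystem itself is fine and the hang is in the cluster infrastructure at the point where mount.gfs asks gfs_controld to join the mount group. One cheap thing to rule out is a mismatch between the cluster name in cluster.conf and the cluster half of the lock table stored in the superblock (gfs_tool sb /dev/home2/home2 table prints it). A standalone sketch of that check, with the values from this thread hard-coded; on a live system the two variables would come from cluster.conf and gfs_tool:

```shell
# Hypothetical check: the GFS lock table is "<clustername>:<fsname>",
# and its cluster half must match <cluster name="..."> or the mount
# will block waiting to join the mount group.
CLUSTER_NAME="gfs-test"          # from /etc/cluster/cluster.conf
LOCKTABLE="gfs-test:gfs-data"    # from: gfs_tool sb /dev/home2/home2 table

if [ "${LOCKTABLE%%:*}" = "$CLUSTER_NAME" ]; then
    echo "lock table matches cluster name"
else
    echo "MISMATCH: mount will hang joining the mount group"
fi
```

In this thread the two values do match, so the check passes; as the follow-up posts show, the actual culprit turned out to be SELinux.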