Is your fencing working? I ask because I see this in your dlm lockspace: "new status wait_messages 0 wait_condition 1 fencing".
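If fencing really is stuck, I believe the lockspace stays stopped (kern_stop) until the pending fence of the failed member completes. A minimal set of checks I would run on one of the surviving nodes, assuming the cman/fenced/dlm_controld tool set you appear to be using (adjust for your versions):

    # does the fence domain still list a victim / wait state?
    fence_tool ls

    # overall group state (fence, dlm) on this node
    group_tool ls

    # dlm_controld's view of pending fencing for the clvmd lockspace
    dlm_tool dump | grep -i fenc

    # membership as cman sees it, to compare against the dlm members line
    cman_tool nodes

If fence_tool shows it is still waiting on a victim, the dlm won't resume until that fence action completes or is acknowledged; fence_ack_manual may be an option if the node is already known to be down, but check the man page for your version before using it.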
2014-04-07 9:26 GMT+02:00 Bjoern Teipel <bjoern.tei...@internetbrands.com>:
> Hi all,
>
> I did a dlm_tool leave clvmd on one node (node06) of a CMAN cluster with
> CLVMD.
> Now I have the problem that clvmd is stuck and all nodes lost
> connections to DLM.
> For some reason dlm wants to fence member 8, I guess, and that might
> stall the whole dlm?
> All other stacks, cman, corosync look fine...
>
> Thanks,
> Bjoern
>
> Error:
>
> dlm: closing connection to node 2
> dlm: closing connection to node 3
> dlm: closing connection to node 4
> dlm: closing connection to node 5
> dlm: closing connection to node 6
> dlm: closing connection to node 8
> dlm: closing connection to node 9
> dlm: closing connection to node 10
> dlm: closing connection to node 2
> dlm: closing connection to node 3
> dlm: closing connection to node 4
> dlm: closing connection to node 5
> dlm: closing connection to node 6
> dlm: closing connection to node 8
> dlm: closing connection to node 9
> dlm: closing connection to node 10
> INFO: task dlm_tool:33699 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> dlm_tool      D 0000000000000003     0 33699  33698 0x00000080
>  ffff88138905dcc0 0000000000000082 ffffffff81168043 ffff88138905dd18
>  ffff88138905dd08 ffff88305b30ccc0 ffff88304fa5c800 ffff883058e49900
>  ffff881857329058 ffff88138905dfd8 000000000000fb88 ffff881857329058
> Call Trace:
>  [<ffffffff81168043>] ? kmem_cache_alloc_trace+0x1a3/0x1b0
>  [<ffffffff8132f79a>] ? misc_open+0x1ca/0x320
>  [<ffffffff81510725>] rwsem_down_failed_common+0x95/0x1d0
>  [<ffffffff81185505>] ? chrdev_open+0x125/0x230
>  [<ffffffff815108b6>] rwsem_down_read_failed+0x26/0x30
>  [<ffffffff8117e5ff>] ? __dentry_open+0x23f/0x360
>  [<ffffffff81283894>] call_rwsem_down_read_failed+0x14/0x30
>  [<ffffffff8150fdb4>] ? down_read+0x24/0x30
>  [<ffffffffa06d948d>] dlm_clear_proc_locks+0x3d/0x2a0 [dlm]
>  [<ffffffff811dfed6>] ? generic_acl_chmod+0x46/0xd0
>  [<ffffffffa06e4b36>] device_close+0x66/0xc0 [dlm]
>  [<ffffffff81182b45>] __fput+0xf5/0x210
>  [<ffffffff81182c85>] fput+0x25/0x30
>  [<ffffffff8117e0dd>] filp_close+0x5d/0x90
>  [<ffffffff8117e1b5>] sys_close+0xa5/0x100
>  [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
>
>
> Status:
>
> cman_tool nodes
> Node  Sts   Inc   Joined               Name
>    1   M  18908   2014-03-24 19:01:00  node01
>    2   M  18972   2014-04-06 22:47:57  node02
>    3   M  18972   2014-04-06 22:47:57  node03
>    4   M  18972   2014-04-06 22:47:57  node04
>    5   M  18972   2014-04-06 22:47:57  node05
>    6   X  18960                        node06
>    7   X  18928                        node07
>    8   M  18972   2014-04-06 22:47:57  node08
>    9   M  18972   2014-04-06 22:47:57  node09
>   10   M  18972   2014-04-06 22:47:57  node10
>
> dlm lockspaces
> name          clvmd
> id            0x4104eefa
> flags         0x00000004 kern_stop
> change        member 8 joined 0 remove 1 failed 0 seq 11,11
> members       1 2 3 4 5 8 9 10
> new change    member 8 joined 1 remove 0 failed 0 seq 12,41
> new status    wait_messages 0 wait_condition 1 fencing
> new members   1 2 3 4 5 8 9 10
>
>
> DLM dump:
> 1396849677 cluster node 2 added seq 18972
> 1396849677 set_configfs_node 2 10.14.18.66 local 0
> 1396849677 cluster node 3 added seq 18972
> 1396849677 set_configfs_node 3 10.14.18.67 local 0
> 1396849677 cluster node 4 added seq 18972
> 1396849677 set_configfs_node 4 10.14.18.68 local 0
> 1396849677 cluster node 5 added seq 18972
> 1396849677 set_configfs_node 5 10.14.18.70 local 0
> 1396849677 cluster node 8 added seq 18972
> 1396849677 set_configfs_node 8 10.14.18.80 local 0
> 1396849677 cluster node 9 added seq 18972
> 1396849677 set_configfs_node 9 10.14.18.81 local 0
> 1396849677 cluster node 10 added seq 18972
> 1396849677 set_configfs_node 10 10.14.18.77 local 0
> 1396849677 dlm:ls:clvmd conf 2 1 0 memb 1 3 join 3 left
> 1396849677 clvmd add_change cg 35 joined nodeid 3
> 1396849677 clvmd add_change cg 35 counts member 2 joined 1 remove 0 failed 0
> 1396849677 dlm:ls:clvmd conf 3 1 0 memb 1 2 3 join 2 left
> 1396849677 clvmd add_change cg 36 joined nodeid 2
> 1396849677 clvmd add_change cg 36 counts member 3 joined 1 remove 0 failed 0
> 1396849677 dlm:ls:clvmd conf 4 1 0 memb 1 2 3 9 join 9 left
> 1396849677 clvmd add_change cg 37 joined nodeid 9
> 1396849677 clvmd add_change cg 37 counts member 4 joined 1 remove 0 failed 0
> 1396849677 dlm:ls:clvmd conf 5 1 0 memb 1 2 3 8 9 join 8 left
> 1396849677 clvmd add_change cg 38 joined nodeid 8
> 1396849677 clvmd add_change cg 38 counts member 5 joined 1 remove 0 failed 0
> 1396849677 dlm:ls:clvmd conf 6 1 0 memb 1 2 3 8 9 10 join 10 left
> 1396849677 clvmd add_change cg 39 joined nodeid 10
> 1396849677 clvmd add_change cg 39 counts member 6 joined 1 remove 0 failed 0
> 1396849677 dlm:ls:clvmd conf 7 1 0 memb 1 2 3 5 8 9 10 join 5 left
> 1396849677 clvmd add_change cg 40 joined nodeid 5
> 1396849677 clvmd add_change cg 40 counts member 7 joined 1 remove 0 failed 0
> 1396849677 dlm:ls:clvmd conf 8 1 0 memb 1 2 3 4 5 8 9 10 join 4 left
> 1396849677 clvmd add_change cg 41 joined nodeid 4
> 1396849677 clvmd add_change cg 41 counts member 8 joined 1 remove 0 failed 0
> 1396849677 dlm:controld conf 2 1 0 memb 1 3 join 3 left
> 1396849677 dlm:controld conf 3 1 0 memb 1 2 3 join 2 left
> 1396849677 dlm:controld conf 4 1 0 memb 1 2 3 9 join 9 left
> 1396849677 dlm:controld conf 5 1 0 memb 1 2 3 8 9 join 8 left
> 1396849677 dlm:controld conf 6 1 0 memb 1 2 3 8 9 10 join 10 left
> 1396849677 dlm:controld conf 7 1 0 memb 1 2 3 5 8 9 10 join 5 left
> 1396849677 dlm:controld conf 8 1 0 memb 1 2 3 4 5 8 9 10 join 4 left
--
this is my life and I live it for as long as God wills
--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster