Hello,
I have a 3-node cluster and a fencing agent that takes about 30 seconds to
complete fencing. During those 30 seconds it is possible for two nodes of
the cluster to hold an exclusive POSIX lock on the same file at the same
time.
Did I miss something here, or is this the correct behaviour?
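For illustration, by "exclusive POSIX lock" I mean an fcntl() write lock
over the whole file, roughly like the sketch below (the path
/mnt/gfs2/testfile is just an example, not my actual test file):

/* Minimal sketch: take an exclusive POSIX (fcntl) lock and hold it.
   Path is an example only. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/mnt/gfs2/testfile", O_RDWR | O_CREAT, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    struct flock fl;
    memset(&fl, 0, sizeof(fl));
    fl.l_type = F_WRLCK;    /* exclusive (write) lock */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;           /* 0 = lock the whole file */

    /* F_SETLKW blocks until the lock is granted */
    if (fcntl(fd, F_SETLKW, &fl) < 0) {
        perror("fcntl");
        return 1;
    }
    printf("exclusive POSIX lock acquired\n");
    pause();                /* hold the lock indefinitely */
    return 0;
}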
Also, when testing with BSD flock it works as I would expect: the locks
are only released after fencing completes and node 1 is confirmed to be
fenced.
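For comparison, the flock variant of the same test would look roughly like
this (again just a sketch, with the same example path):

/* Same test with a BSD flock() exclusive lock. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/file.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/mnt/gfs2/testfile", O_RDWR | O_CREAT, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* LOCK_EX blocks until the exclusive lock is granted */
    if (flock(fd, LOCK_EX) < 0) {
        perror("flock");
        return 1;
    }
    printf("exclusive flock acquired\n");
    pause();                /* hold the lock indefinitely */
    return 0;
}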
Below is the output of dlm_tool dump. Note the line "gfs2fs purged 1
plocks for 1": the plocks of the failed node 1 are purged long before
fencing completes.
Thank you for any advice.
Vladimir Martinek
217 dlm:controld conf 2 0 1 memb 2 3 join left 1
217 dlm:controld left reason nodedown 1 procdown 0 leave 0
217 set_fence_actors for 1 low 2 count 2
217 daemon remove 1 nodedown need_fencing 1 low 2
217 fence work wait for cpg ringid
217 dlm:controld ring 2:1292 2 memb 2 3
217 fence work wait for cluster ringid
217 dlm:ls:gfs2fs conf 2 0 1 memb 2 3 join left 1
217 gfs2fs add_change cg 4 remove nodeid 1 reason nodedown
217 gfs2fs add_change cg 4 counts member 2 joined 0 remove 1 failed 1
217 gfs2fs stop_kernel cg 4
217 write "0" to "/sys/kernel/dlm/gfs2fs/control"
217 gfs2fs purged 1 plocks for 1
217 gfs2fs check_ringid wait cluster 1288 cpg 1:1288
217 dlm:ls:gfs2fs ring 2:1292 2 memb 2 3
217 gfs2fs check_ringid cluster 1288 cpg 2:1292
217 fence work wait for cluster ringid
217 gfs2fs check_ringid cluster 1288 cpg 2:1292
217 cluster quorum 1 seq 1292 nodes 2
217 cluster node 1 removed seq 1292
217 del_configfs_node rmdir "/sys/kernel/config/dlm/cluster/comms/1"
217 fence request 1 pos 0
217 fence request 1 pid 4046 nodedown time 1446211577 fence_all dlm_stonith
217 fence wait 1 pid 4046 running
217 gfs2fs check_ringid done cluster 1292 cpg 2:1292
217 gfs2fs check_fencing 1 wait start 30 fail 217
217 gfs2fs check_fencing wait_count 1
217 gfs2fs wait for fencing
[... the two lines "fence wait 1 pid 4046 running" and "gfs2fs wait for
fencing" repeat once per second from timestamp 218 through 247 while the
fence agent runs ...]
248 fence result 1 pid 4046 result 0 exit status
248 fence wait 1 pid 4046 result 0
248 gfs2fs wait for fencing
248 fence status 1 receive 0 from 2 walltime 1446211608 local 248
248 gfs2fs check_fencing 1 done start 30 fail 217 fence 248
248 gfs2fs check_fencing done
248 gfs2fs send_start 2:4 counts 2 2 0 1 1
248 gfs2fs receive_start 2:4 len 80
248 gfs2fs match_change 2:4 matches cg 4
248 gfs2fs wait_messages cg 4 need 1 of 2
248 gfs2fs receive_start 3:2 len 80
248 gfs2fs match_change 3:2 matches cg 4
248 gfs2fs wait_messages cg 4 got all 2
248 gfs2fs start_kernel cg 4 member_count 2
248 dir_member 3
248 dir_member 2
248 dir_member 1
248 set_members rmdir "/sys/kernel/config/dlm/cluster/spaces/gfs2fs/nodes/1"
248 write "1" to "/sys/kernel/dlm/gfs2fs/control"
248 gfs2fs prepare_plocks
248 gfs2fs set_plock_data_node from 1 to 2
248 gfs2fs send_plocks_done 2:4 counts 2 2 0 1 1 plocks_data 1426592608
248 gfs2fs receive_plocks_done 2:4 flags 2 plocks_data 1426592608 need 0 save 0