Hi again,
I have made some progress debugging the problem.
To recap:
2 ocfs2 file systems:
/dev/drbd0 -> lvm -> RAID1 from 2 x 600 GB SAS disks
/dev/drbd1 -> lvm -> RAID1 from 2 x 6 TB NL (Near-Line) SAS disks
This is configured identically on two Dell R530 servers (nodes 1 + 2 as
hypervisors). The disks are connected via a PERC H730 Mini (Linux kernel
driver: megaraid_sas ver. 06.811.02.00-rc1). DRBD has a private GigE
link for replication traffic. Each hypervisor runs 3 virtual machines.
/dev/drbd0 works as expected as long as it's allocated on the 600 GB
RAID1. If it's moved to the large 6 TB RAID1 device, its behaviour becomes
identical to that of /dev/drbd1.
As described in my previous post, there's an unusual slot (?) numbering
that prevents mounting the ocfs2 file system /dev/drbd1 on node 8.
As a quick fix we could swap node numbers 1 <-> 8 in cluster.conf. But
this does not address the underlying problem, as we will soon see.
Reformatted for readability, the list of nodes looks as follows:
node (number = 8, name = h1a) - Hypervisor
node (number = 2, name = h1b) - Hypervisor
node (number = 3, name = web1) - Guest 1 on h1a
node (number = 4, name = db1) - Guest 2 on h1a
node (number = 5, name = srv1) - Guest 3 on h1a
node (number = 6, name = web2) - Guest 4 on h1b
node (number = 7, name = db2) - Guest 5 on h1b
node (number = 1, name = srv2) - Guest 6 on h1b
Now node 8 is the first (hypervisor) node to mount /dev/drbd1, which
leads to ('watch -d -n 1 "echo \"hb\" | debugfs.ocfs2 -n /dev/drbd1"'):
hb
node: node seq generation checksum
64:8 59b8d9ba 73a63eb550a33095 f4e074d1
Node 2 is the second (Hypervisor) node to mount:
hb
node: node seq generation checksum
16:2 59b8d9b9 5c7504c05637983e 07d696ec
64:8 59b8d9ba 73a63eb550a33095 f4e074d1
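The value before the colon and the node number after it can be compared
mechanically. What follows is only a minimal sketch in Python, assuming the
"hb" output keeps the column layout shown above; the helper name parse_hb and
the SAMPLE text are made up for illustration and are not part of ocfs2-tools:

```python
import re

# Sample copied from the debugfs.ocfs2 "hb" output quoted above.
SAMPLE = """\
hb
node: node seq generation checksum
16:2 59b8d9b9 5c7504c05637983e 07d696ec
64:8 59b8d9ba 73a63eb550a33095 f4e074d1
"""

def parse_hb(text):
    """Return (index, node) pairs from lines of the form 'index:node ...'."""
    pairs = []
    for line in text.splitlines():
        # Header lines ("hb", "node: node ...") do not start with digits
        # and are skipped by this pattern.
        m = re.match(r"\s*(\d+):\s*(\d+)\b", line)
        if m:
            pairs.append((int(m.group(1)), int(m.group(2))))
    return pairs

for index, node in parse_hb(SAMPLE):
    shifted = node << 3  # same as node * 8
    print(f"index={index} node={node} node<<3={shifted} match={index == shifted}")
```

For the two hypervisor entries this reports a match in both cases, i.e. the
first column is the node number multiplied by 8.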
Again we see the strange "* 8" or "shift left 3" relationship between columns
"node:" and "node" (16 = 2 * 8, 64 = 8 * 8).
Now the guests are brought up and mount the file system in the order 3, 5, 6, 1
(I don't have the actual seq / generation values, so this is from memory):
hb
node: node seq generation checksum
1:1
3:3
5:5
6:6
16:2 59b8d9b9 5c7504c05637983e 07d696ec
64:8 59b8d9ba 73a63eb550a33095 f4e074d1
Please note that for the virtual machines the "node:" value equals the "node"
value, as expected.
Now we went a step further and enabled tracing: "debugfs.ocfs2 -l HEARTBEAT
allow". This periodically logs messages from the heartbeat threads of the
individual file systems. For the file system /dev/drbd1 we get on the
hypervisors:
(o2hb-3B0327532D,32784,3):o2hb_check_slot:849 Slot 1 gen 0x0 cksum 0x0 seq 0
last 0 changed 0 equal 1544
(o2hb-3B0327532D,32784,3):o2hb_check_slot:849 Slot 2 gen 0x98be08e71122efed
cksum 0x33a84ac0 seq 1505346907 last 1505346907 changed 1 equal 0
(o2hb-3B0327532D,32784,3):o2hb_check_slot:849 Slot 3 gen 0x0 cksum 0x0 seq 0
last 0 changed 0 equal 1544
(o2hb-3B0327532D,32784,3):o2hb_check_slot:849 Slot 4 gen 0x0 cksum 0x0 seq 0
last 0 changed 0 equal 1544
(o2hb-3B0327532D,32784,3):o2hb_check_slot:849 Slot 5 gen 0x0 cksum 0x0 seq 0
last 0 changed 0 equal 1544
(o2hb-3B0327532D,32784,3):o2hb_check_slot:849 Slot 6 gen 0x0 cksum 0x0 seq 0
last 0 changed 0 equal 1544
(o2hb-3B0327532D,32784,3):o2hb_check_slot:849 Slot 7 gen 0x0 cksum 0x0 seq 0
last 0 changed 0 equal 1544
(o2hb-3B0327532D,32784,3):o2hb_check_slot:849 Slot 8 gen 0x551934cc4ba0b1bf
cksum 0xf606e2be seq 1505346907 last 1505346907 changed 1 equal 0
We only see the hypervisors heartbeating in slots 2 and 8 although 4 additional
guests have also mounted the same file system.
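To pull the active slots out of such a trace without reading every line, a
small filter helps. This is a rough sketch in Python, assuming the
o2hb_check_slot message format shown above and treating a slot as active when
its generation is non-zero; the function name active_slots is hypothetical,
and the TRACE sample is three records copied from the hypervisor log:

```python
import re

# Three records from the hypervisor trace above, including line wrapping.
TRACE = """\
(o2hb-3B0327532D,32784,3):o2hb_check_slot:849 Slot 1 gen 0x0 cksum 0x0 seq 0
last 0 changed 0 equal 1544
(o2hb-3B0327532D,32784,3):o2hb_check_slot:849 Slot 2 gen 0x98be08e71122efed
cksum 0x33a84ac0 seq 1505346907 last 1505346907 changed 1 equal 0
(o2hb-3B0327532D,32784,3):o2hb_check_slot:849 Slot 8 gen 0x551934cc4ba0b1bf
cksum 0xf606e2be seq 1505346907 last 1505346907 changed 1 equal 0
"""

SLOT_RE = re.compile(r"Slot (\d+) gen 0x([0-9a-f]+) cksum 0x[0-9a-f]+ seq (\d+)")

def active_slots(trace):
    """Return the slot numbers whose generation field is non-zero."""
    # Collapse all whitespace so records split across lines match again.
    flat = " ".join(trace.split())
    return [int(slot)
            for slot, gen, _seq in SLOT_RE.findall(flat)
            if int(gen, 16) != 0]

print(active_slots(TRACE))  # prints [2, 8]
```

Run against the full hypervisor trace, the same filter should list only slots
2 and 8, which is the observation made above.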
Tracing the ocfs2 heartbeat on one of the guests (web1) gives the following:
(o2hb-3B0327532D,514,0):o2hb_check_slot:849 Slot 1 gen 0xd1f96dee2509bc73 cksum
0x1dc10931 seq 1505371587 last 1505371587 changed 1 equal 0
(o2hb-3B0327532D,514,0):o2hb_check_slot:849 Slot 2 gen 0x0 cksum 0x0 seq 0 last
0 changed 0 equal 13674
(o2hb-3B0327532D,514,0):o2hb_check_slot:849 Slot 3 gen 0x5d8c200c0113510f cksum
0xbfc95a14 seq 1505371590 last 1505371590 changed 1 equal 0
(o2hb-3B0327532D,514,0):o2hb_check_slot:849 Slot 4 gen 0x0 cksum 0x0 seq 0 last
0 changed 0 equal 13674
(o2hb-3B0327532D,514,0):o2hb_check_slot:849 Slot 5 gen 0x39a8da3bae49161b cksum
0x49b4a110 seq 1505371588 last 1505371588 changed 1 equal 0
(o2hb-3B0327532D,514,0):o2hb_check_slot:849 Slot 6 gen 0xc00a0ba3931ad15 cksum
0x92625e99 seq 1505371587 last 1505371587 changed 1 equal 0
(o2hb-3B0327532D,514,0):o2hb_check_slot:849 Slot 7 gen 0x0 cksum 0x0 seq 0 last
0 changed 0 equal 13674
(o2hb-3B0327532D,514,0):o2hb_check_slot:849 Slot 8 gen 0x0 cksum 0x0