Re: [Ocfs2-users] Node 8 doesn't mount / Wrong slot map assignment?

2017-09-14 Thread Michael Ulbrich
Hi again,

I made some progress with debugging the situation.

To recap:

2 ocfs2 file systems:

/dev/drbd0 -> lvm -> RAID1 from 2 x 600 GB SAS disks

/dev/drbd1 -> lvm -> RAID1 from 2 x 6 TB NL (Near-Line) SAS disks

This is configured identically on two Dell R530 servers (nodes 1 + 2, the
hypervisors). Disks are connected via a PERC H730 Mini (Linux kernel
driver: megaraid_sas ver. 06.811.02.00-rc1). drbd has a private GigE
link for its replication traffic. Both hypervisors run 3 virtual machines each.

/dev/drbd0 works as expected as long as it is allocated on the 600 GB
RAID1. If it is moved to the large 6 TB RAID1 device, its behaviour
becomes identical to that of /dev/drbd1.

As described in my previous post, there's an unusual slot (?) numbering
which prevents mounting the ocfs2 file system /dev/drbd1 on node 8.
As a quick fix we could swap node numbers 1 <-> 8 in cluster.conf, but
this does not address the underlying problem, as we will soon see.
Reformatted for clarity, the list of nodes looks as follows:

node (number = 8, name = h1a) -  Hypervisor
node (number = 2, name = h1b) -  Hypervisor
node (number = 3, name = web1) - Guest 1 on h1a
node (number = 4, name = db1)  - Guest 2 on h1a
node (number = 5, name = srv1) - Guest 3 on h1a
node (number = 6, name = web2) - Guest 4 on h1b
node (number = 7, name = db2)  - Guest 5 on h1b
node (number = 1, name = srv2) - Guest 6 on h1b
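
For reference, each of these entries corresponds to a stanza of the
following form in /etc/ocfs2/cluster.conf (the ip_address, ip_port and
cluster name here are illustrative, not our actual values; the cluster:
stanza is omitted):

node:
        ip_port = 7777
        ip_address = 192.168.1.18
        number = 8
        name = h1a
        cluster = ocfs2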

Now node 8 is the first (hypervisor) node to mount /dev/drbd1, which
leads to the following ('watch -d -n 1 "echo \"hb\" | debugfs.ocfs2 -n /dev/drbd1"'):

hb
node: node  seq   generation checksum
  64:8 59b8d9ba 73a63eb550a33095 f4e074d1

Node 2 is the second (Hypervisor) node to mount:

hb
node: node  seq   generation checksum
  16:2 59b8d9b9 5c7504c05637983e 07d696ec
  64:8 59b8d9ba 73a63eb550a33095 f4e074d1

Again we see the strange "* 8" or "shift left 3" relationship between the
"node:" and "node" columns. If I read the output correctly, "node:" is the
heartbeat block in which the entry was found, while "node" is the node
number recorded inside that block.

Now the guests are brought up and mount the file system in the order 3, 5, 6, 1
(I don't have the actual seq / gen values at hand, so this is from memory):

hb
node: node  seq   generation checksum
   1:1   
   3:3   
   5:5   
   6:6   
  16:2 59b8d9b9 5c7504c05637983e 07d696ec
  64:8 59b8d9ba 73a63eb550a33095 f4e074d1

Please note that the virtual machines get matching "node:" = "node" values,
as expected.

Now we went a step further and enabled tracing: "debugfs.ocfs2 -l HEARTBEAT 
allow". This periodically logs messages from the heartbeat threads of the 
individual file systems. For the file system /dev/drbd1 we get on the 
hypervisors:

(o2hb-3B0327532D,32784,3):o2hb_check_slot:849 Slot 1 gen 0x0 cksum 0x0 seq 0 
last 0 changed 0 equal 1544
(o2hb-3B0327532D,32784,3):o2hb_check_slot:849 Slot 2 gen 0x98be08e71122efed 
cksum 0x33a84ac0 seq 1505346907 last 1505346907 changed 1 equal 0
(o2hb-3B0327532D,32784,3):o2hb_check_slot:849 Slot 3 gen 0x0 cksum 0x0 seq 0 
last 0 changed 0 equal 1544
(o2hb-3B0327532D,32784,3):o2hb_check_slot:849 Slot 4 gen 0x0 cksum 0x0 seq 0 
last 0 changed 0 equal 1544
(o2hb-3B0327532D,32784,3):o2hb_check_slot:849 Slot 5 gen 0x0 cksum 0x0 seq 0 
last 0 changed 0 equal 1544
(o2hb-3B0327532D,32784,3):o2hb_check_slot:849 Slot 6 gen 0x0 cksum 0x0 seq 0 
last 0 changed 0 equal 1544
(o2hb-3B0327532D,32784,3):o2hb_check_slot:849 Slot 7 gen 0x0 cksum 0x0 seq 0 
last 0 changed 0 equal 1544
(o2hb-3B0327532D,32784,3):o2hb_check_slot:849 Slot 8 gen 0x551934cc4ba0b1bf 
cksum 0xf606e2be seq 1505346907 last 1505346907 changed 1 equal 0

We only see the hypervisors heartbeating in slots 2 and 8, although four
additional guests have also mounted the same file system. This would fit the
"shift left 3" pattern: if the hypervisors look for slot n at block n * 8,
they find each other (blocks 16 and 64) but never the guests, whose
heartbeats sit at blocks 1, 3, 5 and 6.
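
For anyone who wants to reproduce this, the trace was toggled roughly like
this (the 'off' action should restore the default according to
debugfs.ocfs2(8); please verify on your version):

debugfs.ocfs2 -l HEARTBEAT allow   # enable heartbeat trace messages
dmesg | grep o2hb_check_slot       # the trace lines end up in the kernel log
debugfs.ocfs2 -l HEARTBEAT off     # switch the trace off again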

Tracing the ocfs2 heartbeat on one of the guests (web1) gives the following:

(o2hb-3B0327532D,514,0):o2hb_check_slot:849 Slot 1 gen 0xd1f96dee2509bc73 cksum 
0x1dc10931 seq 1505371587 last 1505371587 changed 1 equal 0
(o2hb-3B0327532D,514,0):o2hb_check_slot:849 Slot 2 gen 0x0 cksum 0x0 seq 0 last 
0 changed 0 equal 13674
(o2hb-3B0327532D,514,0):o2hb_check_slot:849 Slot 3 gen 0x5d8c200c0113510f cksum 
0xbfc95a14 seq 1505371590 last 1505371590 changed 1 equal 0
(o2hb-3B0327532D,514,0):o2hb_check_slot:849 Slot 4 gen 0x0 cksum 0x0 seq 0 last 
0 changed 0 equal 13674
(o2hb-3B0327532D,514,0):o2hb_check_slot:849 Slot 5 gen 0x39a8da3bae49161b cksum 
0x49b4a110 seq 1505371588 last 1505371588 changed 1 equal 0
(o2hb-3B0327532D,514,0):o2hb_check_slot:849 Slot 6 gen 0xc00a0ba3931ad15 cksum 
0x92625e99 seq 1505371587 last 1505371587 changed 1 equal 0
(o2hb-3B0327532D,514,0):o2hb_check_slot:849 Slot 7 gen 0x0 cksum 0x0 seq 0 last 
0 changed 0 equal 13674
(o2hb-3B0327532D,514,0):o2hb_check_slot:849 Slot 8 gen 0x0 cksum 0x0

[Ocfs2-users] Node 8 doesn't mount / Wrong slot map assignment?

2017-09-13 Thread Michael Ulbrich
Hi all,

we've a small (?) problem with a 2-node cluster on Debian 8:

Linux h1b 3.16.0-4-amd64 #1 SMP Debian 3.16.43-2+deb8u2 (2017-06-26)
x86_64 GNU/Linux

ocfs2-tools 1.6.4-3

Two ocfs2 filesystems (drbd0 600 GB w/ 8 slots and drbd1 6 TB w/ 6
slots) are created on top of drbd w/ 4k block and cluster size,
'max_features' enabled.
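
In mkfs.ocfs2 terms that corresponds roughly to the following (a sketch
from memory; labels omitted, and note the option value is spelled
'max-features'):

mkfs.ocfs2 -b 4K -C 4K -N 8 --fs-feature-level=max-features /dev/drbd0
mkfs.ocfs2 -b 4K -C 4K -N 6 --fs-feature-level=max-features /dev/drbd1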

cluster.conf assigns sequential node numbers 1 - 8. Nodes 1, 2 are the
hypervisors. Nodes 3, 4, 5 are VMs on node 1. Nodes 6, 7, 8 the
corresponding VMs on node 2.

VMs all run Debian 8 as well:

Linux srv2 3.16.0-4-amd64 #1 SMP Debian 3.16.39-1 (2016-12-30) x86_64
GNU/Linux

When mounting drbd0 in order of increasing node numbers and concurrently
watching the 'hb' output from debugfs.ocfs2, we get a clean slot map (?):

hb
node: node  seq   generation checksum
   1:1 59b8d94a fa60f0d8423590d9 edec9643
   2:2 59b8d94c aca059df4670f467 994e3458
   3:3 59b8d949 f03dc9ba8f27582c d4473fc2
   4:4 59b8d94b df5bbdb756e757f8 12a198eb
   5:5 59b8d94a 1af81d94a7cb681b 91fba906
   6:6 59b8d94b 104538f30cdb35fa 8713e798
   7:7 59b8d94b 195658c9fb8ca7f9 5e54edf6
   8:8 59b8d949 dc6bfb46b9cf1ac3 de7a8757

Device drbd1 in contrast yields the following table after mounting on
nodes 1, 2:

hb
node: node  seq   generation checksum
   8:1 59b8d9ba 73a63eb550a33095 f4e074d1
  16:2 59b8d9b9 5c7504c05637983e 07d696ec

Proceeding with the drbd1 mounts on nodes 3, 5, 6 leads us to:

hb
node: node  seq   generation checksum
   3:3 59b8da3b 9443b4b209b16175 f2cc87ec
   5:5 59b8da3c 4b742f709377466f 3ac41cf3
   6:6 59b8da3b d96e2de0a55514f6 335a4d90
   8:1 59b8da3c 73a63eb550a33095 2312c1c4
  16:2 59b8da3d 5c7504c05637983e 659571a1

The problem arises when mounting on node 8, since its slot is already
occupied by node 1:

kern.log node 1:

(o2hb-0AEE381A14,50990,4):o2hb_check_own_slot:582 ERROR: Another node is
heartbeating on device (drbd1): expected(1:0x18acf7b0b3e5544c,
0x59b8445c), ondisk(8:0xb91302db72a65364, 0x59b8445b)

kern.log node 8:

ocfs2: Mounting device (254,16) on (node 8, slot 7) with ordered data mode.
(o2hb-0AEE381A14,518,1):o2hb_check_own_slot:582 ERROR: Another node is
heartbeating on device (vdc): expected(8:0x18acf7b0b3e5544c,
0x59b8445c), ondisk(1:0x18acf7b0b3e5544c, 0x59b8445c)

Both messages show nodes 1 and 8 writing their heartbeats to the same
block. This can be "fixed" by exchanging node numbers 1 and 8 in
cluster.conf. Then node 8 gets assigned slot 8, node 2 stays in slot 16,
and 3 to 7 come up as expected. There is no node 16 configured, so there
is no conflict. But since we are also experiencing some as yet unexplained
instabilities with this ocfs2 device / system during operation, we decided
to take care of and try to fix this issue first.

Somehow the failure is reminiscent of a bit-shift or masking problem:

1 << 3 = 8
2 << 3 = 16

But then again - what do I know ...
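
One more data point that may or may not be relevant: 8 is also the ratio
4096 / 512, i.e. a 4k vs. 512-byte logical sector size mismatch somewhere
in the stack would produce exactly this factor, and large NL-SAS drives
are often 4k-native. Comparing what the hypervisors and the guests report
for the device should rule this in or out:

# on a hypervisor
blockdev --getss --getpbsz /dev/drbd1           # logical / physical sector size
# inside a guest (device name as seen there, e.g. vdc)
cat /sys/block/vdc/queue/logical_block_size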

Tried so far (commands sketched after the list):

A. Created the offending file system with 8 slots instead of 6 -> same issue.
B. Set the feature level to 'default' (disables the 'extended-slotmap'
feature) -> same issue.
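
The corresponding commands, again sketched from memory:

# A. recreate the file system with 8 slots instead of 6
mkfs.ocfs2 -b 4K -C 4K -N 8 --fs-feature-level=max-features /dev/drbd1
# B. recreate it with the default feature level (no extended-slotmap)
mkfs.ocfs2 -b 4K -C 4K -N 6 --fs-feature-level=default /dev/drbd1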

We'd very much appreciate any comments on this. Has anything similar
ever been experienced before? Are we completely missing something
important here?

If there's already a fix out for this, any pointers (source files / commits)
to where to look would be greatly appreciated.

Thanks in advance + Best regards ... Michael U.

_______________________________________________
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-users