Hi all, we have a small (?) problem with a 2-node cluster on Debian 8:
Linux h1b 3.16.0-4-amd64 #1 SMP Debian 3.16.43-2+deb8u2 (2017-06-26) x86_64 GNU/Linux
ocfs2-tools 1.6.4-3

Two ocfs2 filesystems (drbd0, 600 GB with 8 slots, and drbd1, 6 TB with 6 slots) are created on top of drbd with 4k block and cluster size and 'max_features' enabled. cluster.conf assigns sequential node numbers 1 - 8. Nodes 1 and 2 are the hypervisors; nodes 3, 4, 5 are VMs on node 1; nodes 6, 7, 8 are the corresponding VMs on node 2. The VMs all run Debian 8 as well:

Linux srv2 3.16.0-4-amd64 #1 SMP Debian 3.16.39-1 (2016-12-30) x86_64 GNU/Linux

When mounting drbd0 in order of increasing node numbers and concurrently watching the 'hb' output from debugfs.ocfs2, we get a clean slot map (?):

    hb
    node: node              seq        generation  checksum
       1:    1 0000000059b8d94a  fa60f0d8423590d9  edec9643
       2:    2 0000000059b8d94c  aca059df4670f467  994e3458
       3:    3 0000000059b8d949  f03dc9ba8f27582c  d4473fc2
       4:    4 0000000059b8d94b  df5bbdb756e757f8  12a198eb
       5:    5 0000000059b8d94a  1af81d94a7cb681b  91fba906
       6:    6 0000000059b8d94b  104538f30cdb35fa  8713e798
       7:    7 0000000059b8d94b  195658c9fb8ca7f9  5e54edf6
       8:    8 0000000059b8d949  dc6bfb46b9cf1ac3  de7a8757

Device drbd1, in contrast, yields the following table after mounting on nodes 1 and 2:

    hb
    node: node              seq        generation  checksum
       8:    1 0000000059b8d9ba  73a63eb550a33095  f4e074d1
      16:    2 0000000059b8d9b9  5c7504c05637983e  07d696ec

Proceeding with the drbd1 mounts on nodes 3, 5, 6 leads us to:

    hb
    node: node              seq        generation  checksum
       3:    3 0000000059b8da3b  9443b4b209b16175  f2cc87ec
       5:    5 0000000059b8da3c  4b742f709377466f  3ac41cf3
       6:    6 0000000059b8da3b  d96e2de0a55514f6  335a4d90
       8:    1 0000000059b8da3c  73a63eb550a33095  2312c1c4
      16:    2 0000000059b8da3d  5c7504c05637983e  659571a1

The problem arises when trying to mount on node 8, since its slot is already occupied by node 1:

kern.log on node 1:
    (o2hb-0AEE381A14,50990,4):o2hb_check_own_slot:582 ERROR: Another node is heartbeating on device (drbd1): expected(1:0x18acf7b0b3e5544c, 0x59b8445c), ondisk(8:0xb91302db72a65364, 0x59b8445b)

kern.log on node 8:
    ocfs2: Mounting device (254,16) on (node 8, slot 7) with ordered data mode.
    (o2hb-0AEE381A14,518,1):o2hb_check_own_slot:582 ERROR: Another node is heartbeating on device (vdc): expected(8:0x18acf7b0b3e5544c, 0x59b8445c), ondisk(1:0x18acf7b0b3e5544c, 0x59b8445c)

This can be "fixed" by exchanging node numbers 1 and 8 in cluster.conf. Node 8 is then assigned slot 8, node 2 stays in slot 16, and nodes 3 to 7 land where expected; since no node 16 is configured, there is no conflict. But because we also see some other, so far unexplained, instabilities with this ocfs2 device / system during operation further down the road, we decided to take care of and try to fix this issue first.

Somehow the failure is reminiscent of a bit-shift or masking problem:

    1 << 3 = 8
    2 << 3 = 16

But then again - what do I know ...

Tried so far:

A. Create the offending file system with 8 slots instead of 6 -> same issue.
B. Set the features to 'default' (which disables the 'extended-slotmap' feature) -> same issue.

We'd very much appreciate any comments on this. Has anything similar been experienced before? Are we completely missing something important here? If there's already a fix out for this, any pointers (source files / commits) to where to look would be greatly appreciated.

Thanks in advance + Best regards ... Michael U.
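P.S. To make the suspected shift a little more concrete, here is a small standalone sketch. It is purely illustrative and not the actual fs/ocfs2 heartbeat code; the function names, the start_blk parameter and the failure mode itself are only assumptions on our side. It just shows how an offset derived from our 4k block size but consumed in 512-byte sector units would place node N at position N << 3, which is exactly the 1 -> 8 and 2 -> 16 pattern we see in the drbd1 slot map above:

/* Illustrative sketch only -- not the real ocfs2 heartbeat code.
 * Names (hb_block_expected, hb_block_shifted, start_blk) are made up;
 * the point is just the arithmetic: an offset built from 4k blocks but
 * interpreted in 512-byte sectors shifts node N by a factor of 8. */
#include <stdio.h>
#include <stdint.h>

#define SECTOR_SHIFT    9       /* 512-byte sectors            */
#define FS_BLOCK_SHIFT  12      /* our 4k block / cluster size */

/* expected: node N heartbeats in filesystem block start_blk + N */
static uint64_t hb_block_expected(uint64_t start_blk, unsigned node)
{
        return start_blk + node;
}

/* suspected failure mode: the node offset is converted to bytes using
 * the 4k block size, but then divided by the 512-byte sector size,
 * i.e. start_blk + (node << 3) */
static uint64_t hb_block_shifted(uint64_t start_blk, unsigned node)
{
        uint64_t byte_off = (uint64_t)node << FS_BLOCK_SHIFT;
        return start_blk + (byte_off >> SECTOR_SHIFT);
}

int main(void)
{
        for (unsigned node = 1; node <= 2; node++)
                printf("node %u: expected %llu, shifted %llu\n", node,
                       (unsigned long long)hb_block_expected(0, node),
                       (unsigned long long)hb_block_shifted(0, node));
        return 0;
}

Compiled and run, this prints 1 -> 8 and 2 -> 16, matching the slot rows that nodes 1 and 2 ended up in on drbd1. Whether anything like this actually happens in the heartbeat path (kernel or tools) is of course exactly what we would like to find out.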