Hello,
I am currently testing a dual-master setup with DRBD+OCFS2.
I finally managed to get it working well on kernel 2.6.39.4, DRBD version
8.3.10 (userland version: 8.4.1) and OCFS2 version 1.5.0.
I have had some trouble with broken replication: automatic recovery
sometimes works and sometimes does not. What is strange is that these are
still only tests, and while one server is fully functional, the second one
has no processes that even touch the synchronized partition.
In dmesg on the active server it looks like this:
[707152.209885] block drbd0: Handshake successful: Agreed network protocol version 96
[707152.209895] block drbd0: conn( WFConnection -> WFReportParams )
[707152.210068] block drbd0: Starting asender thread (from drbd0_receiver [1096])
[707152.210341] block drbd0: data-integrity-alg: <not-used>
[707152.210352] block drbd0: max BIO size = 130560
[707152.210359] block drbd0: drbd_sync_handshake:
[707152.210363] block drbd0: self 8631CEC3370B5C31:A9BC3587FC5AA879:0BF8587A4ABA37B5:0BF7587A4ABA37B5 bits:21 flags:0
[707152.210368] block drbd0: peer 73E08E8E06754D97:A9BC3587FC5AA879:0BF8587A4ABA37B5:0BF7587A4ABA37B5 bits:0 flags:0
[707152.210371] block drbd0: uuid_compare()=100 by rule 90
[707152.210377] block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0
[707152.212439] block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0 exit code 0 (0x0)
[707152.212442] block drbd0: Split-Brain detected but unresolved, dropping connection!
[707152.212445] block drbd0: helper command: /sbin/drbdadm split-brain minor-0
[707152.214134] block drbd0: helper command: /sbin/drbdadm split-brain minor-0 exit code 0 (0x0)
[707152.214137] block drbd0: conn( WFReportParams -> Disconnecting )
[707152.214141] block drbd0: error receiving ReportState, l: 4!
[707152.214150] block drbd0: asender terminated
[707152.214154] block drbd0: Terminating drbd0_asender
[707152.214177] block drbd0: Connection closed
[707152.214180] block drbd0: conn( Disconnecting -> StandAlone )
[707152.214188] block drbd0: receiver terminated
[707152.214190] block drbd0: Terminating drbd0_receiver
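
To spot these events when repeating the tests, the log can be scanned for the
handshake data that precedes each split-brain message. The following is only a
minimal sketch, not part of DRBD: the function name find_split_brain is
hypothetical, and it assumes each kernel message is on one unwrapped line in
the same format as the excerpt above (as dmesg prints it, not as a mail client
wraps it).

```python
import re

# Kernel timestamp plus DRBD device prefix, e.g.
# "[707152.210363] block drbd0: self ..."
LINE_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+block\s+(?P<dev>drbd\d+):\s+(?P<msg>.*)$"
)
# "self"/"peer" line printed during drbd_sync_handshake:
# four 16-digit hex UUIDs, then the bits and flags counters.
UUID_RE = re.compile(
    r"^(?P<side>self|peer)\s+"
    r"(?P<uuids>[0-9A-F]{16}(?::[0-9A-F]{16}){3})\s+"
    r"bits:(?P<bits>\d+)\s+flags:(?P<flags>\d+)$"
)


def find_split_brain(log_text):
    """Return one dict per 'Split-Brain detected' message (hypothetical helper)."""
    events = []
    last = {}  # most recent self/peer handshake data, keyed by device
    for raw in log_text.splitlines():
        m = LINE_RE.match(raw.strip())
        if not m:
            continue  # not a DRBD block-device message
        dev, msg = m["dev"], m["msg"]
        u = UUID_RE.match(msg)
        if u:
            last.setdefault(dev, {})[u["side"]] = {
                "current_uuid": u["uuids"].split(":")[0],
                "bits": int(u["bits"]),
            }
        elif msg.startswith("Split-Brain detected"):
            events.append(
                {"device": dev, "time": float(m["ts"]), **last.get(dev, {})}
            )
    return events


# Three lines taken from the excerpt above, unwrapped.
sample = """\
[707152.210363] block drbd0: self 8631CEC3370B5C31:A9BC3587FC5AA879:0BF8587A4ABA37B5:0BF7587A4ABA37B5 bits:21 flags:0
[707152.210368] block drbd0: peer 73E08E8E06754D97:A9BC3587FC5AA879:0BF8587A4ABA37B5:0BF7587A4ABA37B5 bits:0 flags:0
[707152.212442] block drbd0: Split-Brain detected but unresolved, dropping connection!
"""
events = find_split_brain(sample)
```

For the excerpt above this reports one event on drbd0, with differing current
UUIDs on the two sides and bits:21 on self versus bits:0 on peer.
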
Is there any help for this situation? I don't understand why the split
brain isn't resolved automatically, since the second server doesn't write
to drbd0; sometimes the partition wasn't even mounted there (I can't be
100% sure, but it seems so).
I would be grateful if you could give me some hints on how to make this
configuration stable without sacrificing data on one of the nodes (right
now, in order to recover, I have to set the second node to slave). Any
ideas what is wrong in my setup?
P.S. Any suggestions on how to measure the real performance
(read/write/copy) of DRBD+OCFS2? UnixBench gives crazy results (read
performance is about 10% of the local filesystem's)...
Best regards,
--
Jacek Osiecki
[email protected]
Silvercube s.c.
ul. Makuszynskiego 4
31-752 Kraków
+48 (12) 684 21 00
_______________________________________________
drbd-user mailing list
[email protected]
http://lists.linbit.com/mailman/listinfo/drbd-user