Hi Lars,

As ever, the perfect answer. Thanks for your help. We will see how we get on.

Regards,

Ben


On 2013-01-28 10:41, Lars Ellenberg wrote:
On Mon, Jan 28, 2013 at 09:31:31AM +0000, Ben Clewett wrote:
Hi guys,

We have a failure which hits us every few weeks on just one server.
We suspect a hardware issue with the network card, but it's proving
hard to pin down. This is the failure, and I would be interested in
the opinion of this group.

Error:

[1580483.649257] block drbd0: magic?? on data m: 0x0 c: 0 l: 0


Each DRBD network packet starts with a DRBD specific header.
That header contains a "magic" number, a "command" id,
and a payload "length".

All three of them are apparently zeroed out.
So yes, that pretty much looks like your network path
somehow managed to zero out at least the start of a packet.
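To make that diagnosis concrete, here is a minimal Python sketch of the header check. It assumes the 8.3 header layout as I understand it from drbd.h (a 4-byte magic, a 2-byte command and a 2-byte length, all in network byte order) and the magic constant 0x83740267. Both are assumptions to verify against the drbd.h of your own version. `parse_header` and `check_header` are hypothetical helpers, not DRBD code. Feeding the check eight zero bytes reproduces the shape of the kernel's "magic??" log line.

```python
import struct

# Assumption: DRBD 8.3 header layout and magic as recalled from drbd.h.
# Verify both against the source of the DRBD version you actually run.
ASSUMED_DRBD_MAGIC = 0x83740267
HEADER = struct.Struct(">IHH")  # 8 bytes, big-endian: magic, command, length


def parse_header(buf: bytes):
    """Decode the first 8 bytes of a packet into (magic, command, length)."""
    if len(buf) < HEADER.size:
        raise ValueError("short read: need %d bytes" % HEADER.size)
    return HEADER.unpack_from(buf)


def check_header(buf: bytes) -> str:
    """Return 'ok', or a message shaped like the kernel's complaint."""
    magic, cmd, length = parse_header(buf)
    if magic != ASSUMED_DRBD_MAGIC:
        return "magic?? on data m: 0x%x c: %d l: %d" % (magic, cmd, length)
    return "ok"


good = HEADER.pack(ASSUMED_DRBD_MAGIC, 0, 4096)
print(check_header(good))      # ok
print(check_header(bytes(8)))  # magic?? on data m: 0x0 c: 0 l: 0
```

A header whose magic, command and length are all zero can only come out of the parser as "m: 0x0 c: 0 l: 0", which is why the log line points at zeroed bytes on the wire rather than at a DRBD protocol mismatch.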


The asserts below are "boring", and the code has since been fixed to no
longer trigger those.

[1580483.649269] block drbd0: ASSERT FAILED cstate = Connected, expected < WFConnection
[1580483.649286] block drbd0: ASSERT( mdev->state.conn < C_CONNECTED ) in /usr/src/packages/BUILD/drbd-8.3.4/obj/default/drbd_receiver.c:4500
[1580483.649295] block drbd0: asender terminated
[1580483.649301] block drbd0: Terminating asender thread
[1580483.649384] block drbd0: Connection closed
[1580483.649390] block drbd0: peer( Primary -> Unknown ) conn( Connected -> Unconnected ) pdsk( UpToDate -> DUnknown )
[1580483.649396] block drbd0: receiver terminated
[1580483.649399] block drbd0: Terminating receiver thread

/proc/drbd
version: 8.3.4 (api:88/proto:86-91)

I recommend upgrading to 8.3.15,
enabling "data integrity checksumming",
running an online verify,
and seeing where that gets you.
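The two checks catch different things. Data integrity checksumming has the sender attach a digest to each data payload and the receiver recompute it, so corruption in transit is detected when it happens. Online verify compares per-block digests of the two replicas after the fact, so blocks that were already written with bad data show up as out of sync. Below is a minimal Python sketch of both ideas. SHA-1 from hashlib is used purely as an example digest, and the helper names, the 4 KiB block size and the in-memory "peer" copy are illustrative assumptions, not DRBD's implementation. For the 8.3 series the corresponding drbd.conf options are, as far as I know, `data-integrity-alg` in the `net` section and `verify-alg` in the `syncer` section, with the verify started via `drbdadm verify <resource>`. Check the drbd.conf man page for your version.

```python
import hashlib

BLOCK_SIZE = 4096  # assumption: compare replicas in 4 KiB blocks


class IntegrityError(Exception):
    """Raised when a received payload does not match its digest."""


def send(payload: bytes):
    """Sender side: attach a digest of the payload (data-integrity idea)."""
    return hashlib.sha1(payload).digest(), payload


def receive(digest: bytes, payload: bytes) -> bytes:
    """Receiver side: recompute the digest and reject changed data."""
    if hashlib.sha1(payload).digest() != digest:
        raise IntegrityError("digest mismatch: payload corrupted in transit")
    return payload


def out_of_sync_blocks(local: bytes, peer: bytes, block_size: int = BLOCK_SIZE):
    """Online-verify idea: compare per-block digests, list mismatching blocks.

    In a real setup each node hashes its own copy and only the digests
    cross the network; here both copies are local for illustration.
    """
    mismatches = []
    for offset in range(0, max(len(local), len(peer)), block_size):
        a = hashlib.sha1(local[offset:offset + block_size]).digest()
        b = hashlib.sha1(peer[offset:offset + block_size]).digest()
        if a != b:
            mismatches.append(offset // block_size)
    return mismatches


data = bytes(range(256)) * 16          # one 4 KiB block
digest, wire = send(data)
corrupted = bytes(8) + wire[8:]        # first 8 bytes zeroed in transit
try:
    receive(digest, corrupted)
except IntegrityError as exc:
    print(exc)

print(out_of_sync_blocks(data * 3, data + corrupted + data))  # [1]
```

The integrity check turns silent corruption into an immediate, logged rejection, while the verify pass tells you whether earlier corruption already reached the peer's disk.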

_______________________________________________
drbd-user mailing list
[email protected]
http://lists.linbit.com/mailman/listinfo/drbd-user
