Re: [DRBD-user] frequent wrong magic value with kernel >4.9 caused by big mtu

Lars Ellenberg Mon, 12 Feb 2018 09:13:31 -0800

On Mon, Feb 12, 2018 at 05:17:24PM +0100, Andreas Pflug wrote:
> > After the tcpdump analysis showed that the problem must be located below
> > DRBD, I played around with eth settings. Cutting down the former MTU of
> > 9710 to default 1500 did fix the problem, as well as disabling
> > scatter-gather. So apparently big MTU and scatter-gather don't play
> > nicely on later kernels (or the updated nic driver)


My findings with that provided tcpdump were:

| tcpdump sees corrupted frames as well.
| 
| There is a DRBD data packet,
| expecting a full 4096 byte write to sector 958552.
| Then expects the next header.
| But that frame looks corrupted
| ...
| 
| supposedly 4096 byte, all looking roughly like this,
| probably some file system meta data, or raw rrd file,
| or other data block containing an array of related 64bit values:
| 
| ... boring binary data
| 
| but at byte offset 3856, that pattern suddenly changes to something
| completely unrelated,
| so likely a wrong page got mapped/linked somewhere:
| 
| ... more boring data, but suddenly pattern changed to plain text / xml
| 
| * HERE * is the supposed end of the 4096 byte data block,
| and the supposed start of the next DRBD header (with the "magic").
| 
| Obviously there is no DRBD header magic here, though.
| 
| ... more boring data with that same plain text / xml

> > I posted a kernel bug on this,
> > https://bugzilla.kernel.org/show_bug.cgi?id=198723
> 
> Unfortunately, scatter-gather seems NOT to be the culprit. Changing all
> other receive and generic offload settings didn't help either, so only
> big mtu remains.

You will likely get more satisfying responses to that bugzilla,
if you find a reproducer for the data corruption that does not involve
"unusual" things like DRBD, but can be reproduced by "anyone",
using just "dd and netcat" maybe.

I suggested already in that private mail:

For fault isolation tests,
you could try with different offloading settings,
you could try with different mtu settings,
you could try to use "dd | netcat"
(not scp; it has encryption and thus strong checksums,
I'd expect it to transparently catch this kind of crap and resend)
for some data transfers,
and double check if you get corruption somewhere as well.
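
A minimal sketch of how such test rounds could be organized. The
interface name (eth0) and the list of offload features are assumptions,
not taken from the reported setup; the 9710 MTU is the one mentioned
above. The script only prints the commands, so they can be reviewed and
then run by hand on both nodes:

```shell
#!/bin/sh
# Sketch only: the interface name and the feature list are assumptions;
# check "ethtool -k $IFACE" for what your NIC actually supports.
IFACE=${IFACE:-eth0}

# Print (do not run) the commands for one test round.
plan_round() {
    mtu=$1
    feature=$2
    echo "ip link set dev $IFACE mtu $mtu"
    echo "ethtool -K $IFACE $feature off"
}

# One round per MTU / offload combination.
for mtu in 9710 1500; do
    for feature in gro lro tso gso sg; do
        plan_round "$mtu" "$feature"
        echo "# now rerun the transfer test and compare checksums"
    done
done
```

A feature that was switched off stays off until it is re-enabled with
"ethtool -K $IFACE <feature> on", so reset between rounds if you want to
test one setting at a time.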

You could enable the "data integrity" option in DRBD
and see if that catches the corruption even earlier
(when receiving the corrupt data block,
not only when trying to parse the next header)
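
As a rough illustration, that option is "data-integrity-alg" in the net
section of the resource configuration. The resource name r0 and the
choice of sha1 are placeholders here; any digest the kernel crypto API
provides on both nodes should do:

```
resource r0 {
  net {
    # checksum every data block on the wire, verify on receive
    data-integrity-alg sha1;
  }
}
```

This costs CPU on both nodes, so it is better suited to fault isolation
than to permanent use.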

From what you have found so far, my best guess is that the newer kernel somehow
breaks your receive offloading (maybe only for "jumbo frames").

I strongly doubt that DRBD can be blamed here,
or can do anything about this.

The expectation is that if your TCP streams corrupt stuff for DRBD,
they will corrupt stuff for other connections as well.

Some "PoC" shell snippets to transfer and integrity check data:
---------------------------------------------------------------

You could e.g. create one huge tarball of your entire /usr,
calculate the md5sum (sha1sum, sha512sum, I don't care),
and transfer that safely.

# tar cf usr.tar /usr --one-file-system
# sha512sum -b usr.tar > usr.tar.checksum
# scp usr.tar.checksum $peer:

Then loop until you hit a mismatch:

on the peer:
# while nc -l -p 9999 > usr.tar; do
#       sha512sum -c usr.tar.checksum || break;
# done

on this node:
# mv usr.tar usr.tar.orig
# while tee usr.tar < usr.tar.orig | nc -q1 $peer 9999; do
#       sha512sum -c usr.tar.checksum || break;
#       sleep 5;
# done

Add progress messages or | pv | progress meters,
or other sanity checks (transfer complete?) if you want.
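
One possible sanity check, as a sketch: it assumes a known-good copy of
the tarball was also brought to the peer safely (e.g. via scp, like the
checksum file), and the function name check_transfer is made up for this
example. It tells a short transfer apart from real corruption, and
reports the zero-based offset of the first differing byte:

```shell
# check_transfer GOOD COPY
# Report a size mismatch, or the first differing byte, between a
# known-good file and the copy received over the network.
check_transfer() {
    good=$1
    copy=$2
    size_good=$(wc -c < "$good" | tr -d ' ')
    size_copy=$(wc -c < "$copy" | tr -d ' ')
    if [ "$size_good" -ne "$size_copy" ]; then
        echo "incomplete: $size_copy of $size_good bytes"
        return 2
    fi
    if cmp -s "$good" "$copy"; then
        echo "identical"
        return 0
    fi
    # cmp -l lists differing bytes with 1-based positions;
    # the first line is the first difference.
    first=$(cmp -l "$good" "$copy" | head -n 1 | awk '{print $1}')
    echo "corrupt: first differing byte at offset $((first - 1))"
    return 1
}
```

On the peer this could be called as "check_transfer usr.tar.good usr.tar"
once sha512sum reports a mismatch; an offset that falls on a page or
frame boundary would be a useful detail for the kernel bug report.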

Cheers,

        Lars

-- 
: Lars Ellenberg
: LINBIT | Keeping the Digital World Running
: DRBD -- Heartbeat -- Corosync -- Pacemaker

DRBD® and LINBIT® are registered trademarks of LINBIT
__
please don't Cc me, but send to list -- I'm subscribed
