On 06/06/2013 09:12 PM, Lars Ellenberg wrote:
Short term workaround:
cat /sys/block/drbd0/queue/max_hw_sectors_kb > /sys/block/dm-9/queue/max_sectors_kb
that is: limit max_sectors_kb (which is a tunable)
to the currently apparent limits of the lower stack.
That should stop new "too big" bios from being assembled.
Yes, it does; no further such messages occurred after this
change.
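The workaround can be wrapped in a small function that lowers the tunable only when the upper device is actually above the hard limit the lower device reports right now. This is a minimal sketch that assumes a POSIX shell and root privileges. `clamp_max_sectors` and the `SYSFS` override are hypothetical names that exist only so the function can be tried against a fake directory tree. They are not part of any DRBD or dm tool.

```sh
#!/bin/sh
# Hypothetical helper: lower an upper device's soft limit (max_sectors_kb)
# to the hard limit (max_hw_sectors_kb) the device below it reports now.
# SYSFS defaults to /sys and can be pointed at a fake tree for testing.
SYSFS=${SYSFS:-/sys}

clamp_max_sectors() {
    lower=$1    # device below, e.g. drbd0
    upper=$2    # device on top, e.g. dm-9 (dm-crypt)
    hw=$(cat "$SYSFS/block/$lower/queue/max_hw_sectors_kb") || return 1
    cur=$(cat "$SYSFS/block/$upper/queue/max_sectors_kb") || return 1
    if [ "$cur" -gt "$hw" ]; then
        # Writing this attribute needs root, and the value does not
        # survive the device being torn down and recreated.
        echo "$hw" > "$SYSFS/block/$upper/queue/max_sectors_kb" || return 1
        echo "$upper: max_sectors_kb $cur -> $hw"
    else
        echo "$upper: max_sectors_kb $cur already <= $hw"
    fi
}
```

For the setup described here, the call would be `clamp_max_sectors drbd0 dm-9`. Running it a second time only reports that nothing needs changing.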
Then check the limits below drbd (they may have changed when you were
"messing around" during the resize procedure).
The drbd0 device on the server where the messages were emitted
sits on top of an LVM device with these limits:
/sys/block/dm-7/queue/max_hw_sectors_kb 32767
/sys/block/dm-7/queue/max_sectors_kb 512
And this LVM device currently has just one physical volume
below it with these limits:
/sys/block/sdg/queue/max_hw_sectors_kb 32767
/sys/block/sdg/queue/max_sectors_kb 512
BUT: On the server where the secondary DRBD copy resides
(and where no "too big" messages were emitted), the
limits are different:
drbd0:
/sys/block/drbd0/queue/max_hw_sectors_kb 128
/sys/block/drbd0/queue/max_sectors_kb 128
The LVM below drbd0:
/sys/block/dm-5/queue/max_hw_sectors_kb 128
/sys/block/dm-5/queue/max_sectors_kb 128
The physical device the LVM resides on:
/sys/block/sdb/queue/max_hw_sectors_kb 128
/sys/block/sdb/queue/max_sectors_kb 128
The physical device on the secondary host was (shortly
before the resize of drbd0) moved from a controller
with max_hw_sectors_kb=32767 to a different controller
in the same machine with max_hw_sectors_kb=128.
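Collecting both limits for every layer, as done above, can be scripted by following the `slaves` directory that device-mapper devices expose in sysfs. This is a minimal sketch with some assumptions. `show_limits` is a hypothetical helper, and it assumes whole-disk physical volumes such as sdg and sdb. A partition's `queue` directory lives under its parent disk, so partitions are not handled. The helper also stops at any layer that does not populate `slaves`, and DRBD may be one of those. In that case it has to be called again on the backing device, for example `dm-7`.

```sh
#!/bin/sh
# Hypothetical helper: print both queue limits of a block device, then
# recurse into the devices listed under its sysfs "slaves" directory.
# SYSFS defaults to /sys and can be pointed at a fake tree for testing.
SYSFS=${SYSFS:-/sys}

show_limits() {
    # "local" is not POSIX but is supported by bash, dash and busybox sh;
    # without it the recursion would clobber dev/indent of the caller.
    local dev="$1" indent="${2:-}" q s
    q="$SYSFS/block/$dev/queue"
    printf '%s%s max_hw_sectors_kb=%s max_sectors_kb=%s\n' \
        "$indent" "$dev" \
        "$(cat "$q/max_hw_sectors_kb")" "$(cat "$q/max_sectors_kb")"
    for s in "$SYSFS/block/$dev/slaves/"*; do
        [ -e "$s" ] || continue    # no slaves: the glob stays unexpanded
        show_limits "$(basename "$s")" "$indent  "
    done
}
```

On the primary, `show_limits dm-7` would print the LVM device followed by sdg, indented one level. On the secondary, `show_limits dm-5` does the same for sdb.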
My current hypothesis is the following:
The move of the physical device on the secondary server
caused the whole dm stack on that server to be changed to
max_hw_sectors_kb=128, and that went fine.
Then, shortly after that, when "drbdadm resize"
was issued, drbd0 on the primary was also changed
to max_hw_sectors_kb=128, but the dm-crypt device on top of it
was not notified about that, and continued to issue
larger bios.
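The hypothesis comes down to taking a minimum over the stack. The sketch below models the reasoning, not the kernel code. `min_kb` and `check_stack` are hypothetical helpers. They assume, in line with the hypothesis, that DRBD uses the smaller of the local and the peer backing device's limits. The 512 KiB figure for the dm-crypt device in the usage line is also an assumed value, since its actual limit is not listed above.

```sh
#!/bin/sh
# Model of the hypothesis, not of the kernel: a stacked device's limit is
# the minimum of the limits below it, and (assumption) DRBD also counts
# the peer's backing device.

# min_kb: print the smallest of its numeric arguments.
min_kb() {
    m=$1
    shift
    for v in "$@"; do
        if [ "$v" -lt "$m" ]; then
            m=$v
        fi
    done
    echo "$m"
}

# check_stack: report whether the device above DRBD may still build bios
# larger than what the DRBD device now accepts.
check_stack() {
    local_kb=$1   # max_hw_sectors_kb of the local backing device
    peer_kb=$2    # max_hw_sectors_kb of the peer's backing device
    upper_kb=$3   # limit still in effect on the device above drbd
    drbd_kb=$(min_kb "$local_kb" "$peer_kb")
    if [ "$upper_kb" -gt "$drbd_kb" ]; then
        echo "too big: upper=$upper_kb drbd=$drbd_kb"
    else
        echo "ok: upper=$upper_kb drbd=$drbd_kb"
    fi
}
```

With the figures above, `check_stack 32767 32767 512` (before the controller move) prints `ok: upper=512 drbd=32767`. After the move, `check_stack 32767 128 512` prints `too big: upper=512 drbd=128`. In other words, the device below dm-crypt shrinks while dm-crypt keeps its old limit.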
Why the subsequent "cryptsetup resize" did not cause
the dm-crypt device to notice the lowered max_hw_sectors_kb
remains unknown to me.
Another thing I still wonder about is whether the
failed bios have caused dm-crypt to re-issue smaller
writes, or whether data has gone to /dev/null, with neither
the (XFS) filesystem nor any users taking note of that (which seems
somewhat unlikely, given that in total 8130 "bio too big"
error messages accumulated in the syslog).
But just do the whole drill:
umount, close crypt, down drbd, then start things up again.
Do the limits stack correctly then?
A reboot with a new kernel was scheduled for this evening
anyway, so after that I'll be able to tell. (Trying now
would mean very inconvenient downtime for several users.)
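The "whole drill" on the primary can be written down as one ordered sequence. It tears the stack down from top to bottom and then rebuilds it from bottom to top. This is a sketch under several assumptions. It assumes LUKS on top of the DRBD device. It assumes the peer stays Secondary throughout. It does not wait for the DRBD connection to come back before promoting. `run`, `restack` and `DRY_RUN` are hypothetical names. The commands themselves are the standard `umount`, `cryptsetup luksClose`/`luksOpen`, `drbdadm down`/`up`/`primary` and `mount`.

```sh
#!/bin/sh
# Hypothetical helper for the "whole drill" on the DRBD primary: tear the
# stack down top to bottom, then rebuild it bottom to top, in the hope
# that each layer stacks its limits on the layer below when recreated.
# Assumes LUKS on top of the DRBD device; with DRY_RUN=1 the commands
# are only printed, not executed.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

restack() {
    mnt=$1    # mount point of the XFS file system
    map=$2    # dm-crypt mapping name (/dev/mapper/$map)
    res=$3    # DRBD resource name
    dev=$4    # DRBD device, e.g. /dev/drbd0
    run umount "$mnt" &&
    run cryptsetup luksClose "$map" &&
    run drbdadm down "$res" &&
    run drbdadm up "$res" &&
    run drbdadm primary "$res" &&
    run cryptsetup luksOpen "$dev" "$map" &&
    run mount "/dev/mapper/$map" "$mnt"
}
```

For a dry run, `DRY_RUN=1 restack /srv/data cryptdata r0 /dev/drbd0` prints the seven commands in order. Here /srv/data, cryptdata and r0 are made-up names. The chain of `&&` stops at the first failing step.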
Regards,
Lutz Vieweg
_______________________________________________
drbd-user mailing list
[email protected]
http://lists.linbit.com/mailman/listinfo/drbd-user