On 06/06/2013 02:51 PM, Lars Ellenberg wrote:
You did something bad, and that confused the IO stack.
I would have expected some kind of error message from the
tools I used to increase the device size if I had actually
done something bad...
This causes IO errors.
Interestingly, while these "kernel: bio too big device drbd0" messages
keep coming, no human user or other component of the machine complains
about any error... so far, after about one week of intensive usage.
On 06/06/2013 03:39 PM, Sebastian Riemer wrote:
Looks like something in the IO stack above DRBD in the kernel doesn't
respect the IO size limits of DRBD.
The function "blk_set_stacking_limits()" was introduced in kernel 3.3
to fix such issues. MD, for example, uses this function; before that,
MD used IO limits that were too small.
Try these commands and repeat them for the devices above:
$ cat /sys/block/drbd0/queue/max_sectors_kb
$ cat /sys/block/drbd0/queue/max_hw_sectors_kb
The fascinating results:
# for i in /sys/block/drbd*/queue/max_sectors_kb ; do echo -n "$i " ; cat $i ; done
/sys/block/drbd0/queue/max_sectors_kb 128
/sys/block/drbd1/queue/max_sectors_kb 512
/sys/block/drbd7/queue/max_sectors_kb 512
# for i in /sys/block/drbd*/queue/max_hw_sectors_kb ; do echo -n "$i " ; cat $i ; done
/sys/block/drbd0/queue/max_hw_sectors_kb 128
/sys/block/drbd1/queue/max_hw_sectors_kb 1024
/sys/block/drbd7/queue/max_hw_sectors_kb 1024
It should be 128, since DRBD has 128 KiB hashing functions and can't do
bigger IO because of that. The kernel internally calculates in 512-byte
sectors, so 256 sectors are 128 KiB.
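The conversion between the KiB values shown in sysfs and the 512-byte sectors the kernel counts in can be sketched as follows; the helper names are hypothetical. With these, the 1024 KiB reported for drbd1 and drbd7 corresponds to 2048 sectors, while drbd0's 128 KiB is 256 sectors.

```shell
#!/bin/sh
# The block layer counts in 512-byte sectors, while the sysfs files
# max_sectors_kb / max_hw_sectors_kb report KiB (1024 bytes).
SECTOR_SIZE=512

kib_to_sectors() {
    echo $(( $1 * 1024 / $SECTOR_SIZE ))
}

sectors_to_kib() {
    echo $(( $1 * $SECTOR_SIZE / 1024 ))
}

kib_to_sectors 128    # prints 256
sectors_to_kib 2048   # prints 1024
```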
I wonder why only drbd0, which is one of three drbd devices used
on the machine, shows such a result - and drbd0 is the only device
that the "bio too big" messages are reported for.
Have a look at the kernel source in "block/blk-core.c" and search for
"bio too big device" for details. In the function
"generic_make_request_checks()" you can see that an IO error is sent to
the upper layers in that case (bio_endio(bio, -EIO)).
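In the 3.x sources, that check compares the bio's size in sectors against the queue's max_hw_sectors limit and fails the bio with -EIO rather than splitting it (worth verifying against the exact kernel version in use). The function below is only a user-space model of that comparison, not kernel code; `bio_too_big` is a hypothetical name and its arguments are the request size and the device's max_hw_sectors_kb, both in KiB.

```shell
#!/bin/sh
# User-space model (hypothetical helper) of the size check that emits
# "bio too big device ...": a request larger than the device's
# max_hw_sectors limit is failed with -EIO instead of being split.
# Arguments: request size in KiB, device max_hw_sectors_kb value.
bio_too_big() {
    req_sectors=$(( $1 * 2 ))       # 1 KiB = 2 sectors of 512 bytes
    max_hw_sectors=$(( $2 * 2 ))
    if [ "$req_sectors" -gt "$max_hw_sectors" ]; then
        echo "rejected ($req_sectors > $max_hw_sectors)"
        return 1
    fi
    echo "accepted"
    return 0
}

# A 512 KiB write (dm-9's max_sectors_kb) sent to drbd0 (128 KiB limit):
bio_too_big 512 128 || echo "would fail with EIO"
```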
Yes, so the next layer, which is dm-crypt, should either complain / return
an error, too, or do some magic to slice the write into pieces, right?
BTW: This is what I get for the dm-crypt device that sits on top of drbd0:
/sys/block/dm-9/queue/max_hw_sectors_kb 1024
/sys/block/dm-9/queue/max_sectors_kb 512
Should I be worried?
It depends on how the layers above react to this situation. If they retry
with smaller IOs, then it's okay. Otherwise, there can be a major
issue. The kernel code has to be read to verify this.
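To spot this kind of mismatch without reading the numbers by hand, a rough heuristic is to compare a stacked device's max_sectors_kb with the max_hw_sectors_kb of each device listed in its sysfs "slaves" directory (for example /sys/block/dm-9/slaves/drbd0). The sketch below assumes the slaves are whole devices such as drbd0 (partitions live at a different sysfs path and have no queue directory of their own); `check_stack` and the `SYSFS_BLOCK` override are hypothetical names introduced here.

```shell
#!/bin/sh
# Hypothetical diagnostic (rough heuristic): compare a stacked device's
# max_sectors_kb with the max_hw_sectors_kb of each device under "slaves".
# SYSFS_BLOCK can be overridden (e.g. for testing); defaults to /sys/block.
SYSFS_BLOCK=${SYSFS_BLOCK:-/sys/block}

check_stack() {
    upper=$1
    upper_kb=$(cat "$SYSFS_BLOCK/$upper/queue/max_sectors_kb") || return 2
    status=0
    for slave in "$SYSFS_BLOCK/$upper/slaves"/*; do
        [ -e "$slave" ] || continue     # empty slaves dir: nothing to compare
        lower=$(basename "$slave")      # assumes a whole device, e.g. drbd0
        lower_kb=$(cat "$SYSFS_BLOCK/$lower/queue/max_hw_sectors_kb") || return 2
        if [ "$upper_kb" -gt "$lower_kb" ]; then
            echo "$upper: max_sectors_kb $upper_kb > $lower max_hw_sectors_kb $lower_kb"
            status=1
        else
            echo "$upper: ok against $lower ($upper_kb <= $lower_kb)"
        fi
    done
    return $status
}
```

Usage: `check_stack dm-9`. With the values reported above (512 KiB for dm-9, 128 KiB for drbd0), it would flag the pair and return a non-zero status.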
I could not find the right place to look in drivers/md/dm-crypt.c;
do you have a suggestion?
Regards,
Lutz Vieweg
_______________________________________________
drbd-user mailing list
http://lists.linbit.com/mailman/listinfo/drbd-user