09.04.2018 22:30, Kees Cook wrote:
echo 1 | tee /sys/block/sd*/queue/nr_requests
I can't get this below "4".
Oops, yeah. It cannot be less than BLKDEV_MIN_RQ (which is 4), so it is
enforced explicitly in queue_requests_store(). It is the same for me.
echo 1 | tee /sys/block/sd*/device/queue_depth
I've got this now too.
Ah! dm-crypt too. I'll see if I can get that added easily to my tests.
And XFS! You love your corner cases. ;)
Yeah, so far this wonderful configuration has allowed me to uncover a
bunch of bugs, and see, we are not done yet ;).
Two other questions, since you can reproduce this easily:
- does it reproduce _without_ hardened usercopy? (I would assume yes,
but you'd just not get any warning until the hangs started.) If it
does reproduce without hardened usercopy, then a new bisect run could
narrow the search even more.
Looks like it cannot be disabled via kernel cmdline, so I have to
re-compile the kernel, right? I can certainly do that anyway.
- does it reproduce with Linus's current tree?
Will try this too.
What would imply missing locking, yes? Yikes. But I'd expect
use-after-free or something, or bad data, not having the pointer slip
I still think this has something to do with blk-mq re-queuing capability
and how BFQ implements it, because there are no sings of issue popping
up with Kyber so far.
Quick update: I added dm-crypt (with XFS on top) and it hung my system
almost immediately. I got no warnings at all, though.
Did your system hang on smartctl hammering too? Have you got some stack
traces to compare with mine ones?