09.04.2018 22:30, Kees Cook wrote:
echo 1 | tee /sys/block/sd*/queue/nr_requests

I can't get this below "4".

Oops, yeah. It cannot be less than BLKDEV_MIN_RQ (which is 4); that lower bound is enforced explicitly in queue_requests_store(). It is the same for me.
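
For illustration, the lower bound amounts to roughly the following. This is only a userspace sketch and not the kernel source verbatim: clamp_nr_requests() is a made-up name, and the only thing taken from the above is that BLKDEV_MIN_RQ is 4. It also assumes a too-small value is raised to the minimum rather than rejected, which matches reading back "4" after writing "1".

```c
/* Userspace sketch of the lower-bound clamp applied to nr_requests.
 * BLKDEV_MIN_RQ mirrors the kernel constant mentioned above (4);
 * clamp_nr_requests() is a hypothetical name used only for illustration. */
#define BLKDEV_MIN_RQ 4

static unsigned long clamp_nr_requests(unsigned long nr)
{
    /* Assumption: a value below the minimum is raised to it, not rejected. */
    if (nr < BLKDEV_MIN_RQ)
        nr = BLKDEV_MIN_RQ;
    return nr;
}
```

So a write of 1 ends up stored as 4, while anything at or above 4 passes through this particular check unchanged.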

echo 1 | tee /sys/block/sd*/device/queue_depth

I've got this now too.
Ah! dm-crypt too. I'll see if I can get that added easily to my tests.
And XFS! You love your corner cases. ;)

Yeah, so far this wonderful configuration has allowed me to uncover a bunch of bugs, and see, we are not done yet ;).

Two other questions, since you can reproduce this easily:
- does it reproduce _without_ hardened usercopy? (I would assume yes,
but you'd just not get any warning until the hangs started.) If it
does reproduce without hardened usercopy, then a new bisect run could
narrow the search even more.

Looks like it cannot be disabled via kernel cmdline, so I have to re-compile the kernel, right? I can certainly do that anyway.

- does it reproduce with Linus's current tree?

Will try this too.

Which would imply missing locking, yes? Yikes. But I'd expect
use-after-free or something, or bad data, not having the pointer slip

I still think this has something to do with the blk-mq re-queuing capability and how BFQ implements it, because there are no signs of the issue popping up with Kyber so far.

Quick update: I added dm-crypt (with XFS on top) and it hung my system
almost immediately. I got no warnings at all, though.

Did your system hang on smartctl hammering too? Have you got some stack traces to compare with mine?


