Running a heavy I/O load on multipath/dual-ported SSDs attached to a SAS3008 adapter (mpt3sas driver), we are seeing I/Os get aborted and tasks stuck in blk_complete_request(). This sometimes results in hitting a BUG_ON in blk_start_request(). It appears that two completions are being performed on a single I/O, and the second completion is racing with re-use of the request for a new I/O.

I saw this upstream commit:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v4.17-rc3&id=9961c9bbf2b43acaaf030a0fbabc9954d937ad8c

which addresses the case where the normal completion occurs before the abort completion. However, what I am seeing appears to be the reverse: the abort completion occurs before the normal completion (because tasks are getting delayed in blk_complete_request()). I have not found any commit that fixes this second case.

Of course, tasks being delayed like this is a concern, and that is being worked on separately. But it seems that this alternate double-completion case has not been addressed.

Does everyone concur that this second case needs to be addressed? Is there a proposed fix?

Thanks,

Doug

FYI, the system is a POWER9 running RHEL-ALT 7.5, with two SAS3008 adapters connected to an IBM EXP24SX SAS Storage Enclosure with 24 HUSMM8040ASS201 drives. FIO was being used to drive the I/O load.

