Hello Bart,
I have only seen this twice, but while testing the latest upstream
kernel (4.17.0+) with SRP I have hit two of these completion races.
[49945.984133] sd 2:0:0:29: alua: transition timeout set to 60 seconds
[49945.984136] sd 2:0:0:29: alua: port group 00 state A non-preferred supports TOlUSNA
[49946.023273] sd 2:0:0:6: alua: port group 00 state A non-preferred supports TOlUSNA
[49946.052514] sd 2:0:0:5: alua: port group 00 state A non-preferred supports TOlUSNA
[49946.092895] sd 2:0:0:4: [sdl] Attached SCSI disk
[49946.093422] sd 2:0:0:6: alua: port group 00 state A non-preferred supports TOlUSNA
[49953.156158] scsi host2: SRP abort called ***** Abort
[49953.187444] sd 2:0:0:5: [sdm] Attached SCSI disk
[49953.211545] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[49965.632850] PGD 0 P4D 0
[49965.644974] Oops: 0002 [#1] SMP PTI
[49965.661765] CPU: 11 PID: 2949 Comm: kworker/u64:0 Kdump: loaded Tainted: G I 4.17.0+ #1
[49965.711026] Hardware name: HP ProLiant DL380 G7, BIOS P67 08/16/2015
[49965.742461] Workqueue: scsi_tmf_2 scmd_eh_abort_handler
[49965.770633] RIP: 0010:_raw_spin_lock_irqsave+0x1e/0x40
[49965.795410] Code: 40 00 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 53 9c 58 66 66 90 66 90 48 89 c3 fa 66 66 90 66 66 90 31 c0 ba 01 00 00 00 <f0> 0f b1 17 85 c0 75 05 48 89 d8 5b c3 89 c6 e8 d7 26 92 ff eb f2
[49965.892623] RSP: 0018:ffffb75e4789fdc0 EFLAGS: 00010046
[49965.920553] RAX: 0000000000000000 RBX: 0000000000000286 RCX: 0000000000000018
[49965.954952] RDX: 0000000000000001 RSI: 000000000000000a RDI: 0000000000000008
[49965.995180] RBP: 0000000000000000 R08: 0000000000000000 R09: 000000000000000a
[49966.033257] R10: 0000000000000000 R11: 0000000000000000 R12: 000000000000000a
[49966.073219] R13: ffff8d51f9041380 R14: 0000000000000000 R15: ffff8d454df84d30
[49966.107885] FS: 0000000000000000(0000) GS:ffff8d52b3340000(0000) knlGS:0000000000000000
[49966.150490] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[49966.177976] CR2: 0000000000000008 CR3: 000000107300a005 CR4: 00000000000206e0
[49966.216606] Call Trace:
[49966.228353] complete+0x18/0x50
[49966.243410] scsi_end_request+0x95/0x1e0
[49966.263891] scsi_io_completion+0x1c1/0x680
[49966.286617] process_one_work+0x171/0x370
[49966.305850] worker_thread+0x49/0x3f0
[49966.323408] kthread+0xf8/0x130
[49966.341046] ? max_active_store+0x80/0x80
[49966.362901] ? kthread_bind+0x10/0x10
[49966.382485] ret_from_fork+0x35/0x40
This looks like a race in the completion path.
I pulled the request off the stack:
struct request {
  q = 0xffff8d51f979b0c0,
  mq_ctx = 0xffffd752442e0600,
  cpu = -1,
  cmd_flags = 0,
  rq_flags = 139456,
  internal_tag = -1,
  __data_len = 0,
  tag = 56,
  __sector = 8191008,
  bio = 0x0,
  biotail = 0xffff8d46acd9e700,
  queuelist = {
    next = 0xffff8d454df84c40,
    prev = 0xffff8d454df84c40
  },
struct gendisk {
  major = 67,
  first_minor = 64,
  minors = 16,
  disk_name = "sdba\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000ba",
crash> scsi_device.sdev_state 0xffff8d52b1da4800
sdev_state = SDEV_RUNNING
crash> Scsi_Host.shost_state 0xffff8d466b969000
shost_state = SHOST_RUNNING
if (scsi_target(sdev)->single_lun ||
    !list_empty(&sdev->host->starved_list))
	kblockd_schedule_work(&sdev->requeue_work);
Have you seen this before? Let me know what else you want from the dump
while I look further.
I have not tested for a while, so I am not sure where this crept in or
whether it is even an issue for others.
Thanks
Laurence