Thanks for reviewing.
On 02/12/2018 04:28 AM, Sreekanth Reddy wrote:
Instead of returning the scmd with DID_NO_CONNECT in TM path, can we wait
for some time (10 seconds) in shutdown/unload path for the outstanding
commands to complete and even then [if] the scmds are outstanding then return
all the outstanding scmds with DID_NO_CONNECT in the shutdown/unload path
itself as shown below,
Apparently there is a problem with that.
That is not enough for TM commands; it's only for SCSI IO commands.
The TM commands use the high-priority queue, and SCSI IO commands use
the SCSI IO queue. But this patch only waits for and return commands
in the latter:
> + mpt3sas_wait_for_commands_to_complete(ioc);
> + _scsih_flush_running_cmds(ioc);
They only loop up until 'ioc->scsiio_depth', but TM commands get SMIDs
above that [see mpt3sas_base_free_smid() and mpt3sas_scsih_issue_tm()].
So, there's an exposure for abort/reset/other TM commands. For example,
the SCSI IO commands finish quickly in wait_for_commands_to_complete(),
then the shutdown function continues, disables interrupts, and release
pointers/memory. Then, an outstanding abort/reset command is finished,
its completion interrupt is not serviced, and the SCSI EH for it kicks
in/continues, and so it escalates up, and hit that oops in host reset.
Is there any problem with the patch that I proposed?
It seems okay to have another check in error path (not hot/performance
path), and that check is already used in many other points in the code,
for the same reasons (exit early before the code attempts to use stuff
that might be released).
Mauricio Faria de Oliveira
IBM Linux Technology Center