On 2018/2/27 22:57, Bart Van Assche wrote:
On Tue, 2018-02-27 at 15:09 +0800, chenxiang (M) wrote:
On 2018/2/26 23:25, Bart Van Assche wrote:
On Mon, 2018-02-26 at 17:37 +0800, chenxiang (M) wrote:
While testing kernel 4.16-rc1, I found an issue: when running IO on a SATA disk
and then disabling the disk through the sysfs interface
(echo 0 > /sys/class/sas_phy/phy-1:0:0/enable), IO hangs and never enters
SCSI EH. The issue appears every time.

I added some prints to the code and found that those IOs time out after 30s,
and they all enter function scsi_eh_scmd_add, but only some of them reach
function scsi_eh_inc_host_failed. So SCSI EH is never entered. I suspect it is
related to the patch (commit 3bd6f43f5cb371 "scsi: core: Ensure that the
SCSI error handler gets woken up"). Please have a look.
Hello chenxiang,

Had you already noticed patch "[PATCH v2] Avoid that ATA error handling can
trigger a kernel hang or oops"? If not, can you apply that patch to your
kernel and verify whether it fixes this behavior? See also
https://www.mail-archive.com/linux-scsi@vger.kernel.org/msg71189.html or
https://patchwork.kernel.org/patch/10236213/.
After applying your patch, the issue I reported seems to be solved.
Thanks for having tested that patch!

But when I run a long-duration test (disabling/enabling the disk while running
IO) with this test case, a NULL pointer dereference occurs.
It does not seem to be related to the current issue, but I am not sure.
I ran the same test case for a long time before on kernel 4.15-rc5, and it was okay.
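
The disable/enable test described above could be scripted roughly as follows. This is only a minimal sketch and not the script used in the report: the function name toggle_phy, the cycle count and the delay are hypothetical, and only the sysfs path in the usage comment comes from the original report. It assumes IO is already running on the disk and that the script is run as root.

```shell
#!/bin/sh
# Hypothetical helper: repeatedly disable and re-enable a SAS phy by writing
# 0 and then 1 to its sysfs "enable" attribute.
#   $1 = path to the enable attribute
#   $2 = number of disable/enable cycles
#   $3 = seconds to wait after each write
toggle_phy() {
    phy_enable="$1"
    cycles="$2"
    delay="$3"
    i=0
    while [ "$i" -lt "$cycles" ]; do
        echo 0 > "$phy_enable"   # disable the phy while IO is in flight
        sleep "$delay"
        echo 1 > "$phy_enable"   # re-enable the phy
        sleep "$delay"
        i=$((i + 1))
    done
}

# Example invocation (assumed values for cycles and delay):
#   toggle_phy /sys/class/sas_phy/phy-1:0:0/enable 100 40
```

A delay longer than the 30s command timeout mentioned earlier in the thread gives outstanding IOs time to time out before the phy comes back; whether that matters for reproducing the crash is an assumption.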

Part of the log is as follows, and I have attached the log to this email:

[  485.716578] pc : blk_abort_request+0x14/0x68
(+Tejun)

Hello chenxiang,

Please check whether the following patch fixes the kernel crash you ran into:

https://marc.info/?l=linux-block&m=151895951207014

It seems the patch is for blk-mq, but the issue I encountered is on the legacy block path, as CONFIG_SCSI_MQ_DEFAULT is not enabled.


Thanks,

Bart.