** Description changed:

  [Impact]
  When a SATA device attached to a SAS controller begins generating errors
  (e.g. the device is failing, or it was physically removed), SAS error
  handling completes but may leave zombie ATA commands that are never
  properly processed or freed. This produces warning backtraces on the
  console, like the one below, and eventually leads to a system hang.
  
-     WARNING: CPU: 0 PID: 28512 at drivers/ata/libata-eh.c:4037
-     ata_eh_finish+0xb4/0xcc
-     CPU: 0 PID: 28512 Comm: kworker/u32:2 Tainted: G     W  OE 4.14.0#1
-     ......
-     Call trace:
-     [<ffff0000088b7bd0>] ata_eh_finish+0xb4/0xcc
-     [<ffff0000088b8420>] ata_do_eh+0xc4/0xd8
-     [<ffff0000088b8478>] ata_std_error_handler+0x44/0x8c
-     [<ffff0000088b8068>] ata_scsi_port_error_handler+0x480/0x694
-     [<ffff000008875fc4>] async_sas_ata_eh+0x4c/0x80
-     [<ffff0000080f6be8>] async_run_entry_fn+0x4c/0x170
-     [<ffff0000080ebd70>] process_one_work+0x144/0x390
-     [<ffff0000080ec100>] worker_thread+0x144/0x418
-     [<ffff0000080f2c98>] kthread+0x10c/0x138
-     [<ffff0000080855dc>] ret_from_fork+0x10/0x18
+     WARNING: CPU: 0 PID: 28512 at drivers/ata/libata-eh.c:4037
+     ata_eh_finish+0xb4/0xcc
+     CPU: 0 PID: 28512 Comm: kworker/u32:2 Tainted: G     W  OE 4.14.0#1
+     ......
+     Call trace:
+     [<ffff0000088b7bd0>] ata_eh_finish+0xb4/0xcc
+     [<ffff0000088b8420>] ata_do_eh+0xc4/0xd8
+     [<ffff0000088b8478>] ata_std_error_handler+0x44/0x8c
+     [<ffff0000088b8068>] ata_scsi_port_error_handler+0x480/0x694
+     [<ffff000008875fc4>] async_sas_ata_eh+0x4c/0x80
+     [<ffff0000080f6be8>] async_run_entry_fn+0x4c/0x170
+     [<ffff0000080ebd70>] process_one_work+0x144/0x390
+     [<ffff0000080ec100>] worker_thread+0x144/0x418
+     [<ffff0000080f2c98>] kthread+0x10c/0x138
+     [<ffff0000080855dc>] ret_from_fork+0x10/0x18
  
  [Test Case]
  I don't have a reliable reproducer for this, but one possible test is to
  pull an active, hot-pluggable SATA disk from its controller and check
  whether the symptoms above occur.
  
+ [Fix]
+ The fix is to call into libata so that it processes the remaining
+ commands. This frees the zombie commands, preventing the leak and the
+ eventual starvation.
+ 
  [Regression Risk]
  This is a clean cherry-pick from upstream, so any regressions should have
  upstream support. As of this writing, no changesets in linux-next carry a
  Fixes: tag referencing this commit, suggesting that upstream has not yet
  found (or fixed) any bugs in it.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1768971

Title:
  Warnings/hang during error handling of SATA disks on SAS controller

Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Bionic:
  In Progress

Bug description:
  [Impact]
  When a SATA device attached to a SAS controller begins generating errors
  (e.g. the device is failing, or it was physically removed), SAS error
  handling completes but may leave zombie ATA commands that are never
  properly processed or freed. This produces warning backtraces on the
  console, like the one below, and eventually leads to a system hang.

      WARNING: CPU: 0 PID: 28512 at drivers/ata/libata-eh.c:4037
      ata_eh_finish+0xb4/0xcc
      CPU: 0 PID: 28512 Comm: kworker/u32:2 Tainted: G     W  OE 4.14.0#1
      ......
      Call trace:
      [<ffff0000088b7bd0>] ata_eh_finish+0xb4/0xcc
      [<ffff0000088b8420>] ata_do_eh+0xc4/0xd8
      [<ffff0000088b8478>] ata_std_error_handler+0x44/0x8c
      [<ffff0000088b8068>] ata_scsi_port_error_handler+0x480/0x694
      [<ffff000008875fc4>] async_sas_ata_eh+0x4c/0x80
      [<ffff0000080f6be8>] async_run_entry_fn+0x4c/0x170
      [<ffff0000080ebd70>] process_one_work+0x144/0x390
      [<ffff0000080ec100>] worker_thread+0x144/0x418
      [<ffff0000080f2c98>] kthread+0x10c/0x138
      [<ffff0000080855dc>] ret_from_fork+0x10/0x18

  [Test Case]
  I don't have a reliable reproducer for this, but one possible test is to
  pull an active, hot-pluggable SATA disk from its controller and check
  whether the symptoms above occur.

  [Fix]
  The fix is to call into libata so that it processes the remaining
  commands. This frees the zombie commands, preventing the leak and the
  eventual starvation.

  [Regression Risk]
  This is a clean cherry-pick from upstream, so any regressions should have
  upstream support. As of this writing, no changesets in linux-next carry a
  Fixes: tag referencing this commit, suggesting that upstream has not yet
  found (or fixed) any bugs in it.


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
