On Mon, 2012-08-27 at 12:13 -0400, John Drescher wrote:
> >> I have bisected it down to the following patch:
> >>
> >> Bisecting: 0 revisions left to test after this (roughly 0 steps)
> >> [10f8d5b86743b33d841a175303e2bf67fd620f42] SCSI: fix hot unplug vs
> >> async scan race
> >>
> >> It appears this patch caused the bad behavior although I have not
> >> tested that yet. I am rebuilding the array (takes ~2 hours) from the
> >> previous good bisect.
> >>
> 
> Confirmed. This patch appears to cause the bug in my test setup.
> 
> [  339.406778] BUG: soft lockup - CPU#2 stuck for 23s! [kworker/u:8:2202]
[..]
> [  339.415268]  [<ffffffff8141782a>] scsi_remove_target+0xda/0x1f0

I wonder if we are preventing scsi_device_dev_release_usercontext() from
making forward progress?

...the attached patch should confirm this or give more info otherwise.
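
For anyone reading the debug output, here is a rough user-space sketch of
what the patch below checks. It is not kernel code and only an illustration
under assumptions: struct target, log_or_detect_repeat() and dev_del_code()
are hypothetical names invented for this sketch. The idea is to remember the
first few targets picked up by the restart loop and flag the case where the
same target is picked up again, which would suggest removal is not making
progress. The dev_del value follows the encoding in the pr_err_once() call:
0 for no device on the target, 1 for a device whose release work is idle,
2 for a device whose release work is busy.

```c
#include <stddef.h>

#define LOG_SLOTS 3	/* same size as found_log[] in the patch */

/* Hypothetical stand-in for struct scsi_target; only an id is modelled. */
struct target {
	int id;
};

/*
 * Record a target seen on one pass of the restart loop.
 * Returns the slot index if the target was already recorded (a repeat,
 * i.e. the loop found the same target again), or -1 if the target was
 * newly recorded or the log is already full of other targets.
 */
static int log_or_detect_repeat(struct target *seen[LOG_SLOTS],
				struct target *found)
{
	size_t i;

	for (i = 0; i < LOG_SLOTS; i++) {
		if (!seen[i]) {
			seen[i] = found;	/* first sighting */
			return -1;
		}
		if (seen[i] == found)
			return (int)i;		/* seen before: possible stall */
	}
	return -1;
}

/* Mirrors the dev_del encoding printed by the debug patch. */
static int dev_del_code(int has_sdev, int work_is_busy)
{
	return has_sdev ? (work_is_busy ? 2 : 1) : 0;
}
```

If the log line shows up with dev_del: 2, the release work for the first
device on the target was still pending or running when the target was found
a second time, which would fit the forward-progress theory above.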

--
Dan

scsi_remove_target: debug softlockup

From: Dan Williams <d...@fb.com>

Dump more info in the case where we get stuck trying to remove a device.
---
 drivers/scsi/scsi_sysfs.c |   19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 093d4f6..011f8ee 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -1032,8 +1032,11 @@ void scsi_remove_target(struct device *dev)
 {
 	struct Scsi_Host *shost = dev_to_shost(dev->parent);
 	struct scsi_target *starget, *found;
+	struct scsi_target *found_log[3];
 	unsigned long flags;
 
+	memset(found_log, 0, sizeof(found_log));
+
  restart:
 	found = NULL;
 	spin_lock_irqsave(shost->host_lock, flags);
@@ -1041,8 +1044,24 @@ void scsi_remove_target(struct device *dev)
 		if (starget->state == STARGET_DEL)
 			continue;
 		if (starget->dev.parent == dev || &starget->dev == dev) {
+			int i;
+
 			found = starget;
 			found->reap_ref++;
+			for (i = 0; i < ARRAY_SIZE(found_log); i++)
+				if (!found_log[i]) {
+					found_log[i] = found;
+					break;
+				} else if (found_log[i] == found) {
+					struct scsi_device *sdev = NULL;
+
+					if (!list_empty(&found->devices))
+						sdev = list_entry(found->devices.next, typeof(*sdev), same_target_siblings);
+					pr_err_once("%s[%d]: reap %d:%d state: %d reap: %d dev_del: %d\n",
+						    __func__, i, found->channel, found->id,
+						    found->state, found->reap_ref,
+						    sdev ? work_busy(&sdev->ew.work) ? 2 : 1 : 0);
+				}
 			break;
 		}
 	}
