[GIT PULL] final round of SCSI updates for the 4.14+ merge window

2017-11-22 Thread James Bottomley
Two basic fixes: one for the sparse problem with the blacklist flags
and another for a hang forever in bnx2i.

The patch is available here:

git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git scsi-fixes

The short changelog is:

Chad Dupuis (1):
  scsi: bnx2fc: Fix hung task messages when a cleanup response is not 
received during abort

Hannes Reinecke (1):
  scsi: Use 'blist_flags_t' for scsi_devinfo flags

And the diffstat:

 drivers/scsi/bnx2fc/bnx2fc_io.c | 40 ++---
 drivers/scsi/scsi_devinfo.c | 18 +++
 drivers/scsi/scsi_priv.h| 15 +++--
 drivers/scsi/scsi_scan.c|  2 +-
 include/scsi/scsi_device.h  |  4 +++-
 include/scsi/scsi_devinfo.h | 50 -
 6 files changed, 78 insertions(+), 51 deletions(-)

With full diff below.

James

---

diff --git a/drivers/scsi/bnx2fc/bnx2fc_io.c b/drivers/scsi/bnx2fc/bnx2fc_io.c
index 5b6153f23f01..8e2f767147cb 100644
--- a/drivers/scsi/bnx2fc/bnx2fc_io.c
+++ b/drivers/scsi/bnx2fc/bnx2fc_io.c
@@ -1084,24 +1084,35 @@ static int bnx2fc_abts_cleanup(struct bnx2fc_cmd 
*io_req)
 {
struct bnx2fc_rport *tgt = io_req->tgt;
int rc = SUCCESS;
+   unsigned int time_left;
 
io_req->wait_for_comp = 1;
bnx2fc_initiate_cleanup(io_req);
 
spin_unlock_bh(>tgt_lock);
 
-   wait_for_completion(_req->tm_done);
-
+   /*
+* Can't wait forever on cleanup response lest we let the SCSI error
+* handler wait forever
+*/
+   time_left = wait_for_completion_timeout(_req->tm_done,
+   BNX2FC_FW_TIMEOUT);
io_req->wait_for_comp = 0;
+   if (!time_left)
+   BNX2FC_IO_DBG(io_req, "%s(): Wait for cleanup timed out.\n",
+ __func__);
+
/*
-* release the reference taken in eh_abort to allow the
-* target to re-login after flushing IOs
+* Release reference held by SCSI command the cleanup completion
+* hits the BNX2FC_CLEANUP case in bnx2fc_process_cq_compl() and
+* thus the SCSI command is not returnedi by bnx2fc_scsi_done().
 */
kref_put(_req->refcount, bnx2fc_cmd_release);
 
spin_lock_bh(>tgt_lock);
return rc;
 }
+
 /**
  * bnx2fc_eh_abort - eh_abort_handler api to abort an outstanding
  * SCSI command
@@ -1118,6 +1129,7 @@ int bnx2fc_eh_abort(struct scsi_cmnd *sc_cmd)
struct fc_lport *lport;
struct bnx2fc_rport *tgt;
int rc;
+   unsigned int time_left;
 
rc = fc_block_scsi_eh(sc_cmd);
if (rc)
@@ -1194,6 +1206,11 @@ int bnx2fc_eh_abort(struct scsi_cmnd *sc_cmd)
if (cancel_delayed_work(_req->timeout_work))
kref_put(_req->refcount,
 bnx2fc_cmd_release); /* drop timer hold */
+   /*
+* We don't want to hold off the upper layer timer so simply
+* cleanup the command and return that I/O was successfully
+* aborted.
+*/
rc = bnx2fc_abts_cleanup(io_req);
/* This only occurs when an task abort was requested while ABTS
   is in progress.  Setting the IO_CLEANUP flag will skip the
@@ -1201,7 +1218,7 @@ int bnx2fc_eh_abort(struct scsi_cmnd *sc_cmd)
   was a result from the ABTS request rather than the CLEANUP
   request */
set_bit(BNX2FC_FLAG_IO_CLEANUP, _req->req_flags);
-   goto out;
+   goto done;
}
 
/* Cancel the current timer running on this io_req */
@@ -1221,7 +1238,11 @@ int bnx2fc_eh_abort(struct scsi_cmnd *sc_cmd)
}
spin_unlock_bh(>tgt_lock);
 
-   wait_for_completion(_req->tm_done);
+   /* Wait 2 * RA_TOV + 1 to be sure timeout function hasn't fired */
+   time_left = wait_for_completion_timeout(_req->tm_done,
+   (2 * rp->r_a_tov + 1) * HZ);
+   if (time_left)
+   BNX2FC_IO_DBG(io_req, "Timed out in eh_abort waiting for 
tm_done");
 
spin_lock_bh(>tgt_lock);
io_req->wait_for_comp = 0;
@@ -1233,8 +1254,12 @@ int bnx2fc_eh_abort(struct scsi_cmnd *sc_cmd)
/* Let the scsi-ml try to recover this command */
printk(KERN_ERR PFX "abort failed, xid = 0x%x\n",
   io_req->xid);
+   /*
+* Cleanup firmware residuals before returning control back
+* to SCSI ML.
+*/
rc = bnx2fc_abts_cleanup(io_req);
-   goto out;
+   goto done;
} else {
/*
 * We come here even when there was a race condition
@@ -1249,7 +1274,6 @@ int bnx2fc_eh_abort(struct scsi_cmnd *sc_cmd)
 done:
/* release the reference taken in eh_abort */

Re: [PATCH 1/1] qed: Add firmware 8.33.1.0

2017-11-22 Thread Ben Hutchings
On Wed, 2017-10-11 at 00:57 -0700, Rahul Verma wrote:
> The new qed firmware contains fixes to firmware and added
> support for new features,
> -Add UFP support and drop action support.
> -DCQCN support for unlimited number of QP
> -Add IP type to GFT filter profile.
> -Added new TCP function counters.
> -Support flow ID in aRFS flow.
> 
> Signed-off-by: Rahul Verma 
> ---
>  WHENCE  |   1 +
>  qed/qed_init_values_zipped-8.33.1.0.bin | Bin 0 -> 838612 bytes
>  2 files changed, 1 insertion(+)
>  create mode 100755 qed/qed_init_values_zipped-8.33.1.0.bin
[...]

Applied; sorry for the delay.

Ben.

-- 
Ben Hutchings
When in doubt, use brute force. - Ken Thompson



signature.asc
Description: This is a digitally signed message part


[PATCH] Ensure that the SCSI error handler gets woken up

2017-11-22 Thread Bart Van Assche
If scsi_eh_scmd_add() is called concurrently with scsi_host_queue_ready()
while shost->host_blocked > 0 then it can happen that neither function
wakes up the SCSI error handler. Fix this by making every function that
decreases the host_busy counter to wake up the error handler if necessary.

Reported-by: Pavel Tikhomirov 
Fixes: commit 746650160866 ("scsi: convert host_busy to atomic_t")
Signed-off-by: Bart Van Assche 
Cc: Konstantin Khorenko 
Cc: Stuart Hayes 
Cc: Christoph Hellwig 
Cc: Hannes Reinecke 
Cc: Johannes Thumshirn 
Cc: 
---
 drivers/scsi/scsi_error.c |  3 ++-
 drivers/scsi/scsi_lib.c   | 22 ++
 2 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 5e89049e9b4e..f7f014c755d7 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -61,9 +61,10 @@ static int scsi_eh_try_stu(struct scsi_cmnd *scmd);
 static int scsi_try_to_abort_cmd(struct scsi_host_template *,
 struct scsi_cmnd *);
 
-/* called with shost->host_lock held */
 void scsi_eh_wakeup(struct Scsi_Host *shost)
 {
+   lockdep_assert_held(shost->host_lock);
+
if (atomic_read(>host_busy) == shost->host_failed) {
trace_scsi_eh_wakeup(shost);
wake_up_process(shost->ehandler);
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 1e05e1885ac8..abd37d77af2d 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -318,22 +318,28 @@ static void scsi_init_cmd_errh(struct scsi_cmnd *cmd)
cmd->cmd_len = scsi_command_size(cmd->cmnd);
 }
 
-void scsi_device_unbusy(struct scsi_device *sdev)
+static void scsi_dec_host_busy(struct Scsi_Host *shost)
 {
-   struct Scsi_Host *shost = sdev->host;
-   struct scsi_target *starget = scsi_target(sdev);
unsigned long flags;
 
atomic_dec(>host_busy);
-   if (starget->can_queue > 0)
-   atomic_dec(>target_busy);
-
if (unlikely(scsi_host_in_recovery(shost) &&
 (shost->host_failed || shost->host_eh_scheduled))) {
spin_lock_irqsave(shost->host_lock, flags);
scsi_eh_wakeup(shost);
spin_unlock_irqrestore(shost->host_lock, flags);
}
+}
+
+void scsi_device_unbusy(struct scsi_device *sdev)
+{
+   struct Scsi_Host *shost = sdev->host;
+   struct scsi_target *starget = scsi_target(sdev);
+
+   scsi_dec_host_busy(shost);
+
+   if (starget->can_queue > 0)
+   atomic_dec(>target_busy);
 
atomic_dec(>device_busy);
 }
@@ -1532,7 +1538,7 @@ static inline int scsi_host_queue_ready(struct 
request_queue *q,
list_add_tail(>starved_entry, >starved_list);
spin_unlock_irq(shost->host_lock);
 out_dec:
-   atomic_dec(>host_busy);
+   scsi_dec_host_busy(shost);
return 0;
 }
 
@@ -2020,7 +2026,7 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx 
*hctx,
return BLK_STS_OK;
 
 out_dec_host_busy:
-   atomic_dec(>host_busy);
+   scsi_dec_host_busy(shost);
 out_dec_target_busy:
if (scsi_target(sdev)->can_queue > 0)
atomic_dec(_target(sdev)->target_busy);
-- 
2.15.0



mpt3sas_cm2 attempting host reset for a 1 faulted disk

2017-11-22 Thread Özkan Göksu

I have 5 jbod and i reach each one 2 sas cable with 5 LSI 3008.

Last day my cluster imported zfs pool to controller 2.
I was have 1 faulted disk in this pool already "sdhr".
When the pool imported on controller2 few sec or minute later I lost 1 jbod 
because mpt3sas_cm2 reset LSI card + expanders. After few sec i was able to use 
the jbod again.

But why? I was only have 1 faulted disk!.
If mpt3 will reset at every import for a faulted disk "which i have 390 disk, i 
will have 1-2 faulted disk its normal" we have a big problem! 

LSISAS3008: FWVersion(15.00.00.00), ChipRevision(0x02), BiosVersion(08.35.00.00

My Kernel=  Linux 4.13.11-1-ARCH #1 SMP PREEMPT Thu Nov 2 10:25:56 CET 2017 
x86_64 GNU/Linux

Kernel log= 
https://gist.github.com/morphinz/26812c7025c5125b4fa60e300ec76bd0

BTW: I lost index 2 and "sdhr" in that jbod:

Re: [PATCH] scsi: ufs: ufshcd: fix potential NULL pointer dereference in ufshcd_config_vreg

2017-11-22 Thread Gustavo A. R. Silva


On 11/21/2017 10:01 PM, Martin K. Petersen wrote:

Gustavo A.,


_vreg_ is being dereferenced before it is null checked, hence there is a
potential null pointer dereference.

Fix this by moving the pointer dereference after _vreg_ has been null
checked.

Applied to 4.15/scsi-fixes, thank you!



Glad to help. :)

Thanks
--
Gustavo A. R. Silva


RE: [PATCH] scsi: ufs: Fix Runtime PM

2017-11-22 Thread Potomski, MichalX
On 2017-11-21 20:45, Subhash Jadavani wrote:
> On 2017-11-13 01:14, Michal Potomski wrote:
> > From: Michał Potomski 
> >
> > Recent testing of Runtime PM for UFS has shown it's not working as
> > intended. To acheive fully working Runtime PM, first we have to put
> > back scsi_device autopm reference counter.
> >
> > Existing implementation was prone to races and not working for
> > tranfsers to sleeping devices. This commit aims to fix this to acheive
> > fully working environment. Due to runtime PM being previously disabled
> > by not putting scsi_device autopm counter, the Runtime PM is set to be
> > forbidden as default, to not cause problems with hosts and devices,
> > which do not fully support different Device and Link states.
> >
> > It can be still enabled through sysFS power/control attribute.
> >
> > Changes since v1:
> >  - Fix compilation error for certain kernel configs,
> >  - Move pm_mark_last_busy(), only to relevant context.
> >
> > Signed-off-by: Michał Potomski 
> > ---
> >  drivers/scsi/ufs/ufshcd-pci.c |  4 ++-
> >  drivers/scsi/ufs/ufshcd.c | 64
> > +++
> >  2 files changed, 56 insertions(+), 12 deletions(-)
> >
> > diff --git a/drivers/scsi/ufs/ufshcd-pci.c
> > b/drivers/scsi/ufs/ufshcd-pci.c index 925b0ec7ec54..4fffb1123876
> > 100644
> > --- a/drivers/scsi/ufs/ufshcd-pci.c
> > +++ b/drivers/scsi/ufs/ufshcd-pci.c
> > @@ -184,8 +184,10 @@ ufshcd_pci_probe(struct pci_dev *pdev, const
> > struct pci_device_id *id)
> > }
> >
> > pci_set_drvdata(pdev, hba);
> > +   pm_runtime_set_autosuspend_delay(>dev, 3000);
> > +   pm_runtime_use_autosuspend(>dev);
> > pm_runtime_put_noidle(>dev);
> > -   pm_runtime_allow(>dev);
> > +   pm_runtime_forbid(>dev);
> >
> > return 0;
> >  }
> > diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
> > index 011c3369082c..7ff7f51afe26 100644
> > --- a/drivers/scsi/ufs/ufshcd.c
> > +++ b/drivers/scsi/ufs/ufshcd.c
> > @@ -240,6 +240,37 @@ static inline bool ufshcd_valid_tag(struct
> > ufs_hba *hba, int tag)
> > return tag >= 0 && tag < hba->nutrs;  }
> >
> > +#ifdef CONFIG_PM
> > +static void ufshcd_pm_runtime_get(struct device *dev) {
> > +   // Don't trigger PM events while in transition states
> > +   if (dev->power.runtime_status == RPM_RESUMING ||
> > +   dev->power.runtime_status == RPM_SUSPENDING)
> 
> Why do we need to check this? Isn't this check race condition prone?
We need to check this, because we might end up in triggering resume, while
suspending or resuming, which will cause deadlock. This is especially important
in ufshcd_queuecommand directive, through which we pass e.g. START/STOP Unit
SCSI command, when resuming or suspending.
> > +   pm_runtime_get_noresume(dev);
> > +   else
> > +   pm_runtime_get_sync(dev);
> > +}
> > +
> > +static void ufshcd_pm_runtime_put(struct device *dev) {
> > +   // Don't trigger PM events while in transition states
> > +   if (dev->power.runtime_status == RPM_RESUMING ||
> > +   dev->power.runtime_status == RPM_SUSPENDING)
> > +   pm_runtime_put_noidle(dev);
> > +   else {
> > +   pm_runtime_mark_last_busy(dev);
> > +   pm_runtime_put_autosuspend(dev);
> > +   }
> > +}
> > +#else
> > +static void ufshcd_pm_runtime_get(struct device *dev) { } static void
> > +ufshcd_pm_runtime_put(struct device *dev) { } #endif
> > +
> >  static inline int ufshcd_enable_irq(struct ufs_hba *hba)  {
> > int ret = 0;
> > @@ -1345,7 +1376,7 @@ static ssize_t
> > ufshcd_clkscale_enable_store(struct device *dev,
> > if (value == hba->clk_scaling.is_allowed)
> > goto out;
> >
> > -   pm_runtime_get_sync(hba->dev);
> > +   ufshcd_pm_runtime_get(hba->dev);
> > ufshcd_hold(hba, false);
> >
> > cancel_work_sync(>clk_scaling.suspend_work);
> > @@ -1364,7 +1395,7 @@ static ssize_t
> > ufshcd_clkscale_enable_store(struct device *dev,
> > }
> >
> > ufshcd_release(hba);
> > -   pm_runtime_put_sync(hba->dev);
> > +   ufshcd_pm_runtime_put(hba->dev);
> >  out:
> > return count;
> >  }
> > @@ -1948,6 +1979,7 @@ ufshcd_send_uic_cmd(struct ufs_hba *hba, struct
> > uic_command *uic_cmd)
> > unsigned long flags;
> >
> > ufshcd_hold(hba, false);
> > +   ufshcd_pm_runtime_get(hba->dev);
> > mutex_lock(>uic_cmd_mutex);
> > ufshcd_add_delay_before_dme_cmd(hba);
> >
> > @@ -1959,6 +1991,7 @@ ufshcd_send_uic_cmd(struct ufs_hba *hba, struct
> > uic_command *uic_cmd)
> >
> > mutex_unlock(>uic_cmd_mutex);
> >
> > +   ufshcd_pm_runtime_put(hba->dev);
> > ufshcd_release(hba);
> > return ret;
> >  }
> > @@ -2345,6 +2378,7 @@ static int ufshcd_queuecommand(struct Scsi_Host
> > *host, struct scsi_cmnd *cmd)
> > clear_bit_unlock(tag, >lrb_in_use);
> > goto out;
> > }
> > +   ufshcd_pm_runtime_get(hba->dev);
> 
> 
> why do we need to hold PM reference count here? I thought as ufs host is
> 

[PATCH] scsi: lpfc: Use after free in lpfc_rq_buf_free()

2017-11-22 Thread Dan Carpenter
The error message dereferences "rqb_entry" so we need to print it first
and then free the buffer.

Fixes: 6c621a2229b0 ("scsi: lpfc: Separate NVMET RQ buffer posting from IO 
resources SGL/iocbq/context")
Signed-off-by: Dan Carpenter 

diff --git a/drivers/scsi/lpfc/lpfc_mem.c b/drivers/scsi/lpfc/lpfc_mem.c
index 56faeb049b4a..87c08ff37ddd 100644
--- a/drivers/scsi/lpfc/lpfc_mem.c
+++ b/drivers/scsi/lpfc/lpfc_mem.c
@@ -753,12 +753,12 @@ lpfc_rq_buf_free(struct lpfc_hba *phba, struct 
lpfc_dmabuf *mp)
drqe.address_hi = putPaddrHigh(rqb_entry->dbuf.phys);
rc = lpfc_sli4_rq_put(rqb_entry->hrq, rqb_entry->drq, , );
if (rc < 0) {
-   (rqbp->rqb_free_buffer)(phba, rqb_entry);
lpfc_printf_log(phba, KERN_ERR, LOG_INIT,
"6409 Cannot post to RQ %d: %x %x\n",
rqb_entry->hrq->queue_id,
rqb_entry->hrq->host_index,
rqb_entry->hrq->hba_index);
+   (rqbp->rqb_free_buffer)(phba, rqb_entry);
} else {
list_add_tail(_entry->hbuf.list, >rqb_buffer_list);
rqbp->buffer_count++;