date:20170404

Re: [PATCHv4 8/8] scsi: inline command aborts

2017-04-04 Thread kbuild test robot

Hi Hannes,

[auto build test WARNING on scsi/for-next]
[also build test WARNING on v4.11-rc5 next-20170404]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Hannes-Reinecke/scsi_error-count-medium-access-timeout-only-once-per-EH-run/20170405-060357
base:   https://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git for-next
reproduce: make htmldocs

All warnings (new ones prefixed by >>):

>> drivers/scsi/scsi_error.c:120: warning: No description found for parameter 
>> 'scmd'
>> drivers/scsi/scsi_error.c:120: warning: Excess function parameter 'work' 
>> description in 'scmd_eh_abort_handler'
   drivers/scsi/constants.c:1: warning: no structured comments found

vim +/scmd +120 drivers/scsi/scsi_error.c

bb3b621a Ren Mingxin 2013-11-11  104 * if eh_deadline has been set 
to 'off' during the
76ad3e59 Hannes Reinecke 2013-11-11  105 * time_before call.
76ad3e59 Hannes Reinecke 2013-11-11  106 */
76ad3e59 Hannes Reinecke 2013-11-11  107if (time_before(jiffies, 
shost->last_reset + shost->eh_deadline) &&
bb3b621a Ren Mingxin 2013-11-11  108shost->eh_deadline > -1)
b4562022 Hannes Reinecke 2013-10-23  109return 0;
b4562022 Hannes Reinecke 2013-10-23  110  
b4562022 Hannes Reinecke 2013-10-23  111return 1;
b4562022 Hannes Reinecke 2013-10-23  112  }
b4562022 Hannes Reinecke 2013-10-23  113  
^1da177e Linus Torvalds  2005-04-16  114  /**
e494f6a7 Hannes Reinecke 2013-11-11  115   * scmd_eh_abort_handler - Handle 
command aborts
e494f6a7 Hannes Reinecke 2013-11-11  116   * @work: command to be aborted.
e494f6a7 Hannes Reinecke 2013-11-11  117   */
07311267 Hannes Reinecke 2017-04-04  118  int
07311267 Hannes Reinecke 2017-04-04  119  scmd_eh_abort_handler(struct 
scsi_cmnd *scmd)
e494f6a7 Hannes Reinecke 2013-11-11 @120  {
e494f6a7 Hannes Reinecke 2013-11-11  121struct scsi_device *sdev = 
scmd->device;
e494f6a7 Hannes Reinecke 2013-11-11  122int rtn;
e494f6a7 Hannes Reinecke 2013-11-11  123  
e494f6a7 Hannes Reinecke 2013-11-11  124if 
(scsi_host_eh_past_deadline(sdev->host)) {
e494f6a7 Hannes Reinecke 2013-11-11  125
SCSI_LOG_ERROR_RECOVERY(3,
e494f6a7 Hannes Reinecke 2013-11-11  126
scmd_printk(KERN_INFO, scmd,
470613b4 Hannes Reinecke 2015-01-08  127"eh 
timeout, not aborting\n"));
07311267 Hannes Reinecke 2017-04-04  128return FAILED;

:: The code at line 120 was first introduced by commit
:: e494f6a728394ab0df194342549ee20e6f0752df [SCSI] improved eh timeout 
handler

:: TO: Hannes Reinecke <h...@suse.de>
:: CC: James Bottomley <jbottom...@parallels.com>

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip

Re: [PATCH] sas: remove sas_domain_release_transport

2017-04-04 Thread Martin K. Petersen

Johannes Thumshirn  writes:

Johannes,

> sas_domain_release_transport is unused since at least v3.13, remove
> it.

Applied to 4.12/scsi-queue.

-- 
Martin K. Petersen  Oracle Linux Engineering

Re: [PATCH 2/2] qla2xxx: Fix typo in driver

2017-04-04 Thread Martin K. Petersen

Himanshu Madhani  writes:

> From: Milan P Gandhi 
>
> Signed-off-by: Milan P Gandhi 
> Signed-off-by: Himanshu Madhani 

Applied to 4.12/scsi-queue.

-- 
Martin K. Petersen  Oracle Linux Engineering

Re: [PATCH 1/2] qla2xxx: Add fix to read correct register value for ISP82xx.

2017-04-04 Thread Martin K. Petersen

Himanshu Madhani  writes:

> Add fix to read correct register value for ISP82xx, during check for
> register disconnect.ISP82xx has different base register.

Applied to 4.11/scsi-fixes.

-- 
Martin K. Petersen  Oracle Linux Engineering

Re: problem with discard granularity in sd

2017-04-04 Thread Martin K. Petersen

David Buckley  writes:

David,

> They result in discard granularity being forced to logical block size
> if the disk reports LBPRZ is enabled (which the netapp luns do).

Block zeroing and unmapping are currently sharing some plumbing and that
has lead to some compromises. In this case a bias towards ensuring data
integrity for zeroing at the expense of not aligning unmap requests.

Christoph has worked on separating those two functions. His code is
currently under review.

> I'm not sure of the implications of either the netapp changes, though
> reporting 4k logical blocks seems potential as this is supported in
> newer OS at least.

Yes, but it may break legacy applications that assume a 512-byte logical
block size.

> The sd change potentially would at least partially undo the patches
> referenced above.  But it would seem that (assuming an aligned
> filesystem with 4k blocks and minimum_io_size=4096) there is no
> possibility of a partial block discard or advantage to sending the
> discard requests in 512 blocks?

The unmap granularity inside a device is often much, much bigger than
4K. So aligning to that probably won't make a difference. And it's
imperative to filesystems that zeroing works at logical block size
granularity.

The expected behavior for a device is that it unmaps whichever full
unmap granularity chunks are described by a received request. And then
explicitly zeroes any partial chunks at the head and tail. So I am
surprised you see no reclamation whatsoever.

With the impending zero/unmap separation things might fare better. But
I'd still like to understand the behavior you observe. Please provide
the output of:

sg_vpd -p lbpv /dev/sdN
sg_vpd -p bl /dev/sdN

for one of the LUNs and I'll take a look.

Thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering

Re: [PATCH] sd: close hole in > 2T device rejection when !CONFIG_LBDAF

2017-04-04 Thread Bart Van Assche

On Tue, 2017-04-04 at 19:35 -0400, Martin K. Petersen wrote:
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index fb9b4d29af0b..6084c415c070 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -2102,6 +2102,16 @@ static void read_capacity_error(struct scsi_disk 
> *sdkp, struct scsi_device *sdp,
>  
>  #define READ_CAPACITY_RETRIES_ON_RESET   10
>  
> +static bool sd_addressable_capacity(u64 lba, unsigned int sector_size)
> +{
> + u64 last_sector = lba + 1ULL << ilog2(sector_size) - 9;
> +
> + if (sizeof(sector_t) == 4 && last_sector > 0xULL)
> + return false;
> +
> + return true;
> +}

Hello Martin,

How about replacing 0xULL with U32_MAX, adding parentheses in the
last_sector computation to make clear that + and - have precedence over <<
and adding a comment above sd_addressable_capacity() that explains its
purpose and also that the shift operation must not be replaced with a call
to logical_to_sectors()? Otherwise this patch looks fine to me.

Thanks,

Bart.

Re: [PATCH] scsi: advansys: fix uninitialized data access

2017-04-04 Thread Martin K. Petersen

Arnd Bergmann  writes:

Arnd,

> drivers/scsi/advansys.c: In function 'AscMsgOutSDTR':
> drivers/scsi/advansys.c:3860:26: error: '*((void *)_buf+5)' may be used 
> uninitialized in this function [-Werror=maybe-uninitialized]
>  ((ushort)s_buffer[i + 1] << 8) | s_buffer[i]);
>   ^
> drivers/scsi/advansys.c:3860:26: error: '*((void *)_buf+7)' may be used 
> uninitialized in this function [-Werror=maybe-uninitialized]
> drivers/scsi/advansys.c:3860:26: error: '*((void *)_buf+5)' may be used 
> uninitialized in this function [-Werror=maybe-uninitialized]
> drivers/scsi/advansys.c:3860:26: error: '*((void *)_buf+7)' may be used 
> uninitialized in this function [-Werror=maybe-uninitialized]
>
> The code has existed in this exact form at least since v2.6.12, and the
> warning seems correct. This uses named initializers to ensure we initialize
> all members of the structure.
>   (uchar *)_buf,

Applied to 4.12/scsi-queue, thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering

Re: [PATCH] qedf: Fix crash due to unsolicited FIP VLAN response.

2017-04-04 Thread Martin K. Petersen

"Dupuis, Chad"  writes:

Chad,

> We need to initialize qedf->fipvlan_compl in __qedf_probe so that if
> we receive an unsolicited FIP VLAN response, the system doesn't crash
> due to trying to complete an uninitialized completion.
>
> Also add a check to see if there are any waiters on the completion so
> we don't inadvertantly kick start the discovery process due to the
> unsolicited frame.

Applied to 4.11/scsi-fixes.

-- 
Martin K. Petersen  Oracle Linux Engineering

Re: [PATCH] sd: close hole in > 2T device rejection when !CONFIG_LBDAF

2017-04-04 Thread Martin K. Petersen

Bart Van Assche  writes:

Hi Bart,

> Sorry but I still don't understand why the two checks are
> different. How about the (untested) patch below? The approach below
> avoids that the check is duplicated and - at least in my opinion -
> results in code that is easier to read.

Just tripped over this issue in connection with something else. However,
I had to make a few passes to convince myself that your proposed fix was
correct. How about something like the following?

Martin

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index fb9b4d29af0b..6084c415c070 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -2102,6 +2102,16 @@ static void read_capacity_error(struct scsi_disk *sdkp, 
struct scsi_device *sdp,
 
 #define READ_CAPACITY_RETRIES_ON_RESET 10
 
+static bool sd_addressable_capacity(u64 lba, unsigned int sector_size)
+{
+   u64 last_sector = lba + 1ULL << ilog2(sector_size) - 9;
+
+   if (sizeof(sector_t) == 4 && last_sector > 0xULL)
+   return false;
+
+   return true;
+}
+
 static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
unsigned char *buffer)
 {
@@ -2167,7 +2177,7 @@ static int read_capacity_16(struct scsi_disk *sdkp, 
struct scsi_device *sdp,
return -ENODEV;
}
 
-   if ((sizeof(sdkp->capacity) == 4) && (lba >= 0xULL)) {
+   if (!sd_addressable_capacity(lba, sector_size)) {
sd_printk(KERN_ERR, sdkp, "Too big for this kernel. Use a "
"kernel compiled with support for large block "
"devices.\n");
@@ -2256,7 +2266,7 @@ static int read_capacity_10(struct scsi_disk *sdkp, 
struct scsi_device *sdp,
return sector_size;
}
 
-   if ((sizeof(sdkp->capacity) == 4) && (lba == 0x)) {
+   if (!sd_addressable_capacity(lba, sector_size)) {
sd_printk(KERN_ERR, sdkp, "Too big for this kernel. Use a "
"kernel compiled with support for large block "
"devices.\n");

Re: [PATCH] scsi: ses: don't get power status of SES device slot on probe

2017-04-04 Thread Dan Williams

On Wed, Mar 29, 2017 at 6:02 PM, Mauricio Faria de Oliveira
 wrote:
> The commit 08024885a2a3 ("ses: Add power_status to SES device slot")
> introduced the 'power_status' attribute to enclosure components and
> the associated callbacks.
>
> There are 2 callbacks available to get the power status of a device:
> 1) ses_get_power_status() for 'struct enclosure_component_callbacks'
> 2) get_component_power_status() for the sysfs device attribute
> (these are available for kernel-space and user-space, respectively.)
>
> However, despite both methods being available to get power status
> on demand, that commit also introduced a call to get power status
> in ses_enclosure_data_process().
>
> This dramatically increased the total probe time for SCSI devices
> on larger configurations, because ses_enclosure_data_process() is
> called several times during the SCSI devices probe and loops over
> the component devices (but that is another problem, another patch).
>
> That results in a tremendous continuous hammering of SCSI Receive
> Diagnostics commands to the enclosure-services device, which does
> delay the total probe time for the SCSI devices __significantly__:
>
>   Originally, ~34 minutes on a system attached to ~170 disks:
>
> [ 9214.490703] mpt3sas version 13.100.00.00 loaded
> ...
> [11256.580231] scsi 17:0:177:0: qdepth(16), tagged(1), simple(0),
>ordered(0), scsi_level(6), cmd_que(1)
>
>   With this patch, it decreased to ~2.5 minutes -- a 13.6x faster
>
> [ 1002.992533] mpt3sas version 13.100.00.00 loaded
> ...
> [ 1151.978831] scsi 11:0:177:0: qdepth(16), tagged(1), simple(0),
>ordered(0), scsi_level(6), cmd_que(1)
>
> Another downside is that the SCSI device class lock is held during
> the executions of ses_enclosure_data_process() since it's a callee
> of ses_intf_add(), which is executed under that lock in device_add().
> This ends up blocking the parallel SCSI scan functionality because
> device_add() depends on that lock, and also the removal of devices
> via device_del() (e.g., during an enclosure reboot).
>
> Back to the commit discussion.. on the ses_get_power_status() call
> introduced in ses_enclosure_data_process(): impact of removing it.
>
> That may possibly be in place to initialize the power status value
> on device probe.  However, those 2 functions available to retrieve
> that value _do_ automatically refresh/update it.  So the potential
> benefit would be a direct access of the 'power_status' field which
> does not use the callbacks...
>
> But the only reader of 'struct enclosure_component::power_status'
> is the get_component_power_status() callback for sysfs attribute,
> and it _does_ check for and call the .get_power_status callback,
> (which indeed is defined and implemented by that commit), so the
> power status value is, again, automatically updated.
>
> So, the remaining potential for a direct/non-callback access to
> the power_status attribute would be out-of-tree modules -- well,
> for those, if they are for whatever reason interested in values
> that are set during device probe and not up-to-date by the time
> they need it.. well, that would be curious.
>
> So, for out-of-tree modules and people still interested in this
> behavior (in case I might have missed something about it, other
> than the total probe time impact): let's add a module parameter
> to keep that check turned on (disabled by default).
>
> Also, to handle that more properly, set the initial power state
> value to '-1' (i.e., uninitialized) instead of '1' (power 'on'),
> and check for it in that callback which may do an direct access
> to the field value _if_ a callback function is not defined.
>
> Signed-off-by: Mauricio Faria de Oliveira 
> Fixes: 08024885a2a3 ("ses: Add power_status to SES device slot")
> ---
>  drivers/misc/enclosure.c | 6 +-
>  drivers/scsi/ses.c   | 9 -
>  2 files changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/misc/enclosure.c b/drivers/misc/enclosure.c
> index 65fed7146e9b..d3a8a13c4247 100644
> --- a/drivers/misc/enclosure.c
> +++ b/drivers/misc/enclosure.c
> @@ -148,7 +148,7 @@ struct enclosure_device *
> for (i = 0; i < components; i++) {
> edev->component[i].number = -1;
> edev->component[i].slot = -1;
> -   edev->component[i].power_status = 1;
> +   edev->component[i].power_status = -1;
> }
>
> mutex_lock(_list_lock);
> @@ -594,6 +594,10 @@ static ssize_t get_component_power_status(struct device 
> *cdev,
>
> if (edev->cb->get_power_status)
> edev->cb->get_power_status(edev, ecomp);
> +
> +   if (ecomp->power_status == -1)
> +   return -EINVAL;
> +

Can we ever hit this if get_power_status is non-null?

> return snprintf(buf, 40, "%s\n", ecomp->power_status ? "on" : "off");
>  }
>
> diff --git

[lpfc 10/19] Remove hba lock from NVMET isssue WQE

2017-04-04 Thread Dick Kennedy

From: Dick Kennedy 

Unecessary lock is taken. ring lock should be sufficient

Signed-off-by: Dick Kennedy 
Signed-off-by: James Smart 
---
 drivers/scsi/lpfc/lpfc_nvmet.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_nvmet.c b/drivers/scsi/lpfc/lpfc_nvmet.c
index 86ab95c..551e81a 100644
--- a/drivers/scsi/lpfc/lpfc_nvmet.c
+++ b/drivers/scsi/lpfc/lpfc_nvmet.c
@@ -519,7 +519,6 @@ lpfc_nvmet_xmt_fcp_op(struct nvmet_fc_target_port *tgtport,
container_of(rsp, struct lpfc_nvmet_rcv_ctx, ctx.fcp_req);
struct lpfc_hba *phba = ctxp->phba;
struct lpfc_iocbq *nvmewqeq;
-   unsigned long iflags;
int rc, id;
 
 #ifdef CONFIG_SCSI_LPFC_DEBUG_FS
@@ -594,10 +593,7 @@ lpfc_nvmet_xmt_fcp_op(struct nvmet_fc_target_port *tgtport,
lpfc_nvmeio_data(phba, "NVMET FCP CMND: xri x%x op x%x len x%x\n",
 ctxp->oxid, rsp->op, rsp->rsplen);
 
-   /* For now we take hbalock */
-   spin_lock_irqsave(>hbalock, iflags);
rc = lpfc_sli4_issue_wqe(phba, LPFC_FCP_RING, nvmewqeq);
-   spin_unlock_irqrestore(>hbalock, iflags);
if (rc == WQE_SUCCESS) {
ctxp->flag |= LPFC_NVMET_IO_INP;
 #ifdef CONFIG_SCSI_LPFC_DEBUG_FS
-- 
2.1.0

[lpfc 14/19] Fix crash after issuing lip reset

2017-04-04 Thread Dick Kennedy

From: Dick Kennedy 

When RPI is not available, driver sends WQE with invalid RPI value and hence rej
ected by HBA.
lpfc :82:00.3: 1:3154 BLS ABORT RSP failed, data:  x3/xa0320008
and
lpfc :82:00.3: 1:(71):2753 PLOGI failure DID:FA Status:x3/xa0240008

In this case, driver accesses rpi_ids array out of bounds.

Fix:
Check return value of lpfc_sli4_alloc_rpi(). Do not allocate lpfc_nodelist entry
 if RPI is not available.

Notes:
When RPI is not available, we will get discovery timeouts and command drops for
some of the vports as seen below.

2017-03-22T11:26:50.632070+05:30 linux-dput kernel: [58101.904888] lpfc :0b:
00.2: 0:(106):0273 Unexpected discovery timeout, vport State x0
2017-03-22T11:26:50.632089+05:30 linux-dput kernel: [58101.904895] lpfc :0b:
00.2: 0:(106):0230 Unexpected timeout, hba link state x5

2017-03-22T11:27:25.024065+05:30 linux-dput kernel: [58136.281620] lpfc :0b:
00.2: 0:(21):0111 Dropping received ELS cmd Data: x0 xc90c55 x0

Signed-off-by: Dick Kennedy 
Signed-off-by: James Smart 
---
 drivers/scsi/lpfc/lpfc_bsg.c |  4 
 drivers/scsi/lpfc/lpfc_crtn.h|  2 +-
 drivers/scsi/lpfc/lpfc_els.c | 35 +-
 drivers/scsi/lpfc/lpfc_hbadisc.c | 47 ++--
 drivers/scsi/lpfc/lpfc_init.c| 20 -
 drivers/scsi/lpfc/lpfc_sli.c |  3 +--
 drivers/scsi/lpfc/lpfc_vport.c   |  3 +--
 7 files changed, 68 insertions(+), 46 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_bsg.c b/drivers/scsi/lpfc/lpfc_bsg.c
index 18157d2..a1686c2 100644
--- a/drivers/scsi/lpfc/lpfc_bsg.c
+++ b/drivers/scsi/lpfc/lpfc_bsg.c
@@ -2486,6 +2486,10 @@ static int lpfcdiag_loop_self_reg(struct lpfc_hba *phba, 
uint16_t *rpi)
mbox, *rpi);
else {
*rpi = lpfc_sli4_alloc_rpi(phba);
+   if (*rpi == LPFC_RPI_ALLOC_ERROR) {
+   mempool_free(mbox, phba->mbox_mem_pool);
+   return -EBUSY;
+   }
status = lpfc_reg_rpi(phba, phba->pport->vpi,
phba->pport->fc_myDID,
(uint8_t *)>pport->fc_sparam,
diff --git a/drivers/scsi/lpfc/lpfc_crtn.h b/drivers/scsi/lpfc/lpfc_crtn.h
index 54e6ac4..d859aff 100644
--- a/drivers/scsi/lpfc/lpfc_crtn.h
+++ b/drivers/scsi/lpfc/lpfc_crtn.h
@@ -99,7 +99,7 @@ void lpfc_issue_reg_vpi(struct lpfc_hba *, struct lpfc_vport 
*);
 
 int lpfc_check_sli_ndlp(struct lpfc_hba *, struct lpfc_sli_ring *,
struct lpfc_iocbq *, struct lpfc_nodelist *);
-void lpfc_nlp_init(struct lpfc_vport *, struct lpfc_nodelist *, uint32_t);
+struct lpfc_nodelist *lpfc_nlp_init(struct lpfc_vport *, uint32_t);
 struct lpfc_nodelist *lpfc_nlp_get(struct lpfc_nodelist *);
 int  lpfc_nlp_put(struct lpfc_nodelist *);
 int  lpfc_nlp_not_used(struct lpfc_nodelist *ndlp);
diff --git a/drivers/scsi/lpfc/lpfc_els.c b/drivers/scsi/lpfc/lpfc_els.c
index d6e0f58..ffdcac0 100644
--- a/drivers/scsi/lpfc/lpfc_els.c
+++ b/drivers/scsi/lpfc/lpfc_els.c
@@ -895,10 +895,9 @@ lpfc_cmpl_els_flogi_nport(struct lpfc_vport *vport, struct 
lpfc_nodelist *ndlp,
 * Cannot find existing Fabric ndlp, so allocate a
 * new one
 */
-   ndlp = mempool_alloc(phba->nlp_mem_pool, GFP_KERNEL);
+   ndlp = lpfc_nlp_init(vport, PT2PT_RemoteID);
if (!ndlp)
goto fail;
-   lpfc_nlp_init(vport, ndlp, PT2PT_RemoteID);
} else if (!NLP_CHK_NODE_ACT(ndlp)) {
ndlp = lpfc_enable_node(vport, ndlp,
NLP_STE_UNUSED_NODE);
@@ -1364,7 +1363,6 @@ lpfc_els_abort_flogi(struct lpfc_hba *phba)
 int
 lpfc_initial_flogi(struct lpfc_vport *vport)
 {
-   struct lpfc_hba *phba = vport->phba;
struct lpfc_nodelist *ndlp;
 
vport->port_state = LPFC_FLOGI;
@@ -1374,10 +1372,9 @@ lpfc_initial_flogi(struct lpfc_vport *vport)
ndlp = lpfc_findnode_did(vport, Fabric_DID);
if (!ndlp) {
/* Cannot find existing Fabric ndlp, so allocate a new one */
-   ndlp = mempool_alloc(phba->nlp_mem_pool, GFP_KERNEL);
+   ndlp = lpfc_nlp_init(vport, Fabric_DID);
if (!ndlp)
return 0;
-   lpfc_nlp_init(vport, ndlp, Fabric_DID);
/* Set the node type */
ndlp->nlp_type |= NLP_FABRIC;
/* Put ndlp onto node list */
@@ -1418,17 +1415,15 @@ lpfc_initial_flogi(struct lpfc_vport *vport)
 int
 lpfc_initial_fdisc(struct lpfc_vport *vport)
 {
-   struct lpfc_hba *phba = vport->phba;
struct lpfc_nodelist *ndlp;
 
/* First look for the Fabric ndlp */

[lpfc 07/19] Fix driver usage of 128B WQEs when WQ_CREATE is V1.

2017-04-04 Thread Dick Kennedy

From: Dick Kennedy 

Verify WQ_CREATE with 128B WQEs in V0 and V1.

Code review of another bug showed the driver passing
128B WQEs and 8 pages in WQ CREATE and V0.
Code inspection/instrumentation showed that the driver
uses V0 in WQ_CREATE and if the caller passes queue->entry_size
128B, the driver sets the hdr_version to V1 so all is good.
When I tested the V1 WQ_CREATE, the mailbox failed causing
the driver to unload.

Signed-off-by: Dick Kennedy 
Signed-off-by: James Smart 
---
 drivers/scsi/lpfc/lpfc_sli.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/scsi/lpfc/lpfc_sli.c b/drivers/scsi/lpfc/lpfc_sli.c
index d606944..1519fdf 100644
--- a/drivers/scsi/lpfc/lpfc_sli.c
+++ b/drivers/scsi/lpfc/lpfc_sli.c
@@ -14741,6 +14741,9 @@ lpfc_wq_create(struct lpfc_hba *phba, struct lpfc_queue 
*wq,
case LPFC_Q_CREATE_VERSION_1:
bf_set(lpfc_mbx_wq_create_wqe_count, _create->u.request_1,
   wq->entry_count);
+   bf_set(lpfc_mbox_hdr_version, >request,
+  LPFC_Q_CREATE_VERSION_1);
+
switch (wq->entry_size) {
default:
case 64:
-- 
2.1.0

[lpfc 05/19] Fix driver unload/reload operation.

2017-04-04 Thread Dick Kennedy

From: Dick Kennedy 

Two issues: (1) driver could not be unloaded and
reloaded without some Oops or Panic occurring. (2) The
driver was panicking because of a corruption in the Memory
Manager when the iocb list was getting allocated.

Root cause for the memory corruption was a double
free of the Work Queue ring pointer memory - Freed once
in the lpfc_sli4_queue_free when the CQ was destroyed
and again in lpfc_sli4_queue_free when the WQ was
destroyed.  While the linux memory manager protects
against NULL pointers, it can't protect against stale
pointer kfree calls.

There are other fixes in this patch found during testing.
While they are not a direct corruption cause, they should
be fixed to ensure consistent unload/reload.

Signed-off-by: Dick Kennedy 
Signed-off-by: James Smart 
---
 drivers/scsi/lpfc/lpfc_init.c | 46 +++
 drivers/scsi/lpfc/lpfc_sli.c  |  7 ++-
 2 files changed, 31 insertions(+), 22 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
index b56da01..cca7f81 100644
--- a/drivers/scsi/lpfc/lpfc_init.c
+++ b/drivers/scsi/lpfc/lpfc_init.c
@@ -5815,6 +5815,12 @@ lpfc_sli4_driver_resource_setup(struct lpfc_hba *phba)
INIT_LIST_HEAD(>sli4_hba.lpfc_vfi_blk_list);
INIT_LIST_HEAD(>lpfc_vpi_blk_list);
 
+   /* Initialize mboxq lists. If the early init routines fail
+* these lists need to be correctly initialized.
+*/
+   INIT_LIST_HEAD(>sli.mboxq);
+   INIT_LIST_HEAD(>sli.mboxq_cmpl);
+
/* initialize optic_state to 0xFF */
phba->sli4_hba.lnk_info.optic_state = 0xff;
 
@@ -5880,6 +5886,7 @@ lpfc_sli4_driver_resource_setup(struct lpfc_hba *phba)
"READ_NV, mbxStatus x%x\n",
bf_get(lpfc_mqe_command, >u.mqe),
bf_get(lpfc_mqe_status, >u.mqe));
+   mempool_free(mboxq, phba->mbox_mem_pool);
rc = -EIO;
goto out_free_bsmbx;
}
@@ -7805,7 +7812,7 @@ lpfc_alloc_fcp_wq_cq(struct lpfc_hba *phba, int wqidx)
 
/* Create Fast Path FCP WQs */
wqesize = (phba->fcp_embed_io) ?
-   LPFC_WQE128_SIZE : phba->sli4_hba.wq_esize;
+   LPFC_WQE128_SIZE : phba->sli4_hba.wq_esize;
qdesc = lpfc_sli4_queue_alloc(phba, wqesize, phba->sli4_hba.wq_ecount);
if (!qdesc) {
lpfc_printf_log(phba, KERN_ERR, LOG_INIT,
@@ -7836,7 +7843,7 @@ int
 lpfc_sli4_queue_create(struct lpfc_hba *phba)
 {
struct lpfc_queue *qdesc;
-   int idx, io_channel, max;
+   int idx, io_channel;
 
/*
 * Create HBA Record arrays.
@@ -7997,15 +8004,6 @@ lpfc_sli4_queue_create(struct lpfc_hba *phba)
if (lpfc_alloc_nvme_wq_cq(phba, idx))
goto out_error;
 
-   /* allocate MRQ CQs */
-   max = phba->cfg_nvme_io_channel;
-   if (max < phba->cfg_nvmet_mrq)
-   max = phba->cfg_nvmet_mrq;
-
-   for (idx = 0; idx < max; idx++)
-   if (lpfc_alloc_nvme_wq_cq(phba, idx))
-   goto out_error;
-
if (phba->nvmet_support) {
for (idx = 0; idx < phba->cfg_nvmet_mrq; idx++) {
qdesc = lpfc_sli4_queue_alloc(phba,
@@ -8227,11 +8225,11 @@ lpfc_sli4_queue_destroy(struct lpfc_hba *phba)
 
/* Release FCP cqs */
lpfc_sli4_release_queues(>sli4_hba.fcp_cq,
-   phba->cfg_fcp_io_channel);
+phba->cfg_fcp_io_channel);
 
/* Release FCP wqs */
lpfc_sli4_release_queues(>sli4_hba.fcp_wq,
-   phba->cfg_fcp_io_channel);
+phba->cfg_fcp_io_channel);
 
/* Release FCP CQ mapping array */
lpfc_sli4_release_queue_map(>sli4_hba.fcp_cq_map);
@@ -8577,15 +8575,15 @@ lpfc_sli4_queue_setup(struct lpfc_hba *phba)
lpfc_printf_log(phba, KERN_ERR, LOG_INIT,
"0528 %s not allocated\n",
phba->sli4_hba.mbx_cq ?
-   "Mailbox WQ" : "Mailbox CQ");
+   "Mailbox WQ" : "Mailbox CQ");
rc = -ENOMEM;
goto out_destroy;
}
 
rc = lpfc_create_wq_cq(phba, phba->sli4_hba.hba_eq[0],
-   phba->sli4_hba.mbx_cq,
-   phba->sli4_hba.mbx_wq,
-   NULL, 0, LPFC_MBOX);
+  phba->sli4_hba.mbx_cq,
+  phba->sli4_hba.mbx_wq,
+  NULL, 0, LPFC_MBOX);
if (rc) {

[lpfc 01/19] Fix advertised max_sgl_segment count for NVMET

2017-04-04 Thread Dick Kennedy

From: Dick Kennedy 

Signed-off-by: Dick Kennedy 
Signed-off-by: James Smart 
---
 drivers/scsi/lpfc/lpfc_nvme.h  | 4 +---
 drivers/scsi/lpfc/lpfc_nvmet.c | 2 +-
 drivers/scsi/lpfc/lpfc_nvmet.h | 4 +---
 3 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_nvme.h b/drivers/scsi/lpfc/lpfc_nvme.h
index 1347deb..6277796 100644
--- a/drivers/scsi/lpfc/lpfc_nvme.h
+++ b/drivers/scsi/lpfc/lpfc_nvme.h
@@ -21,9 +21,7 @@
  * included with this package. *
  /
 
-#define LPFC_NVME_MIN_SEGS 16
-#define LPFC_NVME_DEFAULT_SEGS 66  /* 256K IOs - 64 + 2 */
-#define LPFC_NVME_MAX_SEGS 510
+#define LPFC_NVME_DEFAULT_SEGS (64 + 1)/* 256K IOs */
 #define LPFC_NVMET_MIN_POSTBUF 16
 #define LPFC_NVMET_DEFAULT_POSTBUF 1024
 #define LPFC_NVMET_MAX_POSTBUF 4096
diff --git a/drivers/scsi/lpfc/lpfc_nvmet.c b/drivers/scsi/lpfc/lpfc_nvmet.c
index 7ca868f..86ab95c 100644
--- a/drivers/scsi/lpfc/lpfc_nvmet.c
+++ b/drivers/scsi/lpfc/lpfc_nvmet.c
@@ -667,7 +667,7 @@ lpfc_nvmet_create_targetport(struct lpfc_hba *phba)
pinfo.port_id = vport->fc_myDID;
 
lpfc_tgttemplate.max_hw_queues = phba->cfg_nvme_io_channel;
-   lpfc_tgttemplate.max_sgl_segments = phba->cfg_sg_seg_cnt;
+   lpfc_tgttemplate.max_sgl_segments = phba->cfg_sg_seg_cnt + 1;
lpfc_tgttemplate.target_features = NVMET_FCTGTFEAT_READDATA_RSP |
   NVMET_FCTGTFEAT_NEEDS_CMD_CPUSCHED;
 
diff --git a/drivers/scsi/lpfc/lpfc_nvmet.h b/drivers/scsi/lpfc/lpfc_nvmet.h
index ca96f05..0aa202c 100644
--- a/drivers/scsi/lpfc/lpfc_nvmet.h
+++ b/drivers/scsi/lpfc/lpfc_nvmet.h
@@ -21,9 +21,7 @@
  * included with this package. *
  /
 
-#define LPFC_NVMET_MIN_SEGS16
-#define LPFC_NVMET_DEFAULT_SEGS64  /* 256K IOs */
-#define LPFC_NVMET_MAX_SEGS510
+#define LPFC_NVMET_DEFAULT_SEGS(64 + 1)/* 256K IOs */
 #define LPFC_NVMET_SUCCESS_LEN 12
 
 /* Used for NVME Target */
-- 
2.1.0

[lpfc 17/19] Fix implicit logo and RSCN handling for NVMET

2017-04-04 Thread Dick Kennedy

From: Dick Kennedy 

Symptoms: NVMET didn't have any RSCN handling at all and
would not execute implicit LOGO when receiving a PLOGI
from an rport that NVMET had in state UNMAPPED.

Clean up the logic in lpfc_nlp_state_cleanup for
initiators (FCP and NVME). NVMET should not respond to
RSCN including allocating new ndlps so this code was
conditionalized when nvmet_support is true.  The check
for NLP_RCV_PLOGI in lpfc_setup_disc_node was moved
below the check for nvmet_support to allow the NVMET
to recover initiator nodes correctly.  The implicit
logo was introduced with lpfc_rcv_plogi when NVMET gets
a PLOGI on an ndlp in UNMAPPED state.  The RSCN handling
was modified to not respond to an RSCN in NVMET.  Instead
NVMET sends a GID_FT and determines if an NVMEP_INITIATOR
it has is UNMAPPED but no longer in the zone membership.

Signed-off-by: Dick Kennedy 
Signed-off-by: James Smart 
---
 drivers/scsi/lpfc/lpfc_ct.c| 68 +-
 drivers/scsi/lpfc/lpfc_disc.h  |  1 +
 drivers/scsi/lpfc/lpfc_els.c   | 23 ++---
 drivers/scsi/lpfc/lpfc_hbadisc.c   | 62 ++
 drivers/scsi/lpfc/lpfc_nportdisc.c |  8 +++--
 5 files changed, 110 insertions(+), 52 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_ct.c b/drivers/scsi/lpfc/lpfc_ct.c
index d3e9af9..7e6a38f 100644
--- a/drivers/scsi/lpfc/lpfc_ct.c
+++ b/drivers/scsi/lpfc/lpfc_ct.c
@@ -537,19 +537,53 @@ lpfc_prep_node_fc4type(struct lpfc_vport *vport, uint32_t 
Did, uint8_t fc4_type)
}
 }
 
+static void
+lpfc_ns_rsp_audit_did(struct lpfc_vport *vport, uint32_t Did, uint8_t fc4_type)
+{
+   struct lpfc_hba *phba = vport->phba;
+   struct lpfc_nodelist *ndlp = NULL;
+   struct Scsi_Host *shost = lpfc_shost_from_vport(vport);
+
+   /*
+* To conserve rpi's, filter out addresses for other
+* vports on the same physical HBAs.
+*/
+   if (Did != vport->fc_myDID &&
+   (!lpfc_find_vport_by_did(phba, Did) ||
+vport->cfg_peer_port_login)) {
+   if (!phba->nvmet_support) {
+   /* FCPI/NVMEI path. Process Did */
+   lpfc_prep_node_fc4type(vport, Did, fc4_type);
+   } else {
+   /* NVMET path.  NVMET only cares about NVMEI nodes. */
+   list_for_each_entry(ndlp, >fc_nodes, nlp_listp) {
+   if (ndlp->nlp_type != NLP_NVME_INITIATOR ||
+   ndlp->nlp_state != NLP_STE_UNMAPPED_NODE)
+   continue;
+   spin_lock_irq(shost->host_lock);
+   if (ndlp->nlp_DID == Did)
+   ndlp->nlp_flag &= ~NLP_NVMET_RECOV;
+   else
+   ndlp->nlp_flag |= NLP_NVMET_RECOV;
+   spin_unlock_irq(shost->host_lock);
+   }
+   }
+   }
+}
+
 static int
 lpfc_ns_rsp(struct lpfc_vport *vport, struct lpfc_dmabuf *mp, uint8_t fc4_type,
uint32_t Size)
 {
-   struct lpfc_hba  *phba = vport->phba;
struct lpfc_sli_ct_request *Response =
(struct lpfc_sli_ct_request *) mp->virt;
-   struct lpfc_nodelist *ndlp = NULL;
struct lpfc_dmabuf *mlast, *next_mp;
uint32_t *ctptr = (uint32_t *) & Response->un.gid.PortType;
uint32_t Did, CTentry;
int Cnt;
struct list_head head;
+   struct Scsi_Host *shost = lpfc_shost_from_vport(vport);
+   struct lpfc_nodelist *ndlp = NULL;
 
lpfc_set_disctmo(vport);
vport->num_disc_nodes = 0;
@@ -574,19 +608,7 @@ lpfc_ns_rsp(struct lpfc_vport *vport, struct lpfc_dmabuf 
*mp, uint8_t fc4_type,
/* Get next DID from NameServer List */
CTentry = *ctptr++;
Did = ((be32_to_cpu(CTentry)) & Mask_DID);
-
-   ndlp = NULL;
-
-   /*
-* Check for rscn processing or not
-* To conserve rpi's, filter out addresses for other
-* vports on the same physical HBAs.
-*/
-   if ((Did != vport->fc_myDID) &&
-   ((lpfc_find_vport_by_did(phba, Did) == NULL) ||
-vport->cfg_peer_port_login))
-   lpfc_prep_node_fc4type(vport, Did, fc4_type);
-
+   lpfc_ns_rsp_audit_did(vport, Did, fc4_type);
if (CTentry & (cpu_to_be32(SLI_CT_LAST_ENTRY)))
goto nsout1;
 
@@ -596,6 +618,22 @@ lpfc_ns_rsp(struct lpfc_vport *vport, struct lpfc_dmabuf 
*mp, uint8_t fc4_type,
 
}
 
+   /* All GID_FT entries

[lpfc 13/19] Fix driver load issues when MRQ=8

2017-04-04 Thread Dick Kennedy

From: Dick Kennedy 

The symptom is that the driver will fail to login to the fabric. The reason is 
bacause it is out of iocb resources.
This fix algins the number of MRQ resources with the total resources so that it 
can handle fabric events when needed.
Signed-off-by: Dick Kennedy 
Signed-off-by: James Smart 
---
 drivers/scsi/lpfc/lpfc_init.c | 2 +-
 drivers/scsi/lpfc/lpfc_nvme.c | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
index 88977b8..12c7e9a 100644
--- a/drivers/scsi/lpfc/lpfc_init.c
+++ b/drivers/scsi/lpfc/lpfc_init.c
@@ -10280,7 +10280,7 @@ lpfc_pci_probe_one_s3(struct pci_dev *pdev, const 
struct pci_device_id *pid)
struct lpfc_hba   *phba;
struct lpfc_vport *vport = NULL;
struct Scsi_Host  *shost = NULL;
-   int error;
+   int error, cnt;
uint32_t cfg_mode, intr_mode;
 
/* Allocate memory for HBA structure */
diff --git a/drivers/scsi/lpfc/lpfc_nvme.c b/drivers/scsi/lpfc/lpfc_nvme.c
index 9468679..5145ef2 100644
--- a/drivers/scsi/lpfc/lpfc_nvme.c
+++ b/drivers/scsi/lpfc/lpfc_nvme.c
@@ -1480,6 +1480,7 @@ lpfc_nvme_fcp_abort(struct nvme_fc_local_port 
*pnvme_lport,
 phba->hba_flag);
return;
}
+   nvmereq_wqe = _nbuf->cur_iocbq;
 
lpfc_nbuf = (struct lpfc_nvme_buf *)pnvme_fcreq->private;
if (!lpfc_nbuf) {
-- 
2.1.0

[lpfc 09/19] Fix Kconfig defines.

2017-04-04 Thread Dick Kennedy

From: Dick Kennedy 

One of the idefs in the previous kconfig patch was not corrected.
Also one alingment issue fixed.

Signed-off-by: Dick Kennedy 
Signed-off-by: James Smart 
---
 drivers/scsi/lpfc/lpfc_attr.c | 2 +-
 drivers/scsi/lpfc/lpfc_nvme.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_attr.c b/drivers/scsi/lpfc/lpfc_attr.c
index 9e66cb0..0e28c2e 100644
--- a/drivers/scsi/lpfc/lpfc_attr.c
+++ b/drivers/scsi/lpfc/lpfc_attr.c
@@ -3335,7 +3335,7 @@ LPFC_ATTR_R(enable_fc4_type, LPFC_ENABLE_FCP,
  * percentage will go to NVME.
  */
 LPFC_ATTR_R(xri_split, 50, 10, 90,
-"Division of XRI resources between SCSI and NVME");
+   "Division of XRI resources between SCSI and NVME");
 
 /*
 # lpfc_log_verbose: Only turn this flag on if you are willing to risk being
diff --git a/drivers/scsi/lpfc/lpfc_nvme.c b/drivers/scsi/lpfc/lpfc_nvme.c
index 9726c95..9d39093 100644
--- a/drivers/scsi/lpfc/lpfc_nvme.c
+++ b/drivers/scsi/lpfc/lpfc_nvme.c
@@ -2264,7 +2264,7 @@ lpfc_nvme_destroy_localport(struct lpfc_vport *vport)
 void
 lpfc_nvme_update_localport(struct lpfc_vport *vport)
 {
-#ifdef CONFIG_LPFC_NVME_INITIATOR
+#if (IS_ENABLED(CONFIG_NVME_FC))
struct nvme_fc_local_port *localport;
struct lpfc_nvme_lport *lport;
 
-- 
2.1.0

[lpfc 06/19] Fix discovery error handling in NVME initiator

2017-04-04 Thread Dick Kennedy

From: Dick Kennedy 

performance lab running tests on upstream 4.10-rc5
POC and observing error during the NVME connect process.  Here
are the errors:

Issue 1:
The initiator is claiming the nvme_fc_unregister_remoteport upcall is
not completing the unregister in the time allotted.
[ 2186.151317] lpfc :07:00.0: 0:(0):6169 Unreg nvme wait failed 0

Issue 2:
In this case, the NVME initiator is sending an LS REQ command on an NDLP
that is not MAPPED.  The FW rejects it.
[ 2186.165689] lpfc :07:00.0: 0:(0):6051 ENTER.  lport 8817c75f7f00, 
rport 8817d2ab3bc0 lsreq8817cdad1900 rqstlen:48 rsplen:24 102234921368x 
102234921416
[ 2186.165691] lpfc :07:00.0: 0:(0):6050 Issue GEN REQ WQE to NPORT x560100 
Data: x7a5 x20 wq:8817c7f5ce00 lsreq:8817cdad1900 bmp:8817c7492400 
xmit:48 1st:48
[ 2186.165700] lpfc :07:00.0: 0:(0):6047 nvme cmpl Enter Data 
8817cdad1900 DID 560100 Xri: 3d status 3 cmd:8817c7f5ce00 
lsreg:8817cdad1900 bmp:8817c7492400 ndlp:8817c8c4b600

Issue 3:
There is a panic in the lpfc_nvme_io_cmd_wqe_cmpl routine whenever the NVME IO 
logging is enabled.
This is because the initiator is touching the NDLP post unregister.

Cause: Summarized:
Issue 1:  The wait_for_completion_timeout returns 0 when the wait has
been outstanding for the jiffies passed by the caller.  In this error
message, the nvme initiator passed value 5 - meaning 5 jiffies -
and this is just wrong.

Issue 2: The lpfc_nvme_ls_req routine checks for a NULL ndlp pointer
but does not check the NDLP state.  This allows the routine
to send an LS IO when the ndlp is disconnected.

Issue 3: The driver checks for NULL pointers but the log message
in the driver references the null ndlp anyway.

Fix: Summarized:
Issue 1:  Calculate 5 seconds in Jiffies and pass that value
from the current jiffies.

Issue 2: Check the ndlp for NULL, actual node, Target and MAPPED
or Initiator and UNMAPPED. This avoids Fabric nodes getting
the Create Association or Create Connection commands.  Initiators
are free to Reject either Create.

Signed-off-by: Dick Kennedy 
Signed-off-by: James Smart 
---
 drivers/scsi/lpfc/lpfc_nvme.c | 37 +++--
 1 file changed, 27 insertions(+), 10 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_nvme.c b/drivers/scsi/lpfc/lpfc_nvme.c
index 0024de1..6465aa6 100644
--- a/drivers/scsi/lpfc/lpfc_nvme.c
+++ b/drivers/scsi/lpfc/lpfc_nvme.c
@@ -417,11 +417,26 @@ lpfc_nvme_ls_req(struct nvme_fc_local_port *pnvme_lport,
vport = lport->vport;
 
ndlp = lpfc_findnode_did(vport, pnvme_rport->port_id);
-   if (!ndlp) {
-   lpfc_printf_vlog(vport, KERN_ERR, LOG_NVME_DISC,
-"6043 Could not find node for DID %x\n",
+   if (!ndlp || !NLP_CHK_NODE_ACT(ndlp)) {
+   lpfc_printf_vlog(vport, KERN_ERR, LOG_NODE | LOG_NVME_IOERR,
+"6051 DID x%06x not an active rport.\n",
 pnvme_rport->port_id);
-   return 1;
+   return -ENODEV;
+   }
+
+   /* The remote node has to be a mapped nvme target or an
+* unmapped nvme initiator or it's an error.
+*/
+   if (((ndlp->nlp_type & NLP_NVME_TARGET) &&
+(ndlp->nlp_state != NLP_STE_MAPPED_NODE)) ||
+   ((ndlp->nlp_type & NLP_NVME_INITIATOR) &&
+(ndlp->nlp_state != NLP_STE_UNMAPPED_NODE))) {
+   lpfc_printf_vlog(vport, KERN_ERR, LOG_NODE | LOG_NVME_IOERR,
+"6088 DID x%06x not ready for "
+"IO. State x%x, Type x%x\n",
+pnvme_rport->port_id,
+ndlp->nlp_state, ndlp->nlp_type);
+   return -ENODEV;
}
bmp = kmalloc(sizeof(struct lpfc_dmabuf), GFP_KERNEL);
if (!bmp) {
@@ -456,7 +471,7 @@ lpfc_nvme_ls_req(struct nvme_fc_local_port *pnvme_lport,
 
/* Expand print to include key fields. */
lpfc_printf_vlog(vport, KERN_INFO, LOG_NVME_DISC,
-"6051 ENTER.  lport %p, rport %p lsreq%p rqstlen:%d "
+"6149 ENTER.  lport %p, rport %p lsreq%p rqstlen:%d "
 "rsplen:%d %pad %pad\n",
 pnvme_lport, pnvme_rport,
 pnvme_lsreq, pnvme_lsreq->rqstlen,
@@ -772,9 +787,9 @@ lpfc_nvme_io_cmd_wqe_cmpl(struct lpfc_hba *phba, struct 
lpfc_iocbq *pwqeIn,
ndlp = rport->ndlp;
if (!ndlp || !NLP_CHK_NODE_ACT(ndlp)) {
lpfc_printf_vlog(vport, KERN_ERR, LOG_NODE | LOG_NVME_IOERR,
-"6061 rport %p, ndlp %p, DID x%06x ndlp "
+"6061 rport %p, DID x%06x node "
 "not ready.\n",
-rport, ndlp,

[lpfc 18/19] Update ABORT processing for NVMET

2017-04-04 Thread Dick Kennedy

From: Dick Kennedy 

Missing code path, new NVME abort API

The first lpfc driver with nvme had this routine stubbed.

Signed-off-by: Dick Kennedy 
Signed-off-by: James Smart 
---
 drivers/scsi/lpfc/lpfc_crtn.h|   7 +
 drivers/scsi/lpfc/lpfc_debugfs.c |  55 +--
 drivers/scsi/lpfc/lpfc_hw4.h |   3 +
 drivers/scsi/lpfc/lpfc_init.c|  52 +++---
 drivers/scsi/lpfc/lpfc_mbox.c|   7 +-
 drivers/scsi/lpfc/lpfc_nvme.c|  45 +++--
 drivers/scsi/lpfc/lpfc_nvmet.c   | 348 +++
 drivers/scsi/lpfc/lpfc_nvmet.h   |  10 +-
 drivers/scsi/lpfc/lpfc_sli.c |   7 +-
 drivers/scsi/lpfc/lpfc_sli4.h|   2 +-
 10 files changed, 411 insertions(+), 125 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_crtn.h b/drivers/scsi/lpfc/lpfc_crtn.h
index d859aff..24da922 100644
--- a/drivers/scsi/lpfc/lpfc_crtn.h
+++ b/drivers/scsi/lpfc/lpfc_crtn.h
@@ -24,6 +24,7 @@ typedef int (*node_filter)(struct lpfc_nodelist *, void *);
 
 struct fc_rport;
 struct fc_frame_header;
+struct lpfc_nvmet_rcv_ctx;
 void lpfc_down_link(struct lpfc_hba *, LPFC_MBOXQ_t *);
 void lpfc_sli_read_link_ste(struct lpfc_hba *);
 void lpfc_dump_mem(struct lpfc_hba *, LPFC_MBOXQ_t *, uint16_t, uint16_t);
@@ -245,6 +246,10 @@ struct hbq_dmabuf *lpfc_sli4_rb_alloc(struct lpfc_hba *);
 void lpfc_sli4_rb_free(struct lpfc_hba *, struct hbq_dmabuf *);
 struct rqb_dmabuf *lpfc_sli4_nvmet_alloc(struct lpfc_hba *phba);
 void lpfc_sli4_nvmet_free(struct lpfc_hba *phba, struct rqb_dmabuf *dmab);
+void lpfc_nvmet_rq_post(struct lpfc_hba *phba, struct lpfc_nvmet_rcv_ctx *ctxp,
+   struct lpfc_dmabuf *mp);
+int lpfc_nvmet_rcv_unsol_abort(struct lpfc_vport *vport,
+  struct fc_frame_header *fc_hdr);
 void lpfc_sli4_build_dflt_fcf_record(struct lpfc_hba *, struct fcf_record *,
uint16_t);
 int lpfc_sli4_rq_put(struct lpfc_queue *hq, struct lpfc_queue *dq,
@@ -302,6 +307,8 @@ int lpfc_sli_check_eratt(struct lpfc_hba *);
 void lpfc_sli_handle_slow_ring_event(struct lpfc_hba *,
struct lpfc_sli_ring *, uint32_t);
 void lpfc_sli4_handle_received_buffer(struct lpfc_hba *, struct hbq_dmabuf *);
+void lpfc_sli4_seq_abort_rsp(struct lpfc_vport *vport,
+struct fc_frame_header *fc_hdr, bool aborted);
 void lpfc_sli_def_mbox_cmpl(struct lpfc_hba *, LPFC_MBOXQ_t *);
 void lpfc_sli4_unreg_rpi_cmpl_clr(struct lpfc_hba *, LPFC_MBOXQ_t *);
 int lpfc_sli_issue_iocb(struct lpfc_hba *, uint32_t,
diff --git a/drivers/scsi/lpfc/lpfc_debugfs.c b/drivers/scsi/lpfc/lpfc_debugfs.c
index 55a8d8f..67efa68 100644
--- a/drivers/scsi/lpfc/lpfc_debugfs.c
+++ b/drivers/scsi/lpfc/lpfc_debugfs.c
@@ -745,73 +745,104 @@ lpfc_debugfs_nvmestat_data(struct lpfc_vport *vport, 
char *buf, int size)
 {
struct lpfc_hba   *phba = vport->phba;
struct lpfc_nvmet_tgtport *tgtp;
+   struct lpfc_nvmet_rcv_ctx *ctxp, *next_ctxp;
int len = 0;
+   int cnt;
 
if (phba->nvmet_support) {
if (!phba->targetport)
return len;
tgtp = (struct lpfc_nvmet_tgtport *)phba->targetport->private;
-   len += snprintf(buf+len, size-len,
+   len += snprintf(buf + len, size - len,
"\nNVME Targetport Statistics\n");
 
-   len += snprintf(buf+len, size-len,
+   len += snprintf(buf + len, size - len,
"LS: Rcv %08x Drop %08x Abort %08x\n",
atomic_read(>rcv_ls_req_in),
atomic_read(>rcv_ls_req_drop),
atomic_read(>xmt_ls_abort));
if (atomic_read(>rcv_ls_req_in) !=
atomic_read(>rcv_ls_req_out)) {
-   len += snprintf(buf+len, size-len,
+   len += snprintf(buf + len, size - len,
"Rcv LS: in %08x != out %08x\n",
atomic_read(>rcv_ls_req_in),
atomic_read(>rcv_ls_req_out));
}
 
-   len += snprintf(buf+len, size-len,
+   len += snprintf(buf + len, size - len,
"LS: Xmt %08x Drop %08x Cmpl %08x Err %08x\n",
atomic_read(>xmt_ls_rsp),
atomic_read(>xmt_ls_drop),
atomic_read(>xmt_ls_rsp_cmpl),
atomic_read(>xmt_ls_rsp_error));
 
-   len += snprintf(buf+len, size-len,
+   len += snprintf(buf + len, size - len,
"FCP: Rcv %08x Drop %08x\n",
atomic_read(>rcv_fcp_cmd_in),

[lpfc 02/19] Fixes after reviewing last set of patches. Minor fixes to previous patches

2017-04-04 Thread Dick Kennedy

From: Dick Kennedy 

Signed-off-by: Dick Kennedy 
Signed-off-by: James Smart 
---
 drivers/scsi/lpfc/lpfc_attr.c|  6 +-
 drivers/scsi/lpfc/lpfc_debugfs.c | 14 --
 drivers/scsi/lpfc/lpfc_init.c|  6 ++
 drivers/scsi/lpfc/lpfc_nvme.h|  3 ---
 4 files changed, 15 insertions(+), 14 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_attr.c b/drivers/scsi/lpfc/lpfc_attr.c
index 22819af..9e66cb0 100644
--- a/drivers/scsi/lpfc/lpfc_attr.c
+++ b/drivers/scsi/lpfc/lpfc_attr.c
@@ -4458,7 +4458,11 @@ lpfc_fcp_imax_store(struct device *dev, struct 
device_attribute *attr,
 
phba->cfg_fcp_imax = (uint32_t)val;
 
-   for (i = 0; i < phba->io_channel_irqs; i += LPFC_MAX_EQ_DELAY_EQID_CNT)
+   /*
+* Configure EQ delay multipier for interrupt coalescing using
+* MODIFY_EQ_DELAY for all EQs created, LPFC_MAX_EQ_DELAY at a time.
+*/
+   for (i = 0; i < phba->io_channel_irqs; i += LPFC_MAX_EQ_DELAY)
lpfc_modify_hba_eq_delay(phba, i);
 
return strlen(buf);
diff --git a/drivers/scsi/lpfc/lpfc_debugfs.c b/drivers/scsi/lpfc/lpfc_debugfs.c
index 913eed8..55a8d8f 100644
--- a/drivers/scsi/lpfc/lpfc_debugfs.c
+++ b/drivers/scsi/lpfc/lpfc_debugfs.c
@@ -3128,8 +3128,6 @@ __lpfc_idiag_print_rqpair(struct lpfc_queue *qp, struct 
lpfc_queue *datqp,
datqp->queue_id, datqp->entry_count,
datqp->entry_size, datqp->host_index,
datqp->hba_index);
-   len +=  snprintf(pbuffer + len, LPFC_QUE_INFO_GET_BUF_SIZE - len, "\n");
-
return len;
 }
 
@@ -5700,10 +5698,8 @@ lpfc_debugfs_terminate(struct lpfc_vport *vport)
 #ifdef CONFIG_SCSI_LPFC_DEBUG_FS
struct lpfc_hba   *phba = vport->phba;
 
-   if (vport->disc_trc) {
-   kfree(vport->disc_trc);
-   vport->disc_trc = NULL;
-   }
+   kfree(vport->disc_trc);
+   vport->disc_trc = NULL;
 
debugfs_remove(vport->debug_disc_trc); /* discovery_trace */
vport->debug_disc_trc = NULL;
@@ -5770,10 +5766,8 @@ lpfc_debugfs_terminate(struct lpfc_vport *vport)
debugfs_remove(phba->debug_readRef); /* readRef */
phba->debug_readRef = NULL;
 
-   if (phba->slow_ring_trc) {
-   kfree(phba->slow_ring_trc);
-   phba->slow_ring_trc = NULL;
-   }
+   kfree(phba->slow_ring_trc);
+   phba->slow_ring_trc = NULL;
 
/* slow_ring_trace */
debugfs_remove(phba->debug_slow_ring_trc);
diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
index 6cc561b..b56da01 100644
--- a/drivers/scsi/lpfc/lpfc_init.c
+++ b/drivers/scsi/lpfc/lpfc_init.c
@@ -3508,6 +3508,12 @@ lpfc_sli4_scsi_sgl_update(struct lpfc_hba *phba)
spin_unlock(>scsi_buf_list_put_lock);
spin_unlock_irq(>scsi_buf_list_get_lock);
 
+   lpfc_printf_log(phba, KERN_INFO, LOG_SLI,
+   "6060 Current allocated SCSI xri-sgl count:%d, "
+   "maximum  SCSI xri count:%d (split:%d)\n",
+   phba->sli4_hba.scsi_xri_cnt,
+   phba->sli4_hba.scsi_xri_max, phba->cfg_xri_split);
+
if (phba->sli4_hba.scsi_xri_cnt > phba->sli4_hba.scsi_xri_max) {
/* max scsi xri shrinked below the allocated scsi buffers */
scsi_xri_cnt = phba->sli4_hba.scsi_xri_cnt -
diff --git a/drivers/scsi/lpfc/lpfc_nvme.h b/drivers/scsi/lpfc/lpfc_nvme.h
index 6277796..2582f46 100644
--- a/drivers/scsi/lpfc/lpfc_nvme.h
+++ b/drivers/scsi/lpfc/lpfc_nvme.h
@@ -22,9 +22,6 @@
  /
 
 #define LPFC_NVME_DEFAULT_SEGS (64 + 1)/* 256K IOs */
-#define LPFC_NVMET_MIN_POSTBUF 16
-#define LPFC_NVMET_DEFAULT_POSTBUF 1024
-#define LPFC_NVMET_MAX_POSTBUF 4096
 #define LPFC_NVME_WQSIZE   256
 
 #define LPFC_NVME_ERSP_LEN 0x20
-- 
2.1.0

[lpfc 03/19] Fix spelling in comments.

2017-04-04 Thread Dick Kennedy

From: Dick Kennedy 

Signed-off-by: Dick Kennedy 
Signed-off-by: James Smart 
---
 drivers/scsi/lpfc/lpfc_sli.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/lpfc/lpfc_sli.c b/drivers/scsi/lpfc/lpfc_sli.c
index 1c9fa45..7087c55 100644
--- a/drivers/scsi/lpfc/lpfc_sli.c
+++ b/drivers/scsi/lpfc/lpfc_sli.c
@@ -6338,7 +6338,7 @@ lpfc_sli4_get_allocated_extnts(struct lpfc_hba *phba, 
uint16_t type,
 }
 
 /**
- * lpfc_sli4_repost_sgl_list - Repsot the buffers sgl pages as block
+ * lpfc_sli4_repost_sgl_list - Repost the buffers sgl pages as block
  * @phba: pointer to lpfc hba data structure.
  * @pring: Pointer to driver SLI ring object.
  * @sgl_list: linked link of sgl buffers to post
-- 
2.1.0

Re: [RFC 4/8] p2pmem: Add debugfs "stats" file

2017-04-04 Thread Logan Gunthorpe



On 04/04/17 04:46 AM, Sagi Grimberg wrote:
> 
>> +p2pmem_debugfs_root = debugfs_create_dir("p2pmem", NULL);
>> +if (!p2pmem_debugfs_root)
>> +pr_info("could not create debugfs entry, continuing\n");
>> +
> 
> Why continue? I think it'd be better to just fail it.

Yup, agreed. This should probably also be PTR_ERR as well.

> Besides, this can be safely squashed into patch 1.

Sure, the only real reason I kept it separate was it was authored by
Steve Wise.

Logan

[lpfc 15/19] Fix max_sgl_segments settings for NVME / NVMET

2017-04-04 Thread Dick Kennedy

From: Dick Kennedy 

Cannot set these to a large number

The existing module parameter lpfc_sg_seg_cnt is used for both
SCSI and NVME.

Limit the module parameter lpfc_sg_seg_cnt to 128 with the
default being 64 for both NVME and NVMET, assuming NVME is enabled in the
driver for that port. The driver will set max_sgl_segments in the NVME/NVMET
template to lpfc_sg_seg_cnt + 1.

Signed-off-by: Dick Kennedy 
Signed-off-by: James Smart 
---
 drivers/scsi/lpfc/lpfc.h   |  3 ++-
 drivers/scsi/lpfc/lpfc_nvme.c  | 18 ++
 drivers/scsi/lpfc/lpfc_nvmet.c | 19 +++
 3 files changed, 31 insertions(+), 9 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc.h b/drivers/scsi/lpfc/lpfc.h
index 257bbdd..0fc1450 100644
--- a/drivers/scsi/lpfc/lpfc.h
+++ b/drivers/scsi/lpfc/lpfc.h
@@ -56,7 +56,7 @@ struct lpfc_sli2_slim;
 #define LPFC_MAX_SG_SEG_CNT4096/* sg element count per scsi cmnd */
 #define LPFC_MAX_SGL_SEG_CNT   512 /* SGL element count per scsi cmnd */
 #define LPFC_MAX_BPL_SEG_CNT   4096/* BPL element count per scsi cmnd */
-#define LPFC_MIN_NVME_SEG_CNT  254
+#define LPFC_MAX_NVME_SEG_CNT  128 /* max SGL element cnt per NVME cmnd */
 
 #define LPFC_MAX_SGE_SIZE   0x8000 /* Maximum data allowed in a SGE */
 #define LPFC_IOCB_LIST_CNT 2250/* list of IOCBs for fast-path usage. */
@@ -781,6 +781,7 @@ struct lpfc_hba {
uint32_t cfg_nvmet_fb_size;
uint32_t cfg_total_seg_cnt;
uint32_t cfg_sg_seg_cnt;
+   uint32_t cfg_nvme_seg_cnt;
uint32_t cfg_sg_dma_buf_size;
uint64_t cfg_soft_wwnn;
uint64_t cfg_soft_wwpn;
diff --git a/drivers/scsi/lpfc/lpfc_nvme.c b/drivers/scsi/lpfc/lpfc_nvme.c
index 5145ef2..01cefaf 100644
--- a/drivers/scsi/lpfc/lpfc_nvme.c
+++ b/drivers/scsi/lpfc/lpfc_nvme.c
@@ -1114,12 +1114,12 @@ lpfc_nvme_prep_io_dma(struct lpfc_vport *vport,
 
first_data_sgl = sgl;
lpfc_ncmd->seg_cnt = nCmd->sg_cnt;
-   if (lpfc_ncmd->seg_cnt > phba->cfg_sg_seg_cnt) {
+   if (lpfc_ncmd->seg_cnt > phba->cfg_nvme_seg_cnt) {
lpfc_printf_log(phba, KERN_ERR, LOG_NVME_IOERR,
"6058 Too many sg segments from "
"NVME Transport.  Max %d, "
"nvmeIO sg_cnt %d\n",
-   phba->cfg_sg_seg_cnt,
+   phba->cfg_nvme_seg_cnt,
lpfc_ncmd->seg_cnt);
lpfc_ncmd->seg_cnt = 0;
return 1;
@@ -2159,8 +2159,18 @@ lpfc_nvme_create_localport(struct lpfc_vport *vport)
nfcp_info.node_name = wwn_to_u64(vport->fc_nodename.u.wwn);
nfcp_info.port_name = wwn_to_u64(vport->fc_portname.u.wwn);
 
-   /* For now need + 1 to get around NVME transport logic */
-   lpfc_nvme_template.max_sgl_segments = phba->cfg_sg_seg_cnt + 1;
+   /* Limit to LPFC_MAX_NVME_SEG_CNT.
+* For now need + 1 to get around NVME transport logic.
+*/
+   if (phba->cfg_sg_seg_cnt > LPFC_MAX_NVME_SEG_CNT) {
+   lpfc_printf_vlog(vport, KERN_INFO, LOG_NVME | LOG_INIT,
+"6300 Reducing sg segment cnt to %d\n",
+LPFC_MAX_NVME_SEG_CNT);
+   phba->cfg_nvme_seg_cnt = LPFC_MAX_NVME_SEG_CNT;
+   } else {
+   phba->cfg_nvme_seg_cnt = phba->cfg_sg_seg_cnt;
+   }
+   lpfc_nvme_template.max_sgl_segments = phba->cfg_nvme_seg_cnt + 1;
lpfc_nvme_template.max_hw_queues = phba->cfg_nvme_io_channel;
 
/* localport is allocated from the stack, but the registration
diff --git a/drivers/scsi/lpfc/lpfc_nvmet.c b/drivers/scsi/lpfc/lpfc_nvmet.c
index 3a0ff50..ea7a5b2 100644
--- a/drivers/scsi/lpfc/lpfc_nvmet.c
+++ b/drivers/scsi/lpfc/lpfc_nvmet.c
@@ -705,8 +705,19 @@ lpfc_nvmet_create_targetport(struct lpfc_hba *phba)
pinfo.port_name = wwn_to_u64(vport->fc_portname.u.wwn);
pinfo.port_id = vport->fc_myDID;
 
+   /* Limit to LPFC_MAX_NVME_SEG_CNT.
+* For now need + 1 to get around NVME transport logic.
+*/
+   if (phba->cfg_sg_seg_cnt > LPFC_MAX_NVME_SEG_CNT) {
+   lpfc_printf_log(phba, KERN_INFO, LOG_NVME | LOG_INIT,
+   "6400 Reducing sg segment cnt to %d\n",
+   LPFC_MAX_NVME_SEG_CNT);
+   phba->cfg_nvme_seg_cnt = LPFC_MAX_NVME_SEG_CNT;
+   } else {
+   phba->cfg_nvme_seg_cnt = phba->cfg_sg_seg_cnt;
+   }
+   lpfc_tgttemplate.max_sgl_segments = phba->cfg_nvme_seg_cnt + 1;
lpfc_tgttemplate.max_hw_queues = phba->cfg_nvme_io_channel;
-   lpfc_tgttemplate.max_sgl_segments = phba->cfg_sg_seg_cnt + 1;
lpfc_tgttemplate.target_features =

[lpfc 16/19] Add Fabric assigned WWN support.

2017-04-04 Thread Dick Kennedy

From: Dick Kennedy 

Firmware sends first FLOGI to fabric with vendor version changes.
On link up driver gets updated service parameter with FAWWN assigned port name.
Driver sends 2nd FLOGI with updated fawwpn and modifies the vport->fc_portname i
n driver.
Soft wwpn will not be allowed when fawwpn is enabled.

Signed-off-by: Dick Kennedy 
Signed-off-by: James Smart 
---
 drivers/scsi/lpfc/lpfc.h |  2 ++
 drivers/scsi/lpfc/lpfc_attr.c|  8 
 drivers/scsi/lpfc/lpfc_els.c |  8 +---
 drivers/scsi/lpfc/lpfc_hbadisc.c | 22 +-
 drivers/scsi/lpfc/lpfc_hw.h  |  3 +++
 drivers/scsi/lpfc/lpfc_hw4.h |  1 +
 drivers/scsi/lpfc/lpfc_init.c| 31 ---
 7 files changed, 64 insertions(+), 11 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc.h b/drivers/scsi/lpfc/lpfc.h
index 0fc1450..6d7840b 100644
--- a/drivers/scsi/lpfc/lpfc.h
+++ b/drivers/scsi/lpfc/lpfc.h
@@ -474,6 +474,8 @@ struct lpfc_vport {
unsigned long rcv_buffer_time_stamp;
uint32_t vport_flag;
 #define STATIC_VPORT   1
+#define FAWWPN_SET 2
+#define FAWWPN_PARAM_CHG   4
 
uint16_t fdmi_num_disc;
uint32_t fdmi_hba_mask;
diff --git a/drivers/scsi/lpfc/lpfc_attr.c b/drivers/scsi/lpfc/lpfc_attr.c
index 0e28c2e..69a0039 100644
--- a/drivers/scsi/lpfc/lpfc_attr.c
+++ b/drivers/scsi/lpfc/lpfc_attr.c
@@ -2292,6 +2292,8 @@ lpfc_soft_wwn_enable_store(struct device *dev, struct 
device_attribute *attr,
struct lpfc_vport *vport = (struct lpfc_vport *) shost->hostdata;
struct lpfc_hba   *phba = vport->phba;
unsigned int cnt = count;
+   uint8_t vvvl = vport->fc_sparam.cmn.valid_vendor_ver_level;
+   uint32_t *fawwpn_key = (uint32_t 
*)>fc_sparam.un.vendorVersion[0];
 
/*
 * We're doing a simple sanity check for soft_wwpn setting.
@@ -2305,6 +2307,12 @@ lpfc_soft_wwn_enable_store(struct device *dev, struct 
device_attribute *attr,
 * here. The intent is to protect against the random user or
 * application that is just writing attributes.
 */
+   if (vvvl == 1 && (cpu_to_be32(*fawwpn_key)) == FAPWWN_KEY_VENDOR) {
+   lpfc_printf_log(phba, KERN_ERR, LOG_INIT,
+"0051 "LPFC_DRIVER_NAME" soft wwpn can not"
+" be enabled: fawwpn is enabled\n");
+   return -EPERM;
+   }
 
/* count may include a LF at end of string */
if (buf[cnt-1] == '\n')
diff --git a/drivers/scsi/lpfc/lpfc_els.c b/drivers/scsi/lpfc/lpfc_els.c
index ffdcac0..35fc260 100644
--- a/drivers/scsi/lpfc/lpfc_els.c
+++ b/drivers/scsi/lpfc/lpfc_els.c
@@ -603,9 +603,11 @@ lpfc_check_clean_addr_bit(struct lpfc_vport *vport,
memcmp(>fabric_portname, >portName,
sizeof(struct lpfc_name)) ||
memcmp(>fabric_nodename, >nodeName,
-   sizeof(struct lpfc_name)))
+   sizeof(struct lpfc_name)) ||
+   (vport->vport_flag & FAWWPN_PARAM_CHG)) {
fabric_param_changed = 1;
-
+   vport->vport_flag &= ~FAWWPN_PARAM_CHG;
+   }
/*
 * Word 1 Bit 31 in common service parameter is overloaded.
 * Word 1 Bit 31 in FLOGI request is multiple NPort request
@@ -8755,7 +8757,7 @@ lpfc_issue_els_fdisc(struct lpfc_vport *vport, struct 
lpfc_nodelist *ndlp,
pcmd += sizeof(uint32_t); /* Node Name */
pcmd += sizeof(uint32_t); /* Node Name */
memcpy(pcmd, >fc_nodename, 8);
-
+   memset(sp->un.vendorVersion, 0, sizeof(sp->un.vendorVersion));
lpfc_set_disctmo(vport);
 
phba->fc_stat.elsXmitFDISC++;
diff --git a/drivers/scsi/lpfc/lpfc_hbadisc.c b/drivers/scsi/lpfc/lpfc_hbadisc.c
index d313dde..e824ec1 100644
--- a/drivers/scsi/lpfc/lpfc_hbadisc.c
+++ b/drivers/scsi/lpfc/lpfc_hbadisc.c
@@ -3002,6 +3002,7 @@ lpfc_mbx_cmpl_read_sparam(struct lpfc_hba *phba, 
LPFC_MBOXQ_t *pmb)
MAILBOX_t *mb = >u.mb;
struct lpfc_dmabuf *mp = (struct lpfc_dmabuf *) pmb->context1;
struct lpfc_vport  *vport = pmb->vport;
+   struct Scsi_Host *shost = lpfc_shost_from_vport(vport);
struct serv_parm *sp = >fc_sparam;
uint32_t ed_tov;
 
@@ -3031,6 +3032,7 @@ lpfc_mbx_cmpl_read_sparam(struct lpfc_hba *phba, 
LPFC_MBOXQ_t *pmb)
}
 
lpfc_update_vport_wwn(vport);
+   fc_host_port_name(shost) = wwn_to_u64(vport->fc_portname.u.wwn);
if (vport->port_type == LPFC_PHYSICAL_PORT) {
memcpy(>wwnn, >fc_nodename, sizeof(phba->wwnn));
memcpy(>wwpn, >fc_portname, sizeof(phba->wwnn));
@@ -3309,6 +3311,7 @@ lpfc_mbx_cmpl_read_topology(struct lpfc_hba *phba, 
LPFC_MBOXQ_t *pmb)
struct lpfc_sli_ring *pring;
MAILBOX_t *mb = >u.mb;
struct lpfc_dmabuf *mp = (struct lpfc_dmabuf *)

[lpfc 12/19] cannot establish connection with target that sends PRLI in PT2PT

2017-04-04 Thread Dick Kennedy

From: Dick Kennedy 

The driver does not accept PRLI from the target  and this
causes the  target to reject the subsequent PRLI from the driver

Fix: Accept the incoming PRLI so that the target can also accept
the PRLI from the driver

Signed-off-by: Dick Kennedy 
Signed-off-by: James Smart 
---
 drivers/scsi/lpfc/lpfc_init.c  | 12 
 drivers/scsi/lpfc/lpfc_nvme.c  | 23 ---
 drivers/scsi/lpfc/lpfc_nvmet.c |  6 +++---
 3 files changed, 23 insertions(+), 18 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
index cca7f81..88977b8 100644
--- a/drivers/scsi/lpfc/lpfc_init.c
+++ b/drivers/scsi/lpfc/lpfc_init.c
@@ -11061,7 +11061,7 @@ lpfc_pci_probe_one_s4(struct pci_dev *pdev, const 
struct pci_device_id *pid)
struct lpfc_hba   *phba;
struct lpfc_vport *vport = NULL;
struct Scsi_Host  *shost = NULL;
-   int error;
+   int error, cnt;
uint32_t cfg_mode, intr_mode;
 
/* Allocate memory for HBA structure */
@@ -11095,11 +11095,15 @@ lpfc_pci_probe_one_s4(struct pci_dev *pdev, const 
struct pci_device_id *pid)
goto out_unset_pci_mem_s4;
}
 
+   cnt = phba->cfg_iocb_cnt * 1024;
+   if (phba->nvmet_support)
+   cnt += (phba->cfg_nvmet_mrq_post * phba->cfg_nvmet_mrq);
+
/* Initialize and populate the iocb list per host */
lpfc_printf_log(phba, KERN_INFO, LOG_INIT,
-   "2821 initialize iocb list %d.\n",
-   phba->cfg_iocb_cnt*1024);
-   error = lpfc_init_iocb_list(phba, phba->cfg_iocb_cnt*1024);
+   "2821 initialize iocb list %d total %d\n",
+   phba->cfg_iocb_cnt, cnt);
+   error = lpfc_init_iocb_list(phba, cnt);
 
if (error) {
lpfc_printf_log(phba, KERN_ERR, LOG_INIT,
diff --git a/drivers/scsi/lpfc/lpfc_nvme.c b/drivers/scsi/lpfc/lpfc_nvme.c
index 9d39093..9468679 100644
--- a/drivers/scsi/lpfc/lpfc_nvme.c
+++ b/drivers/scsi/lpfc/lpfc_nvme.c
@@ -1495,6 +1495,7 @@ lpfc_nvme_fcp_abort(struct nvme_fc_local_port 
*pnvme_lport,
 "io buffer.  Skipping abort req.\n");
return;
}
+   nvmereq_wqe = _nbuf->cur_iocbq;
 
/*
 * The lpfc_nbuf and the mapped nvme_fcreq in the driver's
@@ -1508,20 +1509,19 @@ lpfc_nvme_fcp_abort(struct nvme_fc_local_port 
*pnvme_lport,
lpfc_printf_vlog(vport, KERN_ERR, LOG_NVME,
 "6143 NVME req mismatch: "
 "lpfc_nbuf %p nvmeCmd %p, "
-"pnvme_fcreq %p.  Skipping Abort\n",
+"pnvme_fcreq %p.  Skipping Abort xri x%x\n",
 lpfc_nbuf, lpfc_nbuf->nvmeCmd,
-pnvme_fcreq);
+pnvme_fcreq, nvmereq_wqe->sli4_xritag);
return;
}
 
/* Don't abort IOs no longer on the pending queue. */
-   nvmereq_wqe = _nbuf->cur_iocbq;
if (!(nvmereq_wqe->iocb_flag & LPFC_IO_ON_TXCMPLQ)) {
spin_unlock_irqrestore(>hbalock, flags);
lpfc_printf_vlog(vport, KERN_ERR, LOG_NVME,
 "6142 NVME IO req %p not queued - skipping "
-"abort req\n",
-pnvme_fcreq);
+"abort req xri x%x\n",
+pnvme_fcreq, nvmereq_wqe->sli4_xritag);
return;
}
 
@@ -1535,8 +1535,9 @@ lpfc_nvme_fcp_abort(struct nvme_fc_local_port 
*pnvme_lport,
lpfc_printf_vlog(vport, KERN_ERR, LOG_NVME,
 "6144 Outstanding NVME I/O Abort Request "
 "still pending on nvme_fcreq %p, "
-"lpfc_ncmd %p\n",
-pnvme_fcreq, lpfc_nbuf);
+"lpfc_ncmd %p xri x%x\n",
+pnvme_fcreq, lpfc_nbuf,
+nvmereq_wqe->sli4_xritag);
return;
}
 
@@ -1545,8 +1546,8 @@ lpfc_nvme_fcp_abort(struct nvme_fc_local_port 
*pnvme_lport,
spin_unlock_irqrestore(>hbalock, flags);
lpfc_printf_vlog(vport, KERN_ERR, LOG_NVME,
 "6136 No available abort wqes. Skipping "
-"Abts req for nvme_fcreq %p.\n",
-pnvme_fcreq);
+"Abts req for nvme_fcreq %p xri x%x\n",
+pnvme_fcreq, nvmereq_wqe->sli4_xritag);
return;
}
 
@@ -1604,7 +1605,7 @@ lpfc_nvme_fcp_abort(struct nvme_fc_local_port 
*pnvme_lport,
}

[lpfc 08/19] Fix nvme initiator handling when CONFIG_LPFC_NVME_INITIATOR is not enabled.

2017-04-04 Thread Dick Kennedy

From: Dick Kennedy 

With update nvme upstream driver sources, loading
the driver with nvme enabled resulting in this Oops.

Mar  6 10:09:55 NVME-init kernel: BUG: unable to handle kernel NULL pointer dere
ference at 0018
Mar  6 10:09:55 NVME-init kernel: IP: lpfc_nvme_update_localport+0x23/0xd0 [lpfc
]
Mar  6 10:09:55 NVME-init kernel: PGD 0
Mar  6 10:09:55 NVME-init kernel:
Mar  6 10:09:55 NVME-init kernel: Oops:  [#1] SMP
Mar  6 10:09:55 NVME-init kernel: CPU: 0 PID: 10256 Comm: lpfc_worker_0 Tainted:
 G   O4.10.0-rc7 #1
Mar  6 10:09:55 NVME-init kernel: Hardware name: Supermicro Super Server/X10DRL-
i, BIOS 2.0 12/18/2015
Mar  6 10:09:55 NVME-init kernel: task: 881028191c40 task.stack: 880ffdf
0
Mar  6 10:09:55 NVME-init kernel: RIP: 0010:lpfc_nvme_update_localport+0x23/0xd0
 [lpfc]
Mar  6 10:09:55 NVME-init kernel: RSP: 0018:880ffdf03c20 EFLAGS: 00010202
Mar  6 10:09:55 NVME-init kernel: RAX: 0003 RBX: 88103285b7d0 RC
X: 0001
Mar  6 10:09:55 NVME-init kernel: RDX: 881031c13400 RSI: 0292 RD
I: 88103285b7d0
Mar  6 10:09:55 NVME-init kernel: RBP: 880ffdf03c40 R08: 00fc R0
9: 880fff6c66c0
Mar  6 10:09:55 NVME-init kernel: R10: 880fff01f200 R11: 880fff6c6680 R1
2: 
Mar  6 10:09:55 NVME-init kernel: R13: 8810308ac000 R14:  R1
5: 0028
Mar  6 10:09:55 NVME-init kernel: FS:  () GS:88103f2
0() knlGS:
Mar  6 10:09:55 NVME-init kernel: CS:  0010 DS:  ES:  CR0: 80050
033
Mar  6 10:09:55 NVME-init kernel: CR2: 0018 CR3: 01009000 CR
4: 001406f0

Cause: As the initiator driver completes discovery at different stages, it call
lpfc_nvme_update_localport to hint that the DID and role may have changed.  In
the implementation of lpfc_nvme_update_localport, the driver was not
validating the localport or the lport during the execution
of the update_localport routine.  With the recent upstream additions to
the driver, the create_localport routine didn't run and so the localport
was NULL causing the page-fault Oops.

Fix: Add the CONFIG_LPFC_NVME_INITIATOR preprocessor inclusions to
lpfc_nvme_update_localport to turn off all routine processing when
the running kernel does not have NVME configured.  Add NULL pointer
checks on the localport and lport in lpfc_nvme_update_localport and
dump messages if they are NULL and just exit.

Signed-off-by: Dick Kennedy 
Signed-off-by: James Smart 
---
 drivers/scsi/lpfc/lpfc_nvme.c | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_nvme.c b/drivers/scsi/lpfc/lpfc_nvme.c
index 6465aa6..9726c95 100644
--- a/drivers/scsi/lpfc/lpfc_nvme.c
+++ b/drivers/scsi/lpfc/lpfc_nvme.c
@@ -2264,12 +2264,23 @@ lpfc_nvme_destroy_localport(struct lpfc_vport *vport)
 void
 lpfc_nvme_update_localport(struct lpfc_vport *vport)
 {
+#ifdef CONFIG_LPFC_NVME_INITIATOR
struct nvme_fc_local_port *localport;
struct lpfc_nvme_lport *lport;
 
localport = vport->localport;
+   if (!localport) {
+   lpfc_printf_vlog(vport, KERN_WARNING, LOG_NVME,
+"6710 Update NVME fail. No localport\n");
+   return;
+   }
lport = (struct lpfc_nvme_lport *)localport->private;
-
+   if (!lport) {
+   lpfc_printf_vlog(vport, KERN_WARNING, LOG_NVME,
+"6171 Update NVME fail. localP %p, No lport\n",
+localport);
+   return;
+   }
lpfc_printf_vlog(vport, KERN_INFO, LOG_NVME,
 "6012 Update NVME lport %p did x%x\n",
 localport, vport->fc_myDID);
@@ -2283,7 +2294,7 @@ lpfc_nvme_update_localport(struct lpfc_vport *vport)
lpfc_printf_vlog(vport, KERN_INFO, LOG_NVME_DISC,
 "6030 bound lport %p to DID x%06x\n",
 lport, localport->port_id);
-
+#endif
 }
 
 int
-- 
2.1.0

[lpfc 19/19] lpfc revison 11.2.0.12

2017-04-04 Thread Dick Kennedy

From: Dick Kennedy 

Signed-off-by: Dick Kennedy 
Signed-off-by: James Smart 
---
 drivers/scsi/lpfc/lpfc_version.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/lpfc/lpfc_version.h b/drivers/scsi/lpfc/lpfc_version.h
index d4e95e2..1c26dc6 100644
--- a/drivers/scsi/lpfc/lpfc_version.h
+++ b/drivers/scsi/lpfc/lpfc_version.h
@@ -20,7 +20,7 @@
  * included with this package. *
  ***/
 
-#define LPFC_DRIVER_VERSION "11.2.0.10"
+#define LPFC_DRIVER_VERSION "11.2.0.12"
 #define LPFC_DRIVER_NAME   "lpfc"
 
 /* Used for SLI 2/3 */
-- 
2.1.0

[lpfc 11/19] Add NVMET changes to interface with 4.11 kernel

2017-04-04 Thread Dick Kennedy

From: Dick Kennedy 

---
 drivers/scsi/lpfc/lpfc_nvmet.c | 127 -
 drivers/scsi/lpfc/lpfc_nvmet.h |   7 ++-
 2 files changed, 105 insertions(+), 29 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_nvmet.c b/drivers/scsi/lpfc/lpfc_nvmet.c
index 551e81a..cbd6371 100644
--- a/drivers/scsi/lpfc/lpfc_nvmet.c
+++ b/drivers/scsi/lpfc/lpfc_nvmet.c
@@ -408,9 +408,7 @@ out:
if (phba->ktime_on)
lpfc_nvmet_ktime(phba, ctxp);
 #endif
-   /* Let Abort cmpl repost the context */
-   if (!(ctxp->flag & LPFC_NVMET_ABORT_OP))
-   lpfc_nvmet_rq_post(phba, ctxp, >rqb_buffer->hbuf);
+   /* lpfc_nvmet_xmt_fcp_release() will recycle the context */
} else {
ctxp->entry_cnt++;
start_clean = offsetof(struct lpfc_iocbq, wqe);
@@ -543,27 +541,6 @@ lpfc_nvmet_xmt_fcp_op(struct nvmet_fc_target_port *tgtport,
}
 #endif
 
-   if (rsp->op == NVMET_FCOP_ABORT) {
-   lpfc_printf_log(phba, KERN_INFO, LOG_NVME_ABTS,
-   "6103 Abort op: oxri x%x %d cnt %d\n",
-   ctxp->oxid, ctxp->state, ctxp->entry_cnt);
-
-   lpfc_nvmeio_data(phba, "NVMET FCP ABRT: "
-"xri x%x state x%x cnt x%x\n",
-ctxp->oxid, ctxp->state, ctxp->entry_cnt);
-
-   atomic_inc(_nvmep->xmt_fcp_abort);
-   ctxp->entry_cnt++;
-   ctxp->flag |= LPFC_NVMET_ABORT_OP;
-   if (ctxp->flag & LPFC_NVMET_IO_INP)
-   lpfc_nvmet_sol_fcp_issue_abort(phba, ctxp, ctxp->sid,
-  ctxp->oxid);
-   else
-   lpfc_nvmet_unsol_fcp_issue_abort(phba, ctxp, ctxp->sid,
-ctxp->oxid);
-   return 0;
-   }
-
/* Sanity check */
if (ctxp->state == LPFC_NVMET_STE_ABORT) {
atomic_inc(_nvmep->xmt_fcp_drop);
@@ -630,10 +607,76 @@ lpfc_nvmet_targetport_delete(struct nvmet_fc_target_port 
*targetport)
complete(>tport_unreg_done);
 }
 
+static void
+lpfc_nvmet_xmt_fcp_abort(struct nvmet_fc_target_port *tgtport,
+struct nvmefc_tgt_fcp_req *req)
+{
+   struct lpfc_nvmet_tgtport *lpfc_nvmep = tgtport->private;
+   struct lpfc_nvmet_rcv_ctx *ctxp =
+   container_of(req, struct lpfc_nvmet_rcv_ctx, ctx.fcp_req);
+   struct lpfc_hba *phba = ctxp->phba;
+
+   lpfc_printf_log(phba, KERN_INFO, LOG_NVME_ABTS,
+   "6103 Abort op: oxri x%x %d cnt %d\n",
+   ctxp->oxid, ctxp->state, ctxp->entry_cnt);
+
+   lpfc_nvmeio_data(phba, "NVMET FCP ABRT: "
+"xri x%x state x%x cnt x%x\n",
+ctxp->oxid, ctxp->state, ctxp->entry_cnt);
+
+   atomic_inc(_nvmep->xmt_fcp_abort);
+   ctxp->entry_cnt++;
+   ctxp->flag |= LPFC_NVMET_ABORT_OP;
+   if (ctxp->flag & LPFC_NVMET_IO_INP)
+   lpfc_nvmet_sol_fcp_issue_abort(phba, ctxp, ctxp->sid,
+  ctxp->oxid);
+   else
+   lpfc_nvmet_unsol_fcp_issue_abort(phba, ctxp, ctxp->sid,
+ctxp->oxid);
+}
+
+static void
+lpfc_nvmet_xmt_fcp_release(struct nvmet_fc_target_port *tgtport,
+  struct nvmefc_tgt_fcp_req *rsp)
+{
+   struct lpfc_nvmet_tgtport *lpfc_nvmep = tgtport->private;
+   struct lpfc_nvmet_rcv_ctx *ctxp =
+   container_of(rsp, struct lpfc_nvmet_rcv_ctx, ctx.fcp_req);
+   struct lpfc_hba *phba = ctxp->phba;
+   unsigned long flags;
+   bool aborting = false;
+
+   spin_lock_irqsave(>ctxlock, flags);
+   if (ctxp->flag & LPFC_NVMET_ABORT_OP) {
+   aborting = true;
+   ctxp->flag |= LPFC_NVMET_CTX_RLS;
+   }
+   spin_unlock_irqrestore(>ctxlock, flags);
+
+   if (aborting)
+   /* let the abort path do the real release */
+   return;
+
+   /* Sanity check */
+   if (ctxp->state != LPFC_NVMET_STE_DONE) {
+   atomic_inc(_nvmep->xmt_fcp_drop);
+   lpfc_printf_log(phba, KERN_ERR, LOG_NVME_IOERR,
+   "6117 Bad state IO x%x aborted\n",
+   ctxp->oxid);
+   }
+
+   lpfc_nvmeio_data(phba, "NVMET FCP FREE: xri x%x ste %d\n", ctxp->oxid,
+ctxp->state, 0);
+
+   lpfc_nvmet_rq_post(phba, ctxp, >rqb_buffer->hbuf);
+}
+
 static struct nvmet_fc_target_template lpfc_tgttemplate = {
.targetport_delete = lpfc_nvmet_targetport_delete,
.xmt_ls_rsp = lpfc_nvmet_xmt_ls_rsp,
.fcp_op = lpfc_nvmet_xmt_fcp_op,
+   .fcp_abort  =

[lpfc 00/19] lpfc updates.

2017-04-04 Thread Dick Kennedy

This patch set includes patches to fix driver load/unload that have been 
partiallty submitted by other users.
This patch set is supplying the abort handline for nvme that was missing.


Dick Kennedy (19):
  Fix advertised max_sgl_segment count for NVMET
  Fixes after reviewing last set of patches. Minor fixes to previous
patches
  Fix spelling in comments.
  Fix PRLI ACC rsp for NVME
  Fix driver unload/reload operation.
  Fix discovery error handling in NVME initiator
  Fix driver usage of 128B WQEs when WQ_CREATE is V1.
  Fix nvme initiator handling when CONFIG_LPFC_NVME_INITIATOR is not
enabled.
  Fix Kconfig defines.
  Remove hba lock from NVMET isssue WQE
  Add NVMET changes to interface with 4.11 kernel
  cannot establish connection with target that sends PRLI in PT2PT
  Fix driver load issues when MRQ=8
  Fix crash after issuing lip reset
  Fix max_sgl_segments settings for NVME / NVMET
  Add Fabric assigned WWN support.
  Fix implicit logo and RSCN handling for NVMET
  Update ABORT processing for NVMET
  lpfc revison 11.2.0.12

 drivers/scsi/lpfc/lpfc.h   |   5 +-
 drivers/scsi/lpfc/lpfc_attr.c  |  16 +-
 drivers/scsi/lpfc/lpfc_bsg.c   |   4 +
 drivers/scsi/lpfc/lpfc_crtn.h  |   9 +-
 drivers/scsi/lpfc/lpfc_ct.c|  68 --
 drivers/scsi/lpfc/lpfc_debugfs.c   |  69 --
 drivers/scsi/lpfc/lpfc_disc.h  |   1 +
 drivers/scsi/lpfc/lpfc_els.c   |  68 +++---
 drivers/scsi/lpfc/lpfc_hbadisc.c   | 131 +++
 drivers/scsi/lpfc/lpfc_hw.h|   3 +
 drivers/scsi/lpfc/lpfc_hw4.h   |   4 +
 drivers/scsi/lpfc/lpfc_init.c  | 169 ++-
 drivers/scsi/lpfc/lpfc_mbox.c  |   7 +-
 drivers/scsi/lpfc/lpfc_nportdisc.c |   8 +-
 drivers/scsi/lpfc/lpfc_nvme.c  | 139 
 drivers/scsi/lpfc/lpfc_nvme.h  |   7 +-
 drivers/scsi/lpfc/lpfc_nvmet.c | 434 ++---
 drivers/scsi/lpfc/lpfc_nvmet.h |  13 +-
 drivers/scsi/lpfc/lpfc_sli.c   |  22 +-
 drivers/scsi/lpfc/lpfc_sli4.h  |   2 +-
 drivers/scsi/lpfc/lpfc_version.h   |   2 +-
 drivers/scsi/lpfc/lpfc_vport.c |   3 +-
 22 files changed, 870 insertions(+), 314 deletions(-)

-- 
2.1.0

[lpfc 04/19] Fix PRLI ACC rsp for NVME

2017-04-04 Thread Dick Kennedy

From: Dick Kennedy 

PRLI ACC from target is FCP oriented


Signed-off-by: Dick Kennedy 
Signed-off-by: James Smart 
---
 drivers/scsi/lpfc/lpfc_els.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/lpfc/lpfc_els.c b/drivers/scsi/lpfc/lpfc_els.c
index d9c61d0..d6e0f58 100644
--- a/drivers/scsi/lpfc/lpfc_els.c
+++ b/drivers/scsi/lpfc/lpfc_els.c
@@ -4403,7 +4403,7 @@ lpfc_els_rsp_prli_acc(struct lpfc_vport *vport, struct 
lpfc_iocbq *oldiocb,
pcmd = (uint8_t *) (((struct lpfc_dmabuf *) elsiocb->context2)->virt);
memset(pcmd, 0, cmdsize);
 
-   *((uint32_t *) (pcmd)) = (ELS_CMD_ACC | (ELS_CMD_PRLI & ~ELS_RSP_MASK));
+   *((uint32_t *)(pcmd)) = elsrspcmd;
pcmd += sizeof(uint32_t);
 
/* For PRLI, remainder of payload is PRLI parameter page */
-- 
2.1.0

Re: [RFC 6/8] nvmet: Be careful about using iomem accesses when dealing with p2pmem

2017-04-04 Thread Logan Gunthorpe

On 04/04/17 04:59 AM, Sagi Grimberg wrote:
> We can never ever get here from an IO command, and that is a good thing
> because it would have been broken if we did, regardless of what copy
> method we use...

Yes, I changed this mostly for admin commands. I did notice connect
commands do end up reading from the p2mem and this patchset correctly
switches it to iomemcpy. However, based on Cristoph's comment, I hope to
make it more general such that iomem is hidden within sgls and any
access will either be correct or create a warning.

On 04/04/17 09:46 AM, Jason Gunthorpe wrote:
> Transactions might not complete at the NVMe device before the CPU
> processes the RDMA completion, however due to the PCI-E ordering rules
> new TLPs directed to the NVMe will complete after the RMDA TLPs and
> thus observe the new data. (eg order preserving)
> 
> It would be very hard to use P2P if fabric ordering is not preserved..

Yes, my understanding is the same, the PCI-E ordering rules save us here.

Thanks,

Logan

Re: [RFC PATCH 0/2] block,scsi: support host-wide tagset

2017-04-04 Thread Hannes Reinecke


On 04/04/2017 05:59 PM, Ming Lei wrote:

On Tue, Apr 4, 2017 at 8:07 PM, Hannes Reinecke  wrote:

Hi all,

as discussed recently most existing HBAs have a host-wide tagset which
does not map easily onto the per-queue tagset model of block mq.
This patchset implements a flag BLK_MQ_F_GLOBAL_TAGS for block-mq, which
enables the use of a shared tagset for all hardware queues.
The second patch adds a flag 'host_tagset' to the SCSI host template,
which allows drivers to enable the use of the global tagset.

This patchset probably has some performance implications as
there is a quite high probability of cache-bouncing when allocating
tags. Also I'm not quite sure if the implemented tagset sharing
is the correct way to handle things.
So this can be considered an RFC.

As usual, comments and reviews are welcome.


Just be curious, why is multi hard queue used for this case? Are there
some real cases in SCSI?


Yes.
This is required by basically every driver in drivers/scsi which would 
in theory be able to support scsi-mq (ie lpfc, qla2xxx, mpt3sas, and 
possibly fnic). Each of them support several submission/completion 
queues, but every one of them has a host-wide tag map, too.


Cheers,

Hannes
--
Dr. Hannes Reinecke   zSeries & Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

Re: [RFC PATCH 0/2] block,scsi: support host-wide tagset

2017-04-04 Thread Hannes Reinecke


On 04/04/2017 05:32 PM, Omar Sandoval wrote:

On Tue, Apr 04, 2017 at 02:07:43PM +0200, Hannes Reinecke wrote:

Hi all,

as discussed recently most existing HBAs have a host-wide tagset which
does not map easily onto the per-queue tagset model of block mq.
This patchset implements a flag BLK_MQ_F_GLOBAL_TAGS for block-mq, which
enables the use of a shared tagset for all hardware queues.
The second patch adds a flag 'host_tagset' to the SCSI host template,
which allows drivers to enable the use of the global tagset.

This patchset probably has some performance implications as
there is a quite high probability of cache-bouncing when allocating
tags. Also I'm not quite sure if the implemented tagset sharing
is the correct way to handle things.
So this can be considered an RFC.

As usual, comments and reviews are welcome.


Hi, Hannes,

blk-mq already supports a shared tagset, and scsi-mq already uses that.
When we initialize a request queue, we add it to a tagset with
blk_mq_add_queue_set(), where we automatically mark the tagset as shared
if there is more than one queue using it. What does this do that
BLK_MQ_F_TAG_SHARED doesn't cover?


This is orthogonal to the BLK_MQ_F_TAG_SHARED flag.
That flag is covering the case for several block devices sharing the 
same tagset.
My new flag is covering the case where several _hardware_ queues are 
sharing the same tagset.


Cheers,

Hannes
--
Dr. Hannes Reinecke   zSeries & Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

Re: [RFC PATCH 0/2] block,scsi: support host-wide tagset

2017-04-04 Thread Bart Van Assche

On 04/04/2017 09:00 AM, Ming Lei wrote:
> Just be curious, why is multi hard queue used for this case? Are there
> some real cases in SCSI?

Hello Ming,

Yes, there is a real need for this. Background information is available
in the following e-mail thread: Arun Easi, "scsi-mq - tag# and
can_queue, performance", April 2, 2017, linux-scsi
(http://www.spinics.net/lists/linux-scsi/msg106853.html).

Bart.

Re: [RFC 3/8] nvmet: Use p2pmem in nvme target

2017-04-04 Thread Logan Gunthorpe

On 04/04/17 04:40 AM, Sagi Grimberg wrote:
> Hey Logan,
> 
>> We create a configfs attribute in each nvme-fabrics target port to
>> enable p2p memory use. When enabled, the port will only then use the
>> p2p memory if a p2p memory device can be found which is behind the
>> same switch as the RDMA port and all the block devices in use. If
>> the user enabled it an no devices are found, then the system will
>> silently fall back on using regular memory.
> 
> What should we do if we have more than a single device that satisfies
> this? I'd say that it would be better to have the user ask for a
> specific device and fail it if it doesn't meet the above conditions...

I hadn't done this yet but I think a simple closest device in the tree
would solve the issue sufficiently. However, I originally had it so the
user has to pick the device and I prefer that approach. But if the user
picks the device, then why bother restricting what he picks? Per the
thread with Sinan, I'd prefer to use what the user picks. You were one
of the biggest opponents to that so I'd like to hear your opinion on
removing the restrictions.

>> Ideally, we'd want to use an NVME CMB buffer as p2p memory. This would
>> save an extra PCI transfer as the NVME card could just take the data
>> out of it's own memory. However, at this time, cards with CMB buffers
>> don't seem to be available.
> 
> Even if it was available, it would be hard to make real use of this
> given that we wouldn't know how to pre-post recv buffers (for in-capsule
> data). But let's leave this out of the scope entirely...

I don't understand what you're referring to. We'd simply use the CMB
buffer as a p2pmem device, why does that change anything?

> Why do you need this? you have a reference to the
> queue itself.

This keeps track of whether the response was actually allocated with
p2pmem or not. It's needed for when we free the SGL because the queue
may have a p2pmem device assigned to it but, if the alloc failed and it
fell back on system memory then we need to know how to free it. I'm
currently looking at having SGLs having an iomem flag. In which case,
this would no longer be needed as the flag in the SGL could be used.

>> +rsp->p2pmem = rsp->queue->p2pmem;
>>  status = nvmet_rdma_alloc_sgl(>req.sg, >req.sg_cnt,
>> -len);
>> +len, rsp->p2pmem);
>> +
>> +if (status && rsp->p2pmem) {
>> +rsp->p2pmem = NULL;
>> +status = nvmet_rdma_alloc_sgl(>req.sg, >req.sg_cnt,
>> +  len, rsp->p2pmem);
>> +}
>> +
> 
> Not sure its a good practice to rely on rsp->p2pmem not being NULL...
> Would be nice if the allocation routines can hide it from us...

I'm not sure what the reasoning is behind your NULL comment.

Yes, I'm currently considering pushing an alloc/free sgl into the p2pmem
code.

>>  if (status)
>>  return status;
>>
>> @@ -984,6 +1009,7 @@ static void nvmet_rdma_free_queue(struct
>> nvmet_rdma_queue *queue)
>>  !queue->host_qid);
>>  }
>>  nvmet_rdma_free_rsps(queue);
>> +p2pmem_put(queue->p2pmem);
> 
> What does this pair with? p2pmem_find_compat()?

Yes, that's correct.

>> +static void nvmet_rdma_queue_setup_p2pmem(struct nvmet_rdma_queue
>> *queue)
>> +{
>> +struct device **dma_devs;
>> +struct nvmet_ns *ns;
>> +int ndevs = 1;
>> +int i = 0;
>> +struct nvmet_subsys_link *s;
>> +
>> +if (!queue->port->allow_p2pmem)
>> +return;
>> +
>> +list_for_each_entry(s, >port->subsystems, entry) {
>> +list_for_each_entry_rcu(ns, >subsys->namespaces, dev_link) {
>> +ndevs++;
>> +}
>> +}
> 
> This code has no business in nvmet-rdma. Why not keep nr_ns in
> nvmet_subsys in the first place?

That makes sense.

>> +
>> +dma_devs = kmalloc((ndevs + 1) * sizeof(*dma_devs), GFP_KERNEL);
>> +if (!dma_devs)
>> +return;
>> +
>> +dma_devs[i++] = >dev->device->dev;
>> +
>> +list_for_each_entry(s, >port->subsystems, entry) {
>> +list_for_each_entry_rcu(ns, >subsys->namespaces, dev_link) {
>> +dma_devs[i++] = disk_to_dev(ns->bdev->bd_disk);
>> +}
>> +}
>> +
>> +dma_devs[i++] = NULL;
>> +
>> +queue->p2pmem = p2pmem_find_compat(dma_devs);
> 
> This is a problem. namespaces can be added at any point in time. No one
> guarantee that dma_devs are all the namepaces we'll ever see.

Yeah, well restricting p2pmem based on all the devices in use is hard.
So we'd need a call into the transport every time an ns is added and
we'd have to drop the p2pmem if they add one that isn't supported. This
complexity is just one of the reasons I prefer just letting the user chose.

>> +
>> +if (queue->p2pmem)
>> +pr_debug("using %s for rdma nvme target queue",
>> + dev_name(>p2pmem->dev));
>> +
>> +kfree(dma_devs);
>> +}
>> +
>>  static int nvmet_rdma_queue_connect(struct rdma_cm_id *cm_id,
>>  struct rdma_cm_event *event)
>>  {

Re: [RFC PATCH 0/2] block,scsi: support host-wide tagset

2017-04-04 Thread Ming Lei

On Tue, Apr 4, 2017 at 8:07 PM, Hannes Reinecke  wrote:
> Hi all,
>
> as discussed recently most existing HBAs have a host-wide tagset which
> does not map easily onto the per-queue tagset model of block mq.
> This patchset implements a flag BLK_MQ_F_GLOBAL_TAGS for block-mq, which
> enables the use of a shared tagset for all hardware queues.
> The second patch adds a flag 'host_tagset' to the SCSI host template,
> which allows drivers to enable the use of the global tagset.
>
> This patchset probably has some performance implications as
> there is a quite high probability of cache-bouncing when allocating
> tags. Also I'm not quite sure if the implemented tagset sharing
> is the correct way to handle things.
> So this can be considered an RFC.
>
> As usual, comments and reviews are welcome.

Just be curious, why is multi hard queue used for this case? Are there
some real cases in SCSI?


Thanks,
Ming Lei

Re: [RFC 2/8] cxgb4: setup pcie memory window 4 and create p2pmem region

2017-04-04 Thread Logan Gunthorpe



On 04/04/17 04:42 AM, Sagi Grimberg wrote:
> This is weird, why even call this if !use_p2pmem?

I personally find it cleaner than:

if (use_p2pmem)
setup_memwin_p2pmem(...)

I'm not sure why that's so weird.

Logan

Re: [RFC 6/8] nvmet: Be careful about using iomem accesses when dealing with p2pmem

2017-04-04 Thread Jason Gunthorpe

On Tue, Apr 04, 2017 at 01:59:26PM +0300, Sagi Grimberg wrote:
> Note that the nvme completion queues are still on the host memory, so
> this means we have lost the ordering between data and completions as
> they go to different pcie targets.

Hmm, in this simple up/down case with a switch, I think it might
actually be OK.

Transactions might not complete at the NVMe device before the CPU
processes the RDMA completion, however due to the PCI-E ordering rules
new TLPs directed to the NVMe will complete after the RMDA TLPs and
thus observe the new data. (eg order preserving)

It would be very hard to use P2P if fabric ordering is not preserved..

Jason

Re: [RFC PATCH 0/2] block,scsi: support host-wide tagset

2017-04-04 Thread Bart Van Assche

On Tue, 2017-04-04 at 08:32 -0700, Omar Sandoval wrote:
> blk-mq already supports a shared tagset, and scsi-mq already uses that.
> When we initialize a request queue, we add it to a tagset with
> blk_mq_add_queue_set(), where we automatically mark the tagset as shared
> if there is more than one queue using it. What does this do that
> BLK_MQ_F_TAG_SHARED doesn't cover?

Hello Omar,

Today blk-mq creates one tag set per hardware queue. The sharing by
scsi-mq is between request queues for hardware queues that have the same
index but not between hardware queues of a single request queue. My
understanding is that the goal of this patch series is to make it possible
to use a single tag set for all hardware queues and all request queues that
share the same SCSI host.

Bart.

Re: [RFC PATCH 0/2] block,scsi: support host-wide tagset

2017-04-04 Thread Omar Sandoval

On Tue, Apr 04, 2017 at 02:07:43PM +0200, Hannes Reinecke wrote:
> Hi all,
> 
> as discussed recently most existing HBAs have a host-wide tagset which
> does not map easily onto the per-queue tagset model of block mq.
> This patchset implements a flag BLK_MQ_F_GLOBAL_TAGS for block-mq, which
> enables the use of a shared tagset for all hardware queues.
> The second patch adds a flag 'host_tagset' to the SCSI host template,
> which allows drivers to enable the use of the global tagset.
> 
> This patchset probably has some performance implications as
> there is a quite high probability of cache-bouncing when allocating
> tags. Also I'm not quite sure if the implemented tagset sharing
> is the correct way to handle things.
> So this can be considered an RFC.
> 
> As usual, comments and reviews are welcome.

Hi, Hannes,

blk-mq already supports a shared tagset, and scsi-mq already uses that.
When we initialize a request queue, we add it to a tagset with
blk_mq_add_queue_set(), where we automatically mark the tagset as shared
if there is more than one queue using it. What does this do that
BLK_MQ_F_TAG_SHARED doesn't cover?

[PATCHv5 5/8] scsi: make eh_eflags persistent

2017-04-04 Thread Hannes Reinecke

If a failed command is retried and fails again we need
to enter SCSI EH, otherwise we will never be able to
recover the command.
To detect this situation we must not clear scmd->eh_eflags
when EH finishes but rather make it persistent throughout
the lifetime of the command.

Reviewed-by: Benjamin Block 
Reviewed-by: Bart Van Assche 
Reviewed-by: Johannes Thumshirn 
Signed-off-by: Hannes Reinecke 
---
 Documentation/scsi/scsi_eh.txt  | 14 +++---
 drivers/scsi/libsas/sas_scsi_host.c |  2 --
 drivers/scsi/scsi_error.c   |  4 ++--
 include/scsi/scsi_eh.h  |  1 +
 4 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/Documentation/scsi/scsi_eh.txt b/Documentation/scsi/scsi_eh.txt
index 37eca00..4edb9c1c 100644
--- a/Documentation/scsi/scsi_eh.txt
+++ b/Documentation/scsi/scsi_eh.txt
@@ -105,11 +105,14 @@ function
 
  2. If the host supports asynchronous completion (as indicated by the
 no_async_abort setting in the host template) scsi_abort_command()
-is invoked to schedule an asynchrous abort. If that fails
-Step #3 is taken.
+is invoked to schedule an asynchrous abort.
+Asynchronous abort are not invoked for commands which the
+SCSI_EH_ABORT_SCHEDULED flag is set (this indicates that the command
+already had been aborted once, and this is a retry which failed),
+or when the EH deadline is expired. In these case Step #3 is taken.
 
- 2. scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD) is invoked for the
-command.  See [1-3] for more information.
+ 3. scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD) is invoked for the
+command.  See [1-4] for more information.
 
 [1-3] Asynchronous command aborts
 
@@ -263,7 +266,6 @@ scmd->allowed.
 
  3. scmd recovered
 ACTION: scsi_eh_finish_cmd() is invoked to EH-finish scmd
-   - clear scmd->eh_eflags
- scsi_setup_cmd_retry()
- move from local eh_work_q to local eh_done_q
 LOCKING: none
@@ -456,8 +458,6 @@ except for #1 must be implemented by eh_strategy_handler().
 
  - shost->host_failed is zero.
 
- - Each scmd's eh_eflags field is cleared.
-
  - Each scmd is in such a state that scsi_setup_cmd_retry() on the
scmd doesn't make any difference.
 
diff --git a/drivers/scsi/libsas/sas_scsi_host.c 
b/drivers/scsi/libsas/sas_scsi_host.c
index ee6b39a..87e5079 100644
--- a/drivers/scsi/libsas/sas_scsi_host.c
+++ b/drivers/scsi/libsas/sas_scsi_host.c
@@ -613,8 +613,6 @@ static void sas_eh_handle_sas_errors(struct Scsi_Host 
*shost, struct list_head *
SAS_DPRINTK("trying to find task 0x%p\n", task);
res = sas_scsi_find_task(task);
 
-   cmd->eh_eflags = 0;
-
switch (res) {
case TASK_IS_DONE:
SAS_DPRINTK("%s: task 0x%p is done\n", __func__,
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index cff7d9d..4d26ff2 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -188,7 +188,6 @@ static int scsi_host_eh_past_deadline(struct Scsi_Host 
*shost)
/*
 * Retry after abort failed, escalate to next level.
 */
-   scmd->eh_eflags &= ~SCSI_EH_ABORT_SCHEDULED;
SCSI_LOG_ERROR_RECOVERY(3,
scmd_printk(KERN_INFO, scmd,
"previous abort failed\n"));
@@ -937,6 +936,7 @@ void scsi_eh_prep_cmnd(struct scsi_cmnd *scmd, struct 
scsi_eh_save *ses,
ses->result = scmd->result;
ses->underflow = scmd->underflow;
ses->prot_op = scmd->prot_op;
+   ses->eh_eflags = scmd->eh_eflags;
 
scmd->prot_op = SCSI_PROT_NORMAL;
scmd->eh_eflags = 0;
@@ -1000,6 +1000,7 @@ void scsi_eh_restore_cmnd(struct scsi_cmnd* scmd, struct 
scsi_eh_save *ses)
scmd->result = ses->result;
scmd->underflow = ses->underflow;
scmd->prot_op = ses->prot_op;
+   scmd->eh_eflags = ses->eh_eflags;
 }
 EXPORT_SYMBOL(scsi_eh_restore_cmnd);
 
@@ -1132,7 +1133,6 @@ static int scsi_eh_action(struct scsi_cmnd *scmd, int rtn)
  */
 void scsi_eh_finish_cmd(struct scsi_cmnd *scmd, struct list_head *done_q)
 {
-   scmd->eh_eflags = 0;
list_move_tail(>eh_entry, done_q);
 }
 EXPORT_SYMBOL(scsi_eh_finish_cmd);
diff --git a/include/scsi/scsi_eh.h b/include/scsi/scsi_eh.h
index 98d366b..a25b328 100644
--- a/include/scsi/scsi_eh.h
+++ b/include/scsi/scsi_eh.h
@@ -31,6 +31,7 @@ extern int scsi_get_sense_info_fld(const u8 * sense_buffer, 
int sb_len,
 struct scsi_eh_save {
/* saved state */
int result;
+   int eh_eflags;
enum dma_data_direction data_direction;
unsigned underflow;
unsigned char cmd_len;
-- 
1.8.5.6

[PATCHv5 2/8] sd: Return SUCCESS in sd_eh_action() after device offline

2017-04-04 Thread Hannes Reinecke

If sd_eh_action() decides to take the device offline there is
no point in returning FAILED, as taking the device offline
is the ultimate step in SCSI EH anyway.
So further escalation via SCSI EH is not likely to make a
difference and we can as well return SUCCESS.

Cc: Benjamin Block 
Signed-off-by: Hannes Reinecke 
---
 drivers/scsi/sd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index bd2a38e..00b168b 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1751,7 +1751,7 @@ static int sd_eh_action(struct scsi_cmnd *scmd, int 
eh_disp)
"Medium access timeout failure. Offlining disk!\n");
scsi_device_set_state(scmd->device, SDEV_OFFLINE);
 
-   return FAILED;
+   return SUCCESS;
}
 
return eh_disp;
-- 
1.8.5.6

[PATCHv5 1/8] scsi_error: count medium access timeout only once per EH run

2017-04-04 Thread Hannes Reinecke

The current medium access timeout counter will be increased for
each command, so if there are enough failed commands we'll hit
the medium access timeout for even a single device failure and
the following kernel message is displayed:

sd H:C:T:L: [sdXY] Medium access timeout failure. Offlining disk!

Fix this by making the timeout per EH run, ie the counter will
only be increased once per device and EH run.

Fixes: 18a4d0a ("[SCSI] Handle disk devices which can not process medium access 
commands")
Cc: Ewan Milne 
Cc: Lawrence Obermann 
Cc: Benjamin Block 
Cc: Steffen Maier 
Signed-off-by: Hannes Reinecke 
---
 drivers/scsi/scsi_error.c  | 18 ++
 drivers/scsi/sd.c  | 27 ++-
 drivers/scsi/sd.h  |  1 +
 include/scsi/scsi_driver.h |  1 +
 4 files changed, 46 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index f2cafae..370f6c0 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -221,6 +221,23 @@ static int scsi_host_eh_past_deadline(struct Scsi_Host 
*shost)
 }
 
 /**
+ * scsi_eh_reset - call into ->eh_action to reset internal counters
+ * @scmd:  scmd to run eh on.
+ *
+ * The scsi driver might be carrying internal state about the
+ * devices, so we need to call into the driver to reset the
+ * internal state once the error handler is started.
+ */
+static void scsi_eh_reset(struct scsi_cmnd *scmd)
+{
+   if (!blk_rq_is_passthrough(scmd->request)) {
+   struct scsi_driver *sdrv = scsi_cmd_to_driver(scmd);
+   if (sdrv->eh_reset)
+   sdrv->eh_reset(scmd);
+   }
+}
+
+/**
  * scsi_eh_scmd_add - add scsi cmd to error handling.
  * @scmd:  scmd to run eh on.
  * @eh_flag:   optional SCSI_EH flag.
@@ -249,6 +266,7 @@ int scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag)
if (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED)
eh_flag &= ~SCSI_EH_CANCEL_CMD;
scmd->eh_eflags |= eh_flag;
+   scsi_eh_reset(scmd);
list_add_tail(>eh_entry, >eh_cmd_q);
shost->host_failed++;
scsi_eh_wakeup(shost);
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index d277e86..bd2a38e 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -115,6 +115,7 @@
 static int sd_init_command(struct scsi_cmnd *SCpnt);
 static void sd_uninit_command(struct scsi_cmnd *SCpnt);
 static int sd_done(struct scsi_cmnd *);
+static void sd_eh_reset(struct scsi_cmnd *);
 static int sd_eh_action(struct scsi_cmnd *, int);
 static void sd_read_capacity(struct scsi_disk *sdkp, unsigned char *buffer);
 static void scsi_disk_release(struct device *cdev);
@@ -532,6 +533,7 @@ static void sd_set_flush_flag(struct scsi_disk *sdkp)
.uninit_command = sd_uninit_command,
.done   = sd_done,
.eh_action  = sd_eh_action,
+   .eh_reset   = sd_eh_reset,
 };
 
 /*
@@ -1686,6 +1688,26 @@ static int sd_pr_clear(struct block_device *bdev, u64 
key)
 };
 
 /**
+ * sd_eh_reset - reset error handling callback
+ * @scmd:  sd-issued command that has failed
+ *
+ * This function is called by the SCSI midlayer before starting
+ * SCSI EH. When counting medium access failures we have to be
+ * careful to register it only only once per device and SCSI EH run;
+ * there might be several timed out commands which will cause the
+ * 'max_medium_access_timeouts' counter to trigger after the first
+ * SCSI EH run already and set the device to offline.
+ * So this function resets the internal counter before starting SCSI EH.
+ **/
+static void sd_eh_reset(struct scsi_cmnd *scmd)
+{
+   struct scsi_disk *sdkp = scsi_disk(scmd->request->rq_disk);
+
+   /* New SCSI EH run, reset gate variable */
+   sdkp->ignore_medium_access_errors = false;
+}
+
+/**
  * sd_eh_action - error handling callback
  * @scmd:  sd-issued command that has failed
  * @eh_disp:   The recovery disposition suggested by the midlayer
@@ -1714,7 +1736,10 @@ static int sd_eh_action(struct scsi_cmnd *scmd, int 
eh_disp)
 * process of recovering or has it suffered an internal failure
 * that prevents access to the storage medium.
 */
-   sdkp->medium_access_timed_out++;
+   if (!sdkp->ignore_medium_access_errors) {
+   sdkp->medium_access_timed_out++;
+   sdkp->ignore_medium_access_errors = true;
+   }
 
/*
 * If the device keeps failing read/write commands but TEST UNIT
diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
index 4dac35e..0cf9680 100644
--- a/drivers/scsi/sd.h
+++ b/drivers/scsi/sd.h
@@ -106,6 +106,7 @@ struct scsi_disk {
unsignedrc_basis: 2;
unsignedzoned: 2;
unsignedurswrz : 1;
+   unsigned

[PATCHv5 6/8] scsi: make scsi_eh_scmd_add() always succeed

2017-04-04 Thread Hannes Reinecke

scsi_eh_scmd_add() currently only will fail if no
error handler thread is started (which will never be the
case) or if the state machine encounters an illegal transition.

But if we're encountering an invalid state transition
chances is we cannot fixup things with the error handler.
So better add a WARN_ON for illegal host states and
make scsi_dh_scmd_add() a void function.

Signed-off-by: Hannes Reinecke 
Reviewed-by: Johannes Thumshirn 
Reviewed-by: Bart Van Assche 
---
 drivers/scsi/scsi_error.c | 41 +
 drivers/scsi/scsi_lib.c   |  4 ++--
 drivers/scsi/scsi_priv.h  |  2 +-
 3 files changed, 16 insertions(+), 31 deletions(-)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 4d26ff2..bbc4318 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -162,13 +162,7 @@ static int scsi_host_eh_past_deadline(struct Scsi_Host 
*shost)
}
}
 
-   if (!scsi_eh_scmd_add(scmd, 0)) {
-   SCSI_LOG_ERROR_RECOVERY(3,
-   scmd_printk(KERN_WARNING, scmd,
-   "terminate aborted command\n"));
-   set_host_byte(scmd, DID_TIME_OUT);
-   scsi_finish_command(scmd);
-   }
+   scsi_eh_scmd_add(scmd, 0);
 }
 
 /**
@@ -228,28 +222,23 @@ static void scsi_eh_reset(struct scsi_cmnd *scmd)
  * scsi_eh_scmd_add - add scsi cmd to error handling.
  * @scmd:  scmd to run eh on.
  * @eh_flag:   optional SCSI_EH flag.
- *
- * Return value:
- * 0 on failure.
  */
-int scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag)
+void scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag)
 {
struct Scsi_Host *shost = scmd->device->host;
unsigned long flags;
-   int ret = 0;
+   int ret;
 
-   if (!shost->ehandler)
-   return 0;
+   WARN_ON_ONCE(!shost->ehandler);
 
spin_lock_irqsave(shost->host_lock, flags);
-   if (scsi_host_set_state(shost, SHOST_RECOVERY))
-   if (scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY))
-   goto out_unlock;
-
+   if (scsi_host_set_state(shost, SHOST_RECOVERY)) {
+   ret = scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY);
+   WARN_ON_ONCE(ret);
+   }
if (shost->eh_deadline != -1 && !shost->last_reset)
shost->last_reset = jiffies;
 
-   ret = 1;
if (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED)
eh_flag &= ~SCSI_EH_CANCEL_CMD;
scmd->eh_eflags |= eh_flag;
@@ -257,9 +246,7 @@ int scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag)
list_add_tail(>eh_entry, >eh_cmd_q);
shost->host_failed++;
scsi_eh_wakeup(shost);
- out_unlock:
spin_unlock_irqrestore(shost->host_lock, flags);
-   return ret;
 }
 
 /**
@@ -288,13 +275,11 @@ enum blk_eh_timer_return scsi_times_out(struct request 
*req)
rtn = host->hostt->eh_timed_out(scmd);
 
if (rtn == BLK_EH_NOT_HANDLED) {
-   if (!host->hostt->no_async_abort &&
-   scsi_abort_command(scmd) == SUCCESS)
-   return BLK_EH_NOT_HANDLED;
-
-   set_host_byte(scmd, DID_TIME_OUT);
-   if (!scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD))
-   rtn = BLK_EH_HANDLED;
+   if (host->hostt->no_async_abort ||
+   scsi_abort_command(scmd) != SUCCESS) {
+   set_host_byte(scmd, DID_TIME_OUT);
+   scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD);
+   }
}
 
return rtn;
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index ba22866..ba8da90 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1593,8 +1593,8 @@ static void scsi_softirq_done(struct request *rq)
scsi_queue_insert(cmd, SCSI_MLQUEUE_DEVICE_BUSY);
break;
default:
-   if (!scsi_eh_scmd_add(cmd, 0))
-   scsi_finish_command(cmd);
+   scsi_eh_scmd_add(cmd, 0);
+   break;
}
 }
 
diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h
index 99bfc98..5be6cbf6 100644
--- a/drivers/scsi/scsi_priv.h
+++ b/drivers/scsi/scsi_priv.h
@@ -72,7 +72,7 @@ extern int scsi_dev_info_list_add_keyed(int compatible, char 
*vendor,
 extern int scsi_error_handler(void *host);
 extern int scsi_decide_disposition(struct scsi_cmnd *cmd);
 extern void scsi_eh_wakeup(struct Scsi_Host *shost);
-extern int scsi_eh_scmd_add(struct scsi_cmnd *, int);
+extern void scsi_eh_scmd_add(struct scsi_cmnd *, int);
 void scsi_eh_ready_devs(struct Scsi_Host *shost,
struct list_head *work_q,
struct list_head *done_q);
-- 
1.8.5.6

[PATCHv5 8/8] scsi: inline command aborts

2017-04-04 Thread Hannes Reinecke

The block layer always calls the timeout function from a workqueue
context, so there is no need to have yet another workqueue for
running command aborts.

Signed-off-by: Hannes Reinecke 
---
 drivers/scsi/scsi.c   |  2 --
 drivers/scsi/scsi_error.c | 80 +++
 drivers/scsi/scsi_lib.c   |  2 --
 drivers/scsi/scsi_priv.h  |  1 -
 4 files changed, 39 insertions(+), 46 deletions(-)

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 7bfbcfa..fdec73e 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -115,8 +115,6 @@ void scsi_put_command(struct scsi_cmnd *cmd)
BUG_ON(list_empty(>list));
list_del_init(>list);
spin_unlock_irqrestore(>device->list_lock, flags);
-
-   BUG_ON(delayed_work_pending(>abort_work));
 }
 
 #ifdef CONFIG_SCSI_LOGGING
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 53e3343..2355100 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -115,11 +115,9 @@ static int scsi_host_eh_past_deadline(struct Scsi_Host 
*shost)
  * scmd_eh_abort_handler - Handle command aborts
  * @work:  command to be aborted.
  */
-void
-scmd_eh_abort_handler(struct work_struct *work)
+int
+scmd_eh_abort_handler(struct scsi_cmnd *scmd)
 {
-   struct scsi_cmnd *scmd =
-   container_of(work, struct scsi_cmnd, abort_work.work);
struct scsi_device *sdev = scmd->device;
int rtn;
 
@@ -127,42 +125,40 @@ static int scsi_host_eh_past_deadline(struct Scsi_Host 
*shost)
SCSI_LOG_ERROR_RECOVERY(3,
scmd_printk(KERN_INFO, scmd,
"eh timeout, not aborting\n"));
-   } else {
-   SCSI_LOG_ERROR_RECOVERY(3,
-   scmd_printk(KERN_INFO, scmd,
-   "aborting command\n"));
-   rtn = scsi_try_to_abort_cmd(sdev->host->hostt, scmd);
-   if (rtn == SUCCESS) {
-   set_host_byte(scmd, DID_TIME_OUT);
-   if (scsi_host_eh_past_deadline(sdev->host)) {
-   SCSI_LOG_ERROR_RECOVERY(3,
-   scmd_printk(KERN_INFO, scmd,
-   "eh timeout, not retrying "
-   "aborted command\n"));
-   } else if (!scsi_noretry_cmd(scmd) &&
-   (++scmd->retries <= scmd->allowed)) {
-   SCSI_LOG_ERROR_RECOVERY(3,
-   scmd_printk(KERN_WARNING, scmd,
-   "retry aborted command\n"));
-   scsi_queue_insert(scmd, SCSI_MLQUEUE_EH_RETRY);
-   return;
-   } else {
-   SCSI_LOG_ERROR_RECOVERY(3,
-   scmd_printk(KERN_WARNING, scmd,
-   "finish aborted 
command\n"));
-   scsi_finish_command(scmd);
-   return;
-   }
-   } else {
+   return FAILED;
+   }
+   SCSI_LOG_ERROR_RECOVERY(3,
+   scmd_printk(KERN_INFO, scmd,
+   "aborting command\n"));
+   rtn = scsi_try_to_abort_cmd(sdev->host->hostt, scmd);
+   if (rtn == SUCCESS) {
+   set_host_byte(scmd, DID_TIME_OUT);
+   if (scsi_host_eh_past_deadline(sdev->host)) {
SCSI_LOG_ERROR_RECOVERY(3,
scmd_printk(KERN_INFO, scmd,
-   "cmd abort %s\n",
-   (rtn == FAST_IO_FAIL) ?
-   "not send" : "failed"));
+   "eh timeout, not retrying "
+   "aborted command\n"));
+   rtn = FAILED;
+   } else if (!scsi_noretry_cmd(scmd) &&
+  (++scmd->retries <= scmd->allowed)) {
+   SCSI_LOG_ERROR_RECOVERY(3,
+   scmd_printk(KERN_WARNING, scmd,
+   "retry aborted command\n"));
+   scsi_queue_insert(scmd, SCSI_MLQUEUE_EH_RETRY);
+   } else {
+   SCSI_LOG_ERROR_RECOVERY(3,
+   scmd_printk(KERN_WARNING, scmd,
+   "finish aborted command\n"));
+   scsi_finish_command(scmd);
}
+   return rtn;
}
-
-   scsi_eh_scmd_add(scmd);
+   SCSI_LOG_ERROR_RECOVERY(3,
+   scmd_printk(KERN_INFO, scmd,
+   "cmd abort

[PATCHv5 4/8] libsas: allow async aborts

2017-04-04 Thread Hannes Reinecke

From: Christoph Hellwig 

We now first try to call ->eh_abort_handler from a work queue, but libsas
was always failing that for no good reason.  Allow async aborts.

Reviewed-by: Johannes Thumshirn 
Reviewed-by: Hannes Reinecke 
Signed-off-by: Christoph Hellwig 
---
 drivers/scsi/libsas/sas_scsi_host.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/scsi/libsas/sas_scsi_host.c 
b/drivers/scsi/libsas/sas_scsi_host.c
index 9bd55bc..ee6b39a 100644
--- a/drivers/scsi/libsas/sas_scsi_host.c
+++ b/drivers/scsi/libsas/sas_scsi_host.c
@@ -491,9 +491,6 @@ int sas_eh_abort_handler(struct scsi_cmnd *cmd)
struct Scsi_Host *host = cmd->device->host;
struct sas_internal *i = to_sas_internal(host->transportt);
 
-   if (current != host->ehandler)
-   return FAILED;
-
if (!i->dft->lldd_abort_task)
return FAILED;
 
-- 
1.8.5.6

[PATCHv5 7/8] scsi: make asynchronous aborts mandatory

2017-04-04 Thread Hannes Reinecke

There hasn't been any reports for HBAs where asynchronous abort
would not work, so we should make it mandatory and remove
the fallback.

Signed-off-by: Hannes Reinecke 
Reviewed-by: Johannes Thumshirn 
Reviewed-by: Bart Van Assche 
---
 Documentation/scsi/scsi_eh.txt | 18 --
 drivers/scsi/scsi_error.c  | 81 --
 drivers/scsi/scsi_lib.c|  2 +-
 drivers/scsi/scsi_priv.h   |  3 +-
 include/scsi/scsi_host.h   |  5 ---
 5 files changed, 15 insertions(+), 94 deletions(-)

diff --git a/Documentation/scsi/scsi_eh.txt b/Documentation/scsi/scsi_eh.txt
index 4edb9c1c..11e447b 100644
--- a/Documentation/scsi/scsi_eh.txt
+++ b/Documentation/scsi/scsi_eh.txt
@@ -70,7 +70,7 @@ with the command.
scmd is requeued to blk queue.
 
  - otherwise
-   scsi_eh_scmd_add(scmd, 0) is invoked for the command.  See
+   scsi_eh_scmd_add(scmd) is invoked for the command.  See
[1-3] for details of this function.
 
 
@@ -103,9 +103,7 @@ function
 eh_timed_out() callback did not handle the command.
Step #2 is taken.
 
- 2. If the host supports asynchronous completion (as indicated by the
-no_async_abort setting in the host template) scsi_abort_command()
-is invoked to schedule an asynchrous abort.
+ 2. scsi_abort_command() is invoked to schedule an asynchrous abort.
 Asynchronous abort are not invoked for commands which the
 SCSI_EH_ABORT_SCHEDULED flag is set (this indicates that the command
 already had been aborted once, and this is a retry which failed),
@@ -127,16 +125,13 @@ function
 
  scmds enter EH via scsi_eh_scmd_add(), which does the following.
 
- 1. Turns on scmd->eh_eflags as requested.  It's 0 for error
-completions and SCSI_EH_CANCEL_CMD for timeouts.
+ 1. Links scmd->eh_entry to shost->eh_cmd_q
 
- 2. Links scmd->eh_entry to shost->eh_cmd_q
+ 2. Sets SHOST_RECOVERY bit in shost->shost_state
 
- 3. Sets SHOST_RECOVERY bit in shost->shost_state
+ 3. Increments shost->host_failed
 
- 4. Increments shost->host_failed
-
- 5. Wakes up SCSI EH thread if shost->host_busy == shost->host_failed
+ 4. Wakes up SCSI EH thread if shost->host_busy == shost->host_failed
 
  As can be seen above, once any scmd is added to shost->eh_cmd_q,
 SHOST_RECOVERY shost_state bit is turned on.  This prevents any new
@@ -252,7 +247,6 @@ scmd->allowed.
 
  1. Error completion / time out
 ACTION: scsi_eh_scmd_add() is invoked for scmd
-   - set scmd->eh_eflags
- add scmd to shost->eh_cmd_q
- set SHOST_RECOVERY
- shost->host_failed++
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index bbc4318..53e3343 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -162,7 +162,7 @@ static int scsi_host_eh_past_deadline(struct Scsi_Host 
*shost)
}
}
 
-   scsi_eh_scmd_add(scmd, 0);
+   scsi_eh_scmd_add(scmd);
 }
 
 /**
@@ -221,9 +221,8 @@ static void scsi_eh_reset(struct scsi_cmnd *scmd)
 /**
  * scsi_eh_scmd_add - add scsi cmd to error handling.
  * @scmd:  scmd to run eh on.
- * @eh_flag:   optional SCSI_EH flag.
  */
-void scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag)
+void scsi_eh_scmd_add(struct scsi_cmnd *scmd)
 {
struct Scsi_Host *shost = scmd->device->host;
unsigned long flags;
@@ -239,9 +238,6 @@ void scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag)
if (shost->eh_deadline != -1 && !shost->last_reset)
shost->last_reset = jiffies;
 
-   if (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED)
-   eh_flag &= ~SCSI_EH_CANCEL_CMD;
-   scmd->eh_eflags |= eh_flag;
scsi_eh_reset(scmd);
list_add_tail(>eh_entry, >eh_cmd_q);
shost->host_failed++;
@@ -275,10 +271,9 @@ enum blk_eh_timer_return scsi_times_out(struct request 
*req)
rtn = host->hostt->eh_timed_out(scmd);
 
if (rtn == BLK_EH_NOT_HANDLED) {
-   if (host->hostt->no_async_abort ||
-   scsi_abort_command(scmd) != SUCCESS) {
+   if (scsi_abort_command(scmd) != SUCCESS) {
set_host_byte(scmd, DID_TIME_OUT);
-   scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD);
+   scsi_eh_scmd_add(scmd);
}
}
 
@@ -331,7 +326,7 @@ static inline void scsi_eh_prt_fail_stats(struct Scsi_Host 
*shost,
list_for_each_entry(scmd, work_q, eh_entry) {
if (scmd->device == sdev) {
++total_failures;
-   if (scmd->eh_eflags & SCSI_EH_CANCEL_CMD)
+   if (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED)
++cmd_cancel;
else
++cmd_failed;
@@ -1154,8 +1149,7 @@ int scsi_eh_get_sense(struct

[PATCHv5 0/8] SCSI EH cleanup

2017-04-04 Thread Hannes Reinecke

Hi all,

this is a resend of a small patchset for cleaning up SCSI EH.
Primary goal is to make asynchronous aborts mandatory; there hasn't
been a single report so far where asynchronous abort won't work, so
the 'no_async_abort' flag has never been used and will be removed
with this patchset.
Additionally there's a cleanup to detect retries of failed commands and
to inline commands aborts.
I've also included the patch to correctly reset the media access
timeout counter to avoid it being triggered inadvertedly by several
commands timing out for the same device.

As usual, comments and reviews are welcome.

Changes to v1:
- Include reviews from Christoph

Changes to v2:
- Include reviews from Bart
- Add medium access timeout patch

Changes to v3:
- Add patch to inline command abort as per suggestion from Christoph
- Drop patch to not escalate failed EH commands

Changes to v4:
- Split off sdrv->eh_action into two functions

Christoph Hellwig (1):
  libsas: allow async aborts

Hannes Reinecke (7):
  scsi_error: count medium access timeout only once per EH run
  sd: Return SUCCESS in sd_eh_action() after device offline
  scsi: always send command aborts
  scsi: make eh_eflags persistent
  scsi: make scsi_eh_scmd_add() always succeed
  scsi: make asynchronous aborts mandatory
  scsi: inline command aborts

 Documentation/scsi/scsi_eh.txt  |  30 ++---
 drivers/scsi/libsas/sas_scsi_host.c |   5 -
 drivers/scsi/scsi.c |   2 -
 drivers/scsi/scsi_error.c   | 218 
 drivers/scsi/scsi_lib.c |   6 +-
 drivers/scsi/scsi_priv.h|   4 +-
 drivers/scsi/sd.c   |  29 -
 drivers/scsi/sd.h   |   1 +
 include/scsi/scsi_driver.h  |   1 +
 include/scsi/scsi_eh.h  |   1 +
 include/scsi/scsi_host.h|   5 -
 11 files changed, 115 insertions(+), 187 deletions(-)

-- 
1.8.5.6

[PATCHv5 3/8] scsi: always send command aborts

2017-04-04 Thread Hannes Reinecke

When a command has timed out we always should be sending an
abort; with the previous code a failed abort might signal
SCSI EH to start, and all other timed out commands will
never be aborted, even though they might belong to a
different ITL nexus.

Cc: Benjamin Block 
Signed-off-by: Hannes Reinecke 
---
 drivers/scsi/scsi_error.c | 12 
 1 file changed, 12 deletions(-)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 370f6c0..cff7d9d 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -196,19 +196,7 @@ static int scsi_host_eh_past_deadline(struct Scsi_Host 
*shost)
return FAILED;
}
 
-   /*
-* Do not try a command abort if
-* SCSI EH has already started.
-*/
spin_lock_irqsave(shost->host_lock, flags);
-   if (scsi_host_in_recovery(shost)) {
-   spin_unlock_irqrestore(shost->host_lock, flags);
-   SCSI_LOG_ERROR_RECOVERY(3,
-   scmd_printk(KERN_INFO, scmd,
-   "not aborting, host in recovery\n"));
-   return FAILED;
-   }
-
if (shost->eh_deadline != -1 && !shost->last_reset)
shost->last_reset = jiffies;
spin_unlock_irqrestore(shost->host_lock, flags);
-- 
1.8.5.6

Re: [PATCHv4 1/8] scsi_error: count medium access timeout only once per EH run

2017-04-04 Thread Christoph Hellwig

NAK - don't overload ->eh_action for two entirely different purposes.

[PATCHv4 6/8] scsi: make scsi_eh_scmd_add() always succeed

2017-04-04 Thread Hannes Reinecke

scsi_eh_scmd_add() currently only will fail if no
error handler thread is started (which will never be the
case) or if the state machine encounters an illegal transition.

But if we're encountering an invalid state transition
chances is we cannot fixup things with the error handler.
So better add a WARN_ON for illegal host states and
make scsi_dh_scmd_add() a void function.

Signed-off-by: Hannes Reinecke 
Reviewed-by: Johannes Thumshirn 
Reviewed-by: Bart Van Assche 
---
 drivers/scsi/scsi_error.c | 41 +
 drivers/scsi/scsi_lib.c   |  4 ++--
 drivers/scsi/scsi_priv.h  |  2 +-
 3 files changed, 16 insertions(+), 31 deletions(-)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index caacf69..ba9c9bd 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -162,13 +162,7 @@ static int scsi_host_eh_past_deadline(struct Scsi_Host 
*shost)
}
}
 
-   if (!scsi_eh_scmd_add(scmd, 0)) {
-   SCSI_LOG_ERROR_RECOVERY(3,
-   scmd_printk(KERN_WARNING, scmd,
-   "terminate aborted command\n"));
-   set_host_byte(scmd, DID_TIME_OUT);
-   scsi_finish_command(scmd);
-   }
+   scsi_eh_scmd_add(scmd, 0);
 }
 
 /**
@@ -231,28 +225,23 @@ static int scsi_eh_reset(struct scsi_cmnd *scmd)
  * scsi_eh_scmd_add - add scsi cmd to error handling.
  * @scmd:  scmd to run eh on.
  * @eh_flag:   optional SCSI_EH flag.
- *
- * Return value:
- * 0 on failure.
  */
-int scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag)
+void scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag)
 {
struct Scsi_Host *shost = scmd->device->host;
unsigned long flags;
-   int ret = 0;
+   int ret;
 
-   if (!shost->ehandler)
-   return 0;
+   WARN_ON_ONCE(!shost->ehandler);
 
spin_lock_irqsave(shost->host_lock, flags);
-   if (scsi_host_set_state(shost, SHOST_RECOVERY))
-   if (scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY))
-   goto out_unlock;
-
+   if (scsi_host_set_state(shost, SHOST_RECOVERY)) {
+   ret = scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY);
+   WARN_ON_ONCE(ret);
+   }
if (shost->eh_deadline != -1 && !shost->last_reset)
shost->last_reset = jiffies;
 
-   ret = 1;
if (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED)
eh_flag &= ~SCSI_EH_CANCEL_CMD;
scmd->eh_eflags |= eh_flag;
@@ -260,9 +249,7 @@ int scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag)
list_add_tail(>eh_entry, >eh_cmd_q);
shost->host_failed++;
scsi_eh_wakeup(shost);
- out_unlock:
spin_unlock_irqrestore(shost->host_lock, flags);
-   return ret;
 }
 
 /**
@@ -291,13 +278,11 @@ enum blk_eh_timer_return scsi_times_out(struct request 
*req)
rtn = host->hostt->eh_timed_out(scmd);
 
if (rtn == BLK_EH_NOT_HANDLED) {
-   if (!host->hostt->no_async_abort &&
-   scsi_abort_command(scmd) == SUCCESS)
-   return BLK_EH_NOT_HANDLED;
-
-   set_host_byte(scmd, DID_TIME_OUT);
-   if (!scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD))
-   rtn = BLK_EH_HANDLED;
+   if (host->hostt->no_async_abort ||
+   scsi_abort_command(scmd) != SUCCESS) {
+   set_host_byte(scmd, DID_TIME_OUT);
+   scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD);
+   }
}
 
return rtn;
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index ba22866..ba8da90 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1593,8 +1593,8 @@ static void scsi_softirq_done(struct request *rq)
scsi_queue_insert(cmd, SCSI_MLQUEUE_DEVICE_BUSY);
break;
default:
-   if (!scsi_eh_scmd_add(cmd, 0))
-   scsi_finish_command(cmd);
+   scsi_eh_scmd_add(cmd, 0);
+   break;
}
 }
 
diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h
index 99bfc98..5be6cbf6 100644
--- a/drivers/scsi/scsi_priv.h
+++ b/drivers/scsi/scsi_priv.h
@@ -72,7 +72,7 @@ extern int scsi_dev_info_list_add_keyed(int compatible, char 
*vendor,
 extern int scsi_error_handler(void *host);
 extern int scsi_decide_disposition(struct scsi_cmnd *cmd);
 extern void scsi_eh_wakeup(struct Scsi_Host *shost);
-extern int scsi_eh_scmd_add(struct scsi_cmnd *, int);
+extern void scsi_eh_scmd_add(struct scsi_cmnd *, int);
 void scsi_eh_ready_devs(struct Scsi_Host *shost,
struct list_head *work_q,
struct list_head *done_q);
-- 
1.8.5.6

[PATCHv4 8/8] scsi: inline command aborts

2017-04-04 Thread Hannes Reinecke

The block layer always calls the timeout function from a workqueue
context, so there is no need to have yet another workqueue for
running command aborts.

Signed-off-by: Hannes Reinecke 
---
 drivers/scsi/scsi.c   |  2 --
 drivers/scsi/scsi_error.c | 80 +++
 drivers/scsi/scsi_lib.c   |  2 --
 drivers/scsi/scsi_priv.h  |  1 -
 4 files changed, 39 insertions(+), 46 deletions(-)

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 7bfbcfa..fdec73e 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -115,8 +115,6 @@ void scsi_put_command(struct scsi_cmnd *cmd)
BUG_ON(list_empty(>list));
list_del_init(>list);
spin_unlock_irqrestore(>device->list_lock, flags);
-
-   BUG_ON(delayed_work_pending(>abort_work));
 }
 
 #ifdef CONFIG_SCSI_LOGGING
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index c09e81f..7165bcd 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -115,11 +115,9 @@ static int scsi_host_eh_past_deadline(struct Scsi_Host 
*shost)
  * scmd_eh_abort_handler - Handle command aborts
  * @work:  command to be aborted.
  */
-void
-scmd_eh_abort_handler(struct work_struct *work)
+int
+scmd_eh_abort_handler(struct scsi_cmnd *scmd)
 {
-   struct scsi_cmnd *scmd =
-   container_of(work, struct scsi_cmnd, abort_work.work);
struct scsi_device *sdev = scmd->device;
int rtn;
 
@@ -127,42 +125,40 @@ static int scsi_host_eh_past_deadline(struct Scsi_Host 
*shost)
SCSI_LOG_ERROR_RECOVERY(3,
scmd_printk(KERN_INFO, scmd,
"eh timeout, not aborting\n"));
-   } else {
-   SCSI_LOG_ERROR_RECOVERY(3,
-   scmd_printk(KERN_INFO, scmd,
-   "aborting command\n"));
-   rtn = scsi_try_to_abort_cmd(sdev->host->hostt, scmd);
-   if (rtn == SUCCESS) {
-   set_host_byte(scmd, DID_TIME_OUT);
-   if (scsi_host_eh_past_deadline(sdev->host)) {
-   SCSI_LOG_ERROR_RECOVERY(3,
-   scmd_printk(KERN_INFO, scmd,
-   "eh timeout, not retrying "
-   "aborted command\n"));
-   } else if (!scsi_noretry_cmd(scmd) &&
-   (++scmd->retries <= scmd->allowed)) {
-   SCSI_LOG_ERROR_RECOVERY(3,
-   scmd_printk(KERN_WARNING, scmd,
-   "retry aborted command\n"));
-   scsi_queue_insert(scmd, SCSI_MLQUEUE_EH_RETRY);
-   return;
-   } else {
-   SCSI_LOG_ERROR_RECOVERY(3,
-   scmd_printk(KERN_WARNING, scmd,
-   "finish aborted 
command\n"));
-   scsi_finish_command(scmd);
-   return;
-   }
-   } else {
+   return FAILED;
+   }
+   SCSI_LOG_ERROR_RECOVERY(3,
+   scmd_printk(KERN_INFO, scmd,
+   "aborting command\n"));
+   rtn = scsi_try_to_abort_cmd(sdev->host->hostt, scmd);
+   if (rtn == SUCCESS) {
+   set_host_byte(scmd, DID_TIME_OUT);
+   if (scsi_host_eh_past_deadline(sdev->host)) {
SCSI_LOG_ERROR_RECOVERY(3,
scmd_printk(KERN_INFO, scmd,
-   "cmd abort %s\n",
-   (rtn == FAST_IO_FAIL) ?
-   "not send" : "failed"));
+   "eh timeout, not retrying "
+   "aborted command\n"));
+   rtn = FAILED;
+   } else if (!scsi_noretry_cmd(scmd) &&
+  (++scmd->retries <= scmd->allowed)) {
+   SCSI_LOG_ERROR_RECOVERY(3,
+   scmd_printk(KERN_WARNING, scmd,
+   "retry aborted command\n"));
+   scsi_queue_insert(scmd, SCSI_MLQUEUE_EH_RETRY);
+   } else {
+   SCSI_LOG_ERROR_RECOVERY(3,
+   scmd_printk(KERN_WARNING, scmd,
+   "finish aborted command\n"));
+   scsi_finish_command(scmd);
}
+   return rtn;
}
-
-   scsi_eh_scmd_add(scmd);
+   SCSI_LOG_ERROR_RECOVERY(3,
+   scmd_printk(KERN_INFO, scmd,
+   "cmd abort

[PATCHv4 4/8] libsas: allow async aborts

2017-04-04 Thread Hannes Reinecke

From: Christoph Hellwig 

We now first try to call ->eh_abort_handler from a work queue, but libsas
was always failing that for no good reason.  Allow async aborts.

Reviewed-by: Johannes Thumshirn 
Reviewed-by: Hannes Reinecke 
Signed-off-by: Christoph Hellwig 
---
 drivers/scsi/libsas/sas_scsi_host.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/scsi/libsas/sas_scsi_host.c 
b/drivers/scsi/libsas/sas_scsi_host.c
index 9bd55bc..ee6b39a 100644
--- a/drivers/scsi/libsas/sas_scsi_host.c
+++ b/drivers/scsi/libsas/sas_scsi_host.c
@@ -491,9 +491,6 @@ int sas_eh_abort_handler(struct scsi_cmnd *cmd)
struct Scsi_Host *host = cmd->device->host;
struct sas_internal *i = to_sas_internal(host->transportt);
 
-   if (current != host->ehandler)
-   return FAILED;
-
if (!i->dft->lldd_abort_task)
return FAILED;
 
-- 
1.8.5.6

[PATCHv4 5/8] scsi: make eh_eflags persistent

2017-04-04 Thread Hannes Reinecke

If a failed command is retried and fails again we need
to enter SCSI EH, otherwise we will never be able to
recover the command.
To detect this situation we must not clear scmd->eh_eflags
when EH finishes but rather make it persistent throughout
the lifetime of the command.

Reviewed-by: Benjamin Block 
Reviewed-by: Bart Van Assche 
Reviewed-by: Johannes Thumshirn 
Signed-off-by: Hannes Reinecke 
---
 Documentation/scsi/scsi_eh.txt  | 14 +++---
 drivers/scsi/libsas/sas_scsi_host.c |  2 --
 drivers/scsi/scsi_error.c   |  4 ++--
 include/scsi/scsi_eh.h  |  1 +
 4 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/Documentation/scsi/scsi_eh.txt b/Documentation/scsi/scsi_eh.txt
index 37eca00..4edb9c1c 100644
--- a/Documentation/scsi/scsi_eh.txt
+++ b/Documentation/scsi/scsi_eh.txt
@@ -105,11 +105,14 @@ function
 
  2. If the host supports asynchronous completion (as indicated by the
 no_async_abort setting in the host template) scsi_abort_command()
-is invoked to schedule an asynchrous abort. If that fails
-Step #3 is taken.
+is invoked to schedule an asynchrous abort.
+Asynchronous abort are not invoked for commands which the
+SCSI_EH_ABORT_SCHEDULED flag is set (this indicates that the command
+already had been aborted once, and this is a retry which failed),
+or when the EH deadline is expired. In these case Step #3 is taken.
 
- 2. scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD) is invoked for the
-command.  See [1-3] for more information.
+ 3. scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD) is invoked for the
+command.  See [1-4] for more information.
 
 [1-3] Asynchronous command aborts
 
@@ -263,7 +266,6 @@ scmd->allowed.
 
  3. scmd recovered
 ACTION: scsi_eh_finish_cmd() is invoked to EH-finish scmd
-   - clear scmd->eh_eflags
- scsi_setup_cmd_retry()
- move from local eh_work_q to local eh_done_q
 LOCKING: none
@@ -456,8 +458,6 @@ except for #1 must be implemented by eh_strategy_handler().
 
  - shost->host_failed is zero.
 
- - Each scmd's eh_eflags field is cleared.
-
  - Each scmd is in such a state that scsi_setup_cmd_retry() on the
scmd doesn't make any difference.
 
diff --git a/drivers/scsi/libsas/sas_scsi_host.c 
b/drivers/scsi/libsas/sas_scsi_host.c
index ee6b39a..87e5079 100644
--- a/drivers/scsi/libsas/sas_scsi_host.c
+++ b/drivers/scsi/libsas/sas_scsi_host.c
@@ -613,8 +613,6 @@ static void sas_eh_handle_sas_errors(struct Scsi_Host 
*shost, struct list_head *
SAS_DPRINTK("trying to find task 0x%p\n", task);
res = sas_scsi_find_task(task);
 
-   cmd->eh_eflags = 0;
-
switch (res) {
case TASK_IS_DONE:
SAS_DPRINTK("%s: task 0x%p is done\n", __func__,
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 615e6c0..caacf69 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -188,7 +188,6 @@ static int scsi_host_eh_past_deadline(struct Scsi_Host 
*shost)
/*
 * Retry after abort failed, escalate to next level.
 */
-   scmd->eh_eflags &= ~SCSI_EH_ABORT_SCHEDULED;
SCSI_LOG_ERROR_RECOVERY(3,
scmd_printk(KERN_INFO, scmd,
"previous abort failed\n"));
@@ -940,6 +939,7 @@ void scsi_eh_prep_cmnd(struct scsi_cmnd *scmd, struct 
scsi_eh_save *ses,
ses->result = scmd->result;
ses->underflow = scmd->underflow;
ses->prot_op = scmd->prot_op;
+   ses->eh_eflags = scmd->eh_eflags;
 
scmd->prot_op = SCSI_PROT_NORMAL;
scmd->eh_eflags = 0;
@@ -1003,6 +1003,7 @@ void scsi_eh_restore_cmnd(struct scsi_cmnd* scmd, struct 
scsi_eh_save *ses)
scmd->result = ses->result;
scmd->underflow = ses->underflow;
scmd->prot_op = ses->prot_op;
+   scmd->eh_eflags = ses->eh_eflags;
 }
 EXPORT_SYMBOL(scsi_eh_restore_cmnd);
 
@@ -1135,7 +1136,6 @@ static int scsi_eh_action(struct scsi_cmnd *scmd, int rtn)
  */
 void scsi_eh_finish_cmd(struct scsi_cmnd *scmd, struct list_head *done_q)
 {
-   scmd->eh_eflags = 0;
list_move_tail(>eh_entry, done_q);
 }
 EXPORT_SYMBOL(scsi_eh_finish_cmd);
diff --git a/include/scsi/scsi_eh.h b/include/scsi/scsi_eh.h
index 98d366b..a25b328 100644
--- a/include/scsi/scsi_eh.h
+++ b/include/scsi/scsi_eh.h
@@ -31,6 +31,7 @@ extern int scsi_get_sense_info_fld(const u8 * sense_buffer, 
int sb_len,
 struct scsi_eh_save {
/* saved state */
int result;
+   int eh_eflags;
enum dma_data_direction data_direction;
unsigned underflow;
unsigned char cmd_len;
-- 
1.8.5.6

[PATCHv4 0/8]

2017-04-04 Thread Hannes Reinecke

Hi all,

this is a resend of a small patchset for cleaning up SCSI EH.
Primary goal is to make asynchronous aborts mandatory; there hasn't
been a single report so far where asynchronous abort won't work, so
the 'no_async_abort' flag has never been used and will be removed
with this patchset.
Additionally there's a cleanup to detect retries of failed commands and
to inline commands aborts.
I've also included the patch to correctly reset the media access
timeout counter to avoid it being triggered inadvertedly by several
commands timing out for the same device.

As usual, comments and reviews are welcome.

Changes to v1:
- Include reviews from Christoph

Changes to v2:
- Include reviews from Bart
- Add medium access timeout patch

Changes to v3:
- Add patch to inline command abort as per suggestion from Christoph
- Drop patch to not escalate failed EH commands

Christoph Hellwig (1):
  libsas: allow async aborts

Hannes Reinecke (7):
  scsi_error: count medium access timeout only once per EH run
  sd: Return SUCCESS in sd_eh_action() after device offline
  scsi: always send command aborts
  scsi: make eh_eflags persistent
  scsi: make scsi_eh_scmd_add() always succeed
  scsi: make asynchronous aborts mandatory
  scsi: inline command aborts

 Documentation/scsi/scsi_eh.txt  |  30 ++---
 drivers/scsi/libsas/sas_scsi_host.c |   5 -
 drivers/scsi/scsi.c |   2 -
 drivers/scsi/scsi_error.c   | 223 
 drivers/scsi/scsi_lib.c |   6 +-
 drivers/scsi/scsi_priv.h|   4 +-
 drivers/scsi/sd.c   |  24 +++-
 drivers/scsi/sd.h   |   1 +
 include/scsi/scsi_driver.h  |   2 +-
 include/scsi/scsi_eh.h  |   1 +
 include/scsi/scsi_host.h|   5 -
 11 files changed, 111 insertions(+), 192 deletions(-)

-- 
1.8.5.6

[PATCHv4 2/8] sd: Return SUCCESS in sd_eh_action() after device offline

2017-04-04 Thread Hannes Reinecke

If sd_eh_action() decides to take the device offline there is
no point in returning FAILED, as taking the device offline
is the ultimate step in SCSI EH anyway.
So further escalation via SCSI EH is not likely to make a
difference and we can as well return SUCCESS.

Cc: Benjamin Block 
Signed-off-by: Hannes Reinecke 
---
 drivers/scsi/sd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 656a32d..1969e85 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1740,7 +1740,7 @@ static int sd_eh_action(struct scsi_cmnd *scmd, int 
eh_disp, bool restart)
"Medium access timeout failure. Offlining disk!\n");
scsi_device_set_state(scmd->device, SDEV_OFFLINE);
 
-   return FAILED;
+   return SUCCESS;
}
 
return eh_disp;
-- 
1.8.5.6

[PATCHv4 1/8] scsi_error: count medium access timeout only once per EH run

2017-04-04 Thread Hannes Reinecke

The current medium access timeout counter will be increased for
each command, so if there are enough failed commands we'll hit
the medium access timeout for even a single device failure and
the following kernel message is displayed:

sd H:C:T:L: [sdXY] Medium access timeout failure. Offlining disk!

Fix this by making the timeout per EH run, ie the counter will
only be increased once per device and EH run.

Fixes: 18a4d0a ("[SCSI] Handle disk devices which can not process medium access 
commands")
Cc: Ewan Milne 
Cc: Lawrence Obermann 
Cc: Benjamin Block 
Cc: Steffen Maier 
Signed-off-by: Hannes Reinecke 
---
 drivers/scsi/scsi_error.c  | 23 ++-
 drivers/scsi/sd.c  | 22 ++
 drivers/scsi/sd.h  |  1 +
 include/scsi/scsi_driver.h |  2 +-
 4 files changed, 42 insertions(+), 6 deletions(-)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index f2cafae..390a6bc 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -221,6 +221,26 @@ static int scsi_host_eh_past_deadline(struct Scsi_Host 
*shost)
 }
 
 /**
+ * scsi_eh_reset - call into ->eh_action to reset internal counters
+ * @scmd:  scmd to run eh on.
+ *
+ * The scsi driver might be carrying internal state about the
+ * devices, so we need to call into the driver to reset the
+ * internal state once the error handler is started.
+ */
+static int scsi_eh_reset(struct scsi_cmnd *scmd)
+{
+   int rtn = SUCCESS;
+
+   if (!blk_rq_is_passthrough(scmd->request)) {
+   struct scsi_driver *sdrv = scsi_cmd_to_driver(scmd);
+   if (sdrv->eh_action)
+   rtn = sdrv->eh_action(scmd, rtn, true);
+   }
+   return rtn;
+}
+
+/**
  * scsi_eh_scmd_add - add scsi cmd to error handling.
  * @scmd:  scmd to run eh on.
  * @eh_flag:   optional SCSI_EH flag.
@@ -249,6 +269,7 @@ int scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag)
if (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED)
eh_flag &= ~SCSI_EH_CANCEL_CMD;
scmd->eh_eflags |= eh_flag;
+   scsi_eh_reset(scmd);
list_add_tail(>eh_entry, >eh_cmd_q);
shost->host_failed++;
scsi_eh_wakeup(shost);
@@ -1107,7 +1128,7 @@ static int scsi_eh_action(struct scsi_cmnd *scmd, int rtn)
if (!blk_rq_is_passthrough(scmd->request)) {
struct scsi_driver *sdrv = scsi_cmd_to_driver(scmd);
if (sdrv->eh_action)
-   rtn = sdrv->eh_action(scmd, rtn);
+   rtn = sdrv->eh_action(scmd, rtn, false);
}
return rtn;
 }
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index d277e86..656a32d 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -115,7 +115,7 @@
 static int sd_init_command(struct scsi_cmnd *SCpnt);
 static void sd_uninit_command(struct scsi_cmnd *SCpnt);
 static int sd_done(struct scsi_cmnd *);
-static int sd_eh_action(struct scsi_cmnd *, int);
+static int sd_eh_action(struct scsi_cmnd *, int, bool);
 static void sd_read_capacity(struct scsi_disk *sdkp, unsigned char *buffer);
 static void scsi_disk_release(struct device *cdev);
 static void sd_print_sense_hdr(struct scsi_disk *, struct scsi_sense_hdr *);
@@ -1689,18 +1689,29 @@ static int sd_pr_clear(struct block_device *bdev, u64 
key)
  * sd_eh_action - error handling callback
  * @scmd:  sd-issued command that has failed
  * @eh_disp:   The recovery disposition suggested by the midlayer
+ * @restart:   SCSI EH has been restarted
  *
  * This function is called by the SCSI midlayer upon completion of an
  * error test command (currently TEST UNIT READY). The result of sending
  * the eh command is passed in eh_disp.  We're looking for devices that
  * fail medium access commands but are OK with non access commands like
  * test unit ready (so wrongly see the device as having a successful
- * recovery)
+ * recovery).
+ * We have to be careful to count a medium access failure only once
+ * per device and SCSI EH run; there might be several timed out
+ * commands which will cause the 'max_medium_access_timeouts' counter
+ * to trigger after the first SCSI EH run already and set the device
+ * to offline.
  **/
-static int sd_eh_action(struct scsi_cmnd *scmd, int eh_disp)
+static int sd_eh_action(struct scsi_cmnd *scmd, int eh_disp, bool restart)
 {
struct scsi_disk *sdkp = scsi_disk(scmd->request->rq_disk);
 
+   if (restart) {
+   /* New SCSI EH run, reset gate variable */
+   sdkp->ignore_medium_access_errors = false;
+   return eh_disp;
+   }
if (!scsi_device_online(scmd->device) ||
!scsi_medium_access_command(scmd) ||
host_byte(scmd->result) != DID_TIME_OUT ||
@@ -1714,7 +1725,10 @@ static int

[PATCHv4 3/8] scsi: always send command aborts

2017-04-04 Thread Hannes Reinecke

When a command has timed out we always should be sending an
abort; with the previous code a failed abort might signal
SCSI EH to start, and all other timed out commands will
never be aborted, even though they might belong to a
different ITL nexus.

Cc: Benjamin Block 
Signed-off-by: Hannes Reinecke 
---
 drivers/scsi/scsi_error.c | 12 
 1 file changed, 12 deletions(-)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 390a6bc..615e6c0 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -196,19 +196,7 @@ static int scsi_host_eh_past_deadline(struct Scsi_Host 
*shost)
return FAILED;
}
 
-   /*
-* Do not try a command abort if
-* SCSI EH has already started.
-*/
spin_lock_irqsave(shost->host_lock, flags);
-   if (scsi_host_in_recovery(shost)) {
-   spin_unlock_irqrestore(shost->host_lock, flags);
-   SCSI_LOG_ERROR_RECOVERY(3,
-   scmd_printk(KERN_INFO, scmd,
-   "not aborting, host in recovery\n"));
-   return FAILED;
-   }
-
if (shost->eh_deadline != -1 && !shost->last_reset)
shost->last_reset = jiffies;
spin_unlock_irqrestore(shost->host_lock, flags);
-- 
1.8.5.6

[PATCHv4 7/8] scsi: make asynchronous aborts mandatory

2017-04-04 Thread Hannes Reinecke

There hasn't been any reports for HBAs where asynchronous abort
would not work, so we should make it mandatory and remove
the fallback.

Signed-off-by: Hannes Reinecke 
Reviewed-by: Johannes Thumshirn 
Reviewed-by: Bart Van Assche 
---
 Documentation/scsi/scsi_eh.txt | 18 --
 drivers/scsi/scsi_error.c  | 81 --
 drivers/scsi/scsi_lib.c|  2 +-
 drivers/scsi/scsi_priv.h   |  3 +-
 include/scsi/scsi_host.h   |  5 ---
 5 files changed, 15 insertions(+), 94 deletions(-)

diff --git a/Documentation/scsi/scsi_eh.txt b/Documentation/scsi/scsi_eh.txt
index 4edb9c1c..11e447b 100644
--- a/Documentation/scsi/scsi_eh.txt
+++ b/Documentation/scsi/scsi_eh.txt
@@ -70,7 +70,7 @@ with the command.
scmd is requeued to blk queue.
 
  - otherwise
-   scsi_eh_scmd_add(scmd, 0) is invoked for the command.  See
+   scsi_eh_scmd_add(scmd) is invoked for the command.  See
[1-3] for details of this function.
 
 
@@ -103,9 +103,7 @@ function
 eh_timed_out() callback did not handle the command.
Step #2 is taken.
 
- 2. If the host supports asynchronous completion (as indicated by the
-no_async_abort setting in the host template) scsi_abort_command()
-is invoked to schedule an asynchrous abort.
+ 2. scsi_abort_command() is invoked to schedule an asynchrous abort.
 Asynchronous abort are not invoked for commands which the
 SCSI_EH_ABORT_SCHEDULED flag is set (this indicates that the command
 already had been aborted once, and this is a retry which failed),
@@ -127,16 +125,13 @@ function
 
  scmds enter EH via scsi_eh_scmd_add(), which does the following.
 
- 1. Turns on scmd->eh_eflags as requested.  It's 0 for error
-completions and SCSI_EH_CANCEL_CMD for timeouts.
+ 1. Links scmd->eh_entry to shost->eh_cmd_q
 
- 2. Links scmd->eh_entry to shost->eh_cmd_q
+ 2. Sets SHOST_RECOVERY bit in shost->shost_state
 
- 3. Sets SHOST_RECOVERY bit in shost->shost_state
+ 3. Increments shost->host_failed
 
- 4. Increments shost->host_failed
-
- 5. Wakes up SCSI EH thread if shost->host_busy == shost->host_failed
+ 4. Wakes up SCSI EH thread if shost->host_busy == shost->host_failed
 
  As can be seen above, once any scmd is added to shost->eh_cmd_q,
 SHOST_RECOVERY shost_state bit is turned on.  This prevents any new
@@ -252,7 +247,6 @@ scmd->allowed.
 
  1. Error completion / time out
 ACTION: scsi_eh_scmd_add() is invoked for scmd
-   - set scmd->eh_eflags
- add scmd to shost->eh_cmd_q
- set SHOST_RECOVERY
- shost->host_failed++
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index ba9c9bd..c09e81f 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -162,7 +162,7 @@ static int scsi_host_eh_past_deadline(struct Scsi_Host 
*shost)
}
}
 
-   scsi_eh_scmd_add(scmd, 0);
+   scsi_eh_scmd_add(scmd);
 }
 
 /**
@@ -224,9 +224,8 @@ static int scsi_eh_reset(struct scsi_cmnd *scmd)
 /**
  * scsi_eh_scmd_add - add scsi cmd to error handling.
  * @scmd:  scmd to run eh on.
- * @eh_flag:   optional SCSI_EH flag.
  */
-void scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag)
+void scsi_eh_scmd_add(struct scsi_cmnd *scmd)
 {
struct Scsi_Host *shost = scmd->device->host;
unsigned long flags;
@@ -242,9 +241,6 @@ void scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag)
if (shost->eh_deadline != -1 && !shost->last_reset)
shost->last_reset = jiffies;
 
-   if (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED)
-   eh_flag &= ~SCSI_EH_CANCEL_CMD;
-   scmd->eh_eflags |= eh_flag;
scsi_eh_reset(scmd);
list_add_tail(>eh_entry, >eh_cmd_q);
shost->host_failed++;
@@ -278,10 +274,9 @@ enum blk_eh_timer_return scsi_times_out(struct request 
*req)
rtn = host->hostt->eh_timed_out(scmd);
 
if (rtn == BLK_EH_NOT_HANDLED) {
-   if (host->hostt->no_async_abort ||
-   scsi_abort_command(scmd) != SUCCESS) {
+   if (scsi_abort_command(scmd) != SUCCESS) {
set_host_byte(scmd, DID_TIME_OUT);
-   scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD);
+   scsi_eh_scmd_add(scmd);
}
}
 
@@ -334,7 +329,7 @@ static inline void scsi_eh_prt_fail_stats(struct Scsi_Host 
*shost,
list_for_each_entry(scmd, work_q, eh_entry) {
if (scmd->device == sdev) {
++total_failures;
-   if (scmd->eh_eflags & SCSI_EH_CANCEL_CMD)
+   if (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED)
++cmd_cancel;
else
++cmd_failed;
@@ -1157,8 +1152,7 @@ int scsi_eh_get_sense(struct

[PATCH 1/2] block: Implement global tagset

2017-04-04 Thread Hannes Reinecke

Most legacy HBAs have a tagset per HBA, not per queue. To map
these devices onto block-mq this patch implements a new tagset
flag BLK_MQ_F_GLOBAL_TAGS, which will cause the tag allocator
to use just one tagset for all hardware queues.

Signed-off-by: Hannes Reinecke 
---
 block/blk-mq-tag.c | 12 
 block/blk-mq.c | 10 --
 include/linux/blk-mq.h |  1 +
 3 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index e48bc2c..a14e76c 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -276,9 +276,11 @@ static void blk_mq_all_tag_busy_iter(struct blk_mq_tags 
*tags,
 void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,
busy_tag_iter_fn *fn, void *priv)
 {
-   int i;
+   int i, lim = tagset->nr_hw_queues;
 
-   for (i = 0; i < tagset->nr_hw_queues; i++) {
+   if (tagset->flags & BLK_MQ_F_GLOBAL_TAGS)
+   lim = 1;
+   for (i = 0; i < lim; i++) {
if (tagset->tags && tagset->tags[i])
blk_mq_all_tag_busy_iter(tagset->tags[i], fn, priv);
}
@@ -287,12 +289,14 @@ void blk_mq_tagset_busy_iter(struct blk_mq_tag_set 
*tagset,
 
 int blk_mq_reinit_tagset(struct blk_mq_tag_set *set)
 {
-   int i, j, ret = 0;
+   int i, j, ret = 0, lim = set->nr_hw_queues;
 
if (!set->ops->reinit_request)
goto out;
 
-   for (i = 0; i < set->nr_hw_queues; i++) {
+   if (set->flags & BLK_MQ_F_GLOBAL_TAGS)
+   lim = 1;
+   for (i = 0; i < lim; i++) {
struct blk_mq_tags *tags = set->tags[i];
 
for (j = 0; j < tags->nr_tags; j++) {
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 159187a..db96ed0 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2061,6 +2061,10 @@ static bool __blk_mq_alloc_rq_map(struct blk_mq_tag_set 
*set, int hctx_idx)
 {
int ret = 0;
 
+   if ((set->flags & BLK_MQ_F_GLOBAL_TAGS) && hctx_idx != 0) {
+   set->tags[hctx_idx] = set->tags[0];
+   return true;
+   }
set->tags[hctx_idx] = blk_mq_alloc_rq_map(set, hctx_idx,
set->queue_depth, set->reserved_tags);
if (!set->tags[hctx_idx])
@@ -2080,8 +2084,10 @@ static void blk_mq_free_map_and_requests(struct 
blk_mq_tag_set *set,
 unsigned int hctx_idx)
 {
if (set->tags[hctx_idx]) {
-   blk_mq_free_rqs(set, set->tags[hctx_idx], hctx_idx);
-   blk_mq_free_rq_map(set->tags[hctx_idx]);
+   if (!(set->flags & BLK_MQ_F_GLOBAL_TAGS) || hctx_idx == 0) {
+   blk_mq_free_rqs(set, set->tags[hctx_idx], hctx_idx);
+   blk_mq_free_rq_map(set->tags[hctx_idx]);
+   }
set->tags[hctx_idx] = NULL;
}
 }
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index b296a90..eee27b016 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -155,6 +155,7 @@ enum {
BLK_MQ_F_DEFER_ISSUE= 1 << 4,
BLK_MQ_F_BLOCKING   = 1 << 5,
BLK_MQ_F_NO_SCHED   = 1 << 6,
+   BLK_MQ_F_GLOBAL_TAGS= 1 << 7,
BLK_MQ_F_ALLOC_POLICY_START_BIT = 8,
BLK_MQ_F_ALLOC_POLICY_BITS = 1,
 
-- 
1.8.5.6

[RFC PATCH 0/2] block,scsi: support host-wide tagset

2017-04-04 Thread Hannes Reinecke

Hi all,

as discussed recently most existing HBAs have a host-wide tagset which
does not map easily onto the per-queue tagset model of block mq.
This patchset implements a flag BLK_MQ_F_GLOBAL_TAGS for block-mq, which
enables the use of a shared tagset for all hardware queues.
The second patch adds a flag 'host_tagset' to the SCSI host template,
which allows drivers to enable the use of the global tagset.

This patchset probably has some performance implications as
there is a quite high probability of cache-bouncing when allocating
tags. Also I'm not quite sure if the implemented tagset sharing
is the correct way to handle things.
So this can be considered an RFC.

As usual, comments and reviews are welcome.

Hannes Reinecke (2):
  block: Implement global tagset
  scsi: Add template flag 'host_tagset'

 block/blk-mq-tag.c   | 12 
 block/blk-mq.c   | 10 --
 drivers/scsi/scsi_lib.c  |  2 ++
 include/linux/blk-mq.h   |  1 +
 include/scsi/scsi_host.h |  5 +
 5 files changed, 24 insertions(+), 6 deletions(-)

-- 
1.8.5.6

[PATCH 2/2] scsi: Add template flag 'host_tagset'

2017-04-04 Thread Hannes Reinecke

Add a host template flag 'host_tagset' to enable the use of a
global tagmap for block-mq.

Signed-off-by: Hannes Reinecke 
---
 drivers/scsi/scsi_lib.c  | 2 ++
 include/scsi/scsi_host.h | 5 +
 2 files changed, 7 insertions(+)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index ba22866..00036cb 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2193,6 +2193,8 @@ int scsi_mq_setup_tags(struct Scsi_Host *shost)
shost->tag_set.cmd_size = cmd_size;
shost->tag_set.numa_node = NUMA_NO_NODE;
shost->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_SG_MERGE;
+   if (shost->hostt->host_tagset)
+   shost->tag_set.flags |= BLK_MQ_F_GLOBAL_TAGS;
shost->tag_set.flags |=
BLK_ALLOC_POLICY_TO_MQ_FLAG(shost->hostt->tag_alloc_policy);
shost->tag_set.driver_data = shost;
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index 3cd8c3b..dff3ec1 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -457,6 +457,11 @@ struct scsi_host_template {
unsigned no_async_abort:1;
 
/*
+* True if the host supports a host-wide tagspace
+*/
+   unsigned host_tagset:1;
+
+   /*
 * Countdown for host blocking with no commands outstanding.
 */
unsigned int max_host_blocked;
-- 
1.8.5.6

Re: [REGRESSION] v4.11-rc3: lpfc: panic during module removal / shutdown

2017-04-04 Thread Mauricio Faria de Oliveira


Hi Martin and Junichi,

On 04/03/2017 11:10 PM, Junichi Nomura wrote:

On 04/04/17 06:53, Mauricio Faria de Oliveira wrote:



On 03/28/2017 11:29 PM, Junichi Nomura wrote:

Since commit 895427bd012c ("scsi: lpfc: NVME Initiator: Base modifications"),
"rmmod lpfc" starting to cause panic or corruption due to double free.



Thanks for the report. Can you please check whether the patch just sent
([PATCH] lpfc: fix double free of bound CQ/WQ ring pointer) resolves it?



It works for me. Thank you!


Excellent, thanks!

Martin, can you review/consider it for 4.11-rc6, please?


Considering future maintenance, it might be a bit fragile to just depend
on the code comment about representing the relation between cq/wq and
shared pring but it's maintainers' choice.


I agree -- there should be a better way of identifying a bound WQ/CQ.
Perhaps there is, but I couldn't find it currently.

For now, as far as I could grep and examine the code (detailed in commit
message), a WQ is always bound to a CQ, so to check for WQ and not free
its ring pointer seems to be sufficient (as the CQ ring pointer is freed
first).

If that changes, probably some form of flagging and/or queue type
determination would be better/necessary.

cheers,

--
Mauricio Faria de Oliveira
IBM Linux Technology Center

Re: [PATCH] sas: remove sas_domain_release_transport

2017-04-04 Thread Hannes Reinecke

On 04/03/2017 04:32 PM, Johannes Thumshirn wrote:
> sas_domain_release_transport is unused since at least v3.13, remove it.
> 
> Signed-off-by: Johannes Thumshirn 
> ---
>  drivers/scsi/libsas/sas_init.c | 7 ---
>  include/scsi/libsas.h  | 1 -
>  2 files changed, 8 deletions(-)
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

Re: [RFC 6/8] nvmet: Be careful about using iomem accesses when dealing with p2pmem

2017-04-04 Thread Sagi Grimberg




 u16 nvmet_copy_to_sgl(struct nvmet_req *req, off_t off, const void *buf,
size_t len)
 {
-   if (sg_pcopy_from_buffer(req->sg, req->sg_cnt, buf, len, off) != len)
+   bool iomem = req->p2pmem;
+   size_t ret;
+
+   ret = sg_copy_buffer(req->sg, req->sg_cnt, (void *)buf, len, off,
+false, iomem);
+
+   if (ret != len)
return NVME_SC_SGL_INVALID_DATA | NVME_SC_DNR;
+
return 0;
 }


We can never ever get here from an IO command, and that is a good thing
because it would have been broken if we did, regardless of what copy
method we use...

Note that the nvme completion queues are still on the host memory, so
this means we have lost the ordering between data and completions as
they go to different pcie targets.

If at all, this is the place to *emphasize* we must never get here
with p2pmem, and immediately fail if we do.

I'm not sure what will happen with to copy_from_sgl, I guess we
have the same race because the nvme submission queues are also
on the host memory (which is on a different pci target). Maybe
more likely to happen with write-combine enabled?

Anyway I don't think we have a real issue here *currently*, because
we use copy_to_sgl only for admin/fabrics commands emulation and
copy_from_sgl to setup dsm ranges...

Re: [RFC 4/8] p2pmem: Add debugfs "stats" file

2017-04-04 Thread Sagi Grimberg




+   p2pmem_debugfs_root = debugfs_create_dir("p2pmem", NULL);
+   if (!p2pmem_debugfs_root)
+   pr_info("could not create debugfs entry, continuing\n");
+


Why continue? I think it'd be better to just fail it.

Besides, this can be safely squashed into patch 1.

Re: [RFC 2/8] cxgb4: setup pcie memory window 4 and create p2pmem region

2017-04-04 Thread Sagi Grimberg




+static void setup_memwin_p2pmem(struct adapter *adap)
+{
+   unsigned int mem_base = t4_read_reg(adap, CIM_EXTMEM2_BASE_ADDR_A);
+   unsigned int mem_size = t4_read_reg(adap, CIM_EXTMEM2_ADDR_SIZE_A);
+
+   if (!use_p2pmem)
+   return;


This is weird, why even call this if !use_p2pmem?


+static int init_p2pmem(struct adapter *adapter)
+{
+   unsigned int mem_size = t4_read_reg(adapter, CIM_EXTMEM2_ADDR_SIZE_A);
+   struct p2pmem_dev *p;
+   int rc;
+   struct resource res;
+
+   if (!mem_size || !use_p2pmem)
+   return 0;


Again, weird...

Re: [RFC 3/8] nvmet: Use p2pmem in nvme target

2017-04-04 Thread Sagi Grimberg


Hey Logan,


We create a configfs attribute in each nvme-fabrics target port to
enable p2p memory use. When enabled, the port will only then use the
p2p memory if a p2p memory device can be found which is behind the
same switch as the RDMA port and all the block devices in use. If
the user enabled it an no devices are found, then the system will
silently fall back on using regular memory.


What should we do if we have more than a single device that satisfies
this? I'd say that it would be better to have the user ask for a
specific device and fail it if it doesn't meet the above conditions...


If appropriate, that port will allocate memory for the RDMA buffers
for queues from the p2pmem device falling back to system memory should
anything fail.


That's good :)


Ideally, we'd want to use an NVME CMB buffer as p2p memory. This would
save an extra PCI transfer as the NVME card could just take the data
out of it's own memory. However, at this time, cards with CMB buffers
don't seem to be available.


Even if it was available, it would be hard to make real use of this
given that we wouldn't know how to pre-post recv buffers (for in-capsule
data). But let's leave this out of the scope entirely...


diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index ecc4fe8..7fd4840 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -23,6 +23,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 

 #include 
@@ -64,6 +65,7 @@ struct nvmet_rdma_rsp {
struct rdma_rw_ctx  rw;

struct nvmet_reqreq;
+   struct p2pmem_dev   *p2pmem;


Why do you need this? you have a reference to the
queue itself.


@@ -107,6 +109,8 @@ struct nvmet_rdma_queue {
int send_queue_size;

struct list_headqueue_list;
+
+   struct p2pmem_dev   *p2pmem;
 };

 struct nvmet_rdma_device {
@@ -185,7 +189,8 @@ nvmet_rdma_put_rsp(struct nvmet_rdma_rsp *rsp)
spin_unlock_irqrestore(>queue->rsps_lock, flags);
 }

-static void nvmet_rdma_free_sgl(struct scatterlist *sgl, unsigned int nents)
+static void nvmet_rdma_free_sgl(struct scatterlist *sgl, unsigned int nents,
+   struct p2pmem_dev *p2pmem)
 {
struct scatterlist *sg;
int count;
@@ -193,13 +198,17 @@ static void nvmet_rdma_free_sgl(struct scatterlist *sgl, 
unsigned int nents)
if (!sgl || !nents)
return;

-   for_each_sg(sgl, sg, nents, count)
-   __free_page(sg_page(sg));
+   for_each_sg(sgl, sg, nents, count) {
+   if (p2pmem)
+   p2pmem_free_page(p2pmem, sg_page(sg));
+   else
+   __free_page(sg_page(sg));
+   }
kfree(sgl);
 }

 static int nvmet_rdma_alloc_sgl(struct scatterlist **sgl, unsigned int *nents,
-   u32 length)
+   u32 length, struct p2pmem_dev *p2pmem)
 {
struct scatterlist *sg;
struct page *page;
@@ -216,7 +225,11 @@ static int nvmet_rdma_alloc_sgl(struct scatterlist **sgl, 
unsigned int *nents,
while (length) {
u32 page_len = min_t(u32, length, PAGE_SIZE);

-   page = alloc_page(GFP_KERNEL);
+   if (p2pmem)
+   page = p2pmem_alloc_page(p2pmem);
+   else
+   page = alloc_page(GFP_KERNEL);
+
if (!page)
goto out_free_pages;

@@ -231,7 +244,10 @@ static int nvmet_rdma_alloc_sgl(struct scatterlist **sgl, 
unsigned int *nents,
 out_free_pages:
while (i > 0) {
i--;
-   __free_page(sg_page([i]));
+   if (p2pmem)
+   p2pmem_free_page(p2pmem, sg_page([i]));
+   else
+   __free_page(sg_page([i]));
}
kfree(sg);
 out:
@@ -484,7 +500,8 @@ static void nvmet_rdma_release_rsp(struct nvmet_rdma_rsp 
*rsp)
}

if (rsp->req.sg != >cmd->inline_sg)
-   nvmet_rdma_free_sgl(rsp->req.sg, rsp->req.sg_cnt);
+   nvmet_rdma_free_sgl(rsp->req.sg, rsp->req.sg_cnt,
+   rsp->p2pmem);

if (unlikely(!list_empty_careful(>rsp_wr_wait_list)))
nvmet_rdma_process_wr_wait_list(queue);
@@ -625,8 +642,16 @@ static u16 nvmet_rdma_map_sgl_keyed(struct nvmet_rdma_rsp 
*rsp,
if (!len)
return 0;

+   rsp->p2pmem = rsp->queue->p2pmem;
status = nvmet_rdma_alloc_sgl(>req.sg, >req.sg_cnt,
-   len);
+   len, rsp->p2pmem);
+
+   if (status && rsp->p2pmem) {
+   rsp->p2pmem = NULL;
+   status = nvmet_rdma_alloc_sgl(>req.sg, >req.sg_cnt,
+ len, rsp->p2pmem);
+   }
+


Not sure its a good practice to rely on rsp->p2pmem not being NULL...
Would be nice if the allocation routines can hide it from us...

Re: [PATCH] csiostor: switch to pci_alloc_irq_vectors

2017-04-04 Thread Varun Prakash

On Tue, Apr 04, 2017 at 08:46:14AM +0200, Christoph Hellwig wrote:
> Does this one work better?
>

csiostor driver is triggering following warning during module unload.

WARNING: CPU: 8 PID: 20636 at kernel/irq/manage.c:1480 __free_irq+0xa6/0x2b0
Trying to free already-free IRQ 53
CPU: 8 PID: 20636 Comm: rmmod Tainted: GB   W  OE   4.11.0-rc5+ #2
Call Trace:
dump_stack+0x63/0x84
__warn+0xd1/0xf0
warn_slowpath_fmt+0x5f/0x80
__free_irq+0xa6/0x2b0
free_irq+0x39/0x90
csio_free_irqs+0x34/0x90 [csiostor]
csio_hw_free+0x12/0xb0 [csiostor]
csio_remove_one+0x6e/0x90 [csiostor]
pci_device_remove+0x39/0xc0
device_release_driver_internal+0x141/0x1f0
driver_detach+0x3f/0x80
bus_remove_driver+0x55/0xd0
driver_unregister+0x2c/0x50
pci_unregister_driver+0x2a/0xa0
csio_exit+0x10/0xf70 [csiostor]
SyS_delete_module+0x1ba/0x220
do_syscall_64+0x67/0x180
entry_SYSCALL64_slow_path+0x25/0x25

kernel/irq/manage.c:1480
1457 static struct irqaction *__free_irq(unsigned int irq, void *dev_id)
1458 {
1459 struct irq_desc *desc = irq_to_desc(irq);
1460 struct irqaction *action, **action_ptr;
1461 unsigned long flags;
1462
 ...
1475 action_ptr = >action;
1476 for (;;) {
1477 action = *action_ptr;
1478
1479 if (!action) {
1480 WARN(1, "Trying to free already-free IRQ %d\n", 
irq);
1481 raw_spin_unlock_irqrestore(>lock, flags);
1482 chip_bus_sync_unlock(desc);
1483 return NULL;
1484 }
1485
1486 if (action->dev_id == dev_id)
1487 break;
1488 action_ptr = >next;
1489 }
 ...
1546 }

Re: [PATCH 1/2] scsi: convert unrecovered read error to -EILSEQ

2017-04-04 Thread Christoph Hellwig

I'm planning to introduce new block-layer specific status code ASAP,
so I'd prefer not to add new errno special cases.

I'll port your patches to the new code and will send them out with
my series in a few days, though.

Re: [PATCH 1/1] aacraid: pci_alloc_consistent() failures on ARM64

2017-04-04 Thread Christoph Hellwig

On Tue, Apr 04, 2017 at 12:23:19PM +0530, Mahesh Rajashekhara wrote:
> +#if defined(CONFIG_ARM64)
> +static inline void *
> +aac_pci_alloc_consistent(struct pci_dev *hwdev, size_t size,
> +  dma_addr_t *dma_handle) {
> + return dma_alloc_coherent(hwdev == NULL ? NULL : >dev,
> +   size,
> +   dma_handle,
> +   GFP_KERNEL);
> +}
> +#else
> +static inline void *
> +aac_pci_alloc_consistent(struct pci_dev *hwdev, size_t size,
> +  dma_addr_t *dma_handle) {
> + return pci_alloc_consistent(hwdev, size, dma_handle);
> +}
> +#endif

Please just call the dma_*_coherent routines directly all the time.
hwdev should never be NULL, so that workaround can be dropped as well.

And while you're at it please remove all the casts to and from
void * in the arguments and return values.

[PATCH 1/1] aacraid: pci_alloc_consistent() failures on ARM64

2017-04-04 Thread Mahesh Rajashekhara

There were pci_alloc_consistent() failures on ARM64 platform.
Use dma_alloc_coherent() with GFP_KERNEL flag DMA memory allocations.

Signed-off-by: Mahesh Rajashekhara 
---
 drivers/scsi/aacraid/aachba.c   | 13 +
 drivers/scsi/aacraid/aacraid.h  | 22 ++
 drivers/scsi/aacraid/commctrl.c |  7 +--
 drivers/scsi/aacraid/comminit.c |  4 ++--
 drivers/scsi/aacraid/commsup.c  | 16 ++--
 drivers/scsi/aacraid/linit.c|  8 +---
 drivers/scsi/aacraid/rx.c   | 11 ++-
 7 files changed, 59 insertions(+), 22 deletions(-)

diff --git a/drivers/scsi/aacraid/aachba.c b/drivers/scsi/aacraid/aachba.c
index e3e93de..d07da1a 100644
--- a/drivers/scsi/aacraid/aachba.c
+++ b/drivers/scsi/aacraid/aachba.c
@@ -1678,11 +1678,12 @@ int aac_issue_bmic_identify(struct aac_dev *dev, u32 
bus, u32 target)
sizeof(struct sgentry) + sizeof(struct sgentry64);
datasize = sizeof(struct aac_ciss_identify_pd);
 
-   identify_resp =  pci_alloc_consistent(dev->pdev, datasize, );
+   identify_resp =  aac_pci_alloc_consistent(dev->pdev, datasize, );
 
if (!identify_resp)
goto fib_free_ptr;
 
+   memset(identify_resp, 0, datasize);
vbus = (u32)le16_to_cpu(dev->supplement_adapter_info.virt_device_bus);
vid = (u32)le16_to_cpu(dev->supplement_adapter_info.virt_device_target);
 
@@ -1720,7 +1721,10 @@ int aac_issue_bmic_identify(struct aac_dev *dev, u32 
bus, u32 target)
dev->hba_map[bus][target].qd_limit =
identify_resp->current_queue_depth_limit;
 
-   pci_free_consistent(dev->pdev, datasize, (void *)identify_resp, addr);
+   aac_pci_free_consistent(dev->pdev,
+   datasize,
+   (void *)identify_resp,
+   addr);
 
aac_fib_complete(fibptr);
 
@@ -1814,7 +1818,7 @@ int aac_report_phys_luns(struct aac_dev *dev, struct fib 
*fibptr, int rescan)
datasize = sizeof(struct aac_ciss_phys_luns_resp)
+ (AAC_MAX_TARGETS - 1) * sizeof(struct _ciss_lun);
 
-   phys_luns = (struct aac_ciss_phys_luns_resp *) pci_alloc_consistent(
+   phys_luns = (struct aac_ciss_phys_luns_resp *) aac_pci_alloc_consistent(
dev->pdev, datasize, );
 
if (phys_luns == NULL) {
@@ -1822,6 +1826,7 @@ int aac_report_phys_luns(struct aac_dev *dev, struct fib 
*fibptr, int rescan)
goto err_out;
}
 
+   memset(phys_luns, 0, datasize);
vbus = (u32) le16_to_cpu(
dev->supplement_adapter_info.virt_device_bus);
vid = (u32) le16_to_cpu(
@@ -1861,7 +1866,7 @@ int aac_report_phys_luns(struct aac_dev *dev, struct fib 
*fibptr, int rescan)
aac_update_hba_map(dev, phys_luns, rescan);
}
 
-   pci_free_consistent(dev->pdev, datasize, (void *) phys_luns, addr);
+   aac_pci_free_consistent(dev->pdev, datasize, (void *) phys_luns, addr);
 err_out:
return rcode;
 }
diff --git a/drivers/scsi/aacraid/aacraid.h b/drivers/scsi/aacraid/aacraid.h
index d036a80..12121d6 100644
--- a/drivers/scsi/aacraid/aacraid.h
+++ b/drivers/scsi/aacraid/aacraid.h
@@ -2692,4 +2692,26 @@ extern int update_interval;
 extern int check_interval;
 extern int aac_check_reset;
 extern int aac_fib_dump;
+#if defined(CONFIG_ARM64)
+static inline void *
+aac_pci_alloc_consistent(struct pci_dev *hwdev, size_t size,
+dma_addr_t *dma_handle) {
+   return dma_alloc_coherent(hwdev == NULL ? NULL : >dev,
+ size,
+ dma_handle,
+ GFP_KERNEL);
+}
+#else
+static inline void *
+aac_pci_alloc_consistent(struct pci_dev *hwdev, size_t size,
+dma_addr_t *dma_handle) {
+   return pci_alloc_consistent(hwdev, size, dma_handle);
+}
+#endif
+static inline void
+aac_pci_free_consistent(struct pci_dev *hwdev, size_t size,
+   void *vaddr, dma_addr_t dma_handle)
+{
+   pci_free_consistent(hwdev, size, vaddr, dma_handle);
+}
 #endif
diff --git a/drivers/scsi/aacraid/commctrl.c b/drivers/scsi/aacraid/commctrl.c
index f6afd50..7d1757a 100644
--- a/drivers/scsi/aacraid/commctrl.c
+++ b/drivers/scsi/aacraid/commctrl.c
@@ -100,12 +100,13 @@ static int ioctl_send_fib(struct aac_dev * dev, void 
__user *arg)
goto cleanup;
}
 
-   kfib = pci_alloc_consistent(dev->pdev, size, );
+   kfib = aac_pci_alloc_consistent(dev->pdev, size, );
if (!kfib) {
retval = -ENOMEM;
goto cleanup;
}
 
+   memset(kfib, 0, size);
/* Highjack the hw_fib */
hw_fib = fibptr->hw_fib_va;
hw_fib_pa = fibptr->hw_fib_pa;
@@ -160,7 +161,9 @@

Re: [PATCH v2] scsi: storvsc: Add support for FC rport.

2017-04-04 Thread Christoph Hellwig

On Mon, Apr 03, 2017 at 09:14:02AM -0400, Cathy Avery wrote:
> On 04/03/2017 08:17 AM, Christoph Hellwig wrote:
> > >   if (host->transportt == fc_transport_template) {
> > > + struct fc_rport_identifiers ids = {
> > > + .roles = FC_PORT_ROLE_FCP_TARGET,
> > > + };
> > I don't think storvsc ever acts as FCP target.
> 
> In order to implement the work around so that the scsi scan works indicating
> FC_PORT_ROLE_FCP_TARGET as a role was necessary due to its test in
> fc_scsi_scan_rport. The idea here is to avoid making any changes to the
> fc_transport driver which was of some concern.

Please add a FC_PORT_ROLE_FCP_DUMMY_INITIATOR role instead.

Re: [PATCH] csiostor: switch to pci_alloc_irq_vectors

2017-04-04 Thread Christoph Hellwig

Does this one work better?

---
>From e5a4178cb810be581b6d9b8f48f13b12e88eae74 Mon Sep 17 00:00:00 2001
From: Christoph Hellwig 
Date: Thu, 12 Jan 2017 11:17:29 +0100
Subject: csiostor: switch to pci_alloc_irq_vectors

And get automatic MSI-X affinity for free.

Signed-off-by: Christoph Hellwig 
---
 drivers/scsi/csiostor/csio_hw.h   |   4 +-
 drivers/scsi/csiostor/csio_init.c |   9 +--
 drivers/scsi/csiostor/csio_isr.c  | 127 +-
 3 files changed, 51 insertions(+), 89 deletions(-)

diff --git a/drivers/scsi/csiostor/csio_hw.h b/drivers/scsi/csiostor/csio_hw.h
index 029bef82c057..a084d83ce70f 100644
--- a/drivers/scsi/csiostor/csio_hw.h
+++ b/drivers/scsi/csiostor/csio_hw.h
@@ -95,7 +95,6 @@ enum {
 };
 
 struct csio_msix_entries {
-   unsigned short  vector; /* Assigned MSI-X vector */
void*dev_id;/* Priv object associated w/ this msix*/
chardesc[24];   /* Description of this vector */
 };
@@ -591,8 +590,9 @@ int csio_enqueue_evt(struct csio_hw *, enum csio_evt, void 
*, uint16_t);
 void csio_evtq_flush(struct csio_hw *hw);
 
 int csio_request_irqs(struct csio_hw *);
+void csio_free_irqs(struct csio_hw *);
 void csio_intr_enable(struct csio_hw *);
-void csio_intr_disable(struct csio_hw *, bool);
+void csio_intr_disable(struct csio_hw *);
 void csio_hw_fatal_err(struct csio_hw *);
 
 struct csio_lnode *csio_lnode_alloc(struct csio_hw *);
diff --git a/drivers/scsi/csiostor/csio_init.c 
b/drivers/scsi/csiostor/csio_init.c
index dbe416ff46c2..7e60699c8b55 100644
--- a/drivers/scsi/csiostor/csio_init.c
+++ b/drivers/scsi/csiostor/csio_init.c
@@ -461,8 +461,7 @@ csio_config_queues(struct csio_hw *hw)
return 0;
 
 intr_disable:
-   csio_intr_disable(hw, false);
-
+   csio_intr_disable(hw);
return -EINVAL;
 }
 
@@ -575,7 +574,8 @@ static struct csio_hw *csio_hw_alloc(struct pci_dev *pdev)
 static void
 csio_hw_free(struct csio_hw *hw)
 {
-   csio_intr_disable(hw, true);
+   csio_free_irqs(hw);
+   csio_intr_disable(hw);
csio_hw_exit_workers(hw);
csio_hw_exit(hw);
iounmap(hw->regstart);
@@ -1068,7 +1068,8 @@ csio_pci_error_detected(struct pci_dev *pdev, 
pci_channel_state_t state)
spin_unlock_irq(>lock);
csio_lnodes_unblock_request(hw);
csio_lnodes_exit(hw, 0);
-   csio_intr_disable(hw, true);
+   csio_free_irqs(hw);
+   csio_intr_disable(hw);
pci_disable_device(pdev);
return state == pci_channel_io_perm_failure ?
PCI_ERS_RESULT_DISCONNECT : PCI_ERS_RESULT_NEED_RESET;
diff --git a/drivers/scsi/csiostor/csio_isr.c b/drivers/scsi/csiostor/csio_isr.c
index 2fb71c6c3b37..a4ad847e9c53 100644
--- a/drivers/scsi/csiostor/csio_isr.c
+++ b/drivers/scsi/csiostor/csio_isr.c
@@ -383,17 +383,15 @@ csio_request_irqs(struct csio_hw *hw)
int rv, i, j, k = 0;
struct csio_msix_entries *entryp = >msix_entries[0];
struct csio_scsi_cpu_info *info;
+   struct pci_dev *pdev = hw->pdev;
 
if (hw->intr_mode != CSIO_IM_MSIX) {
-   rv = request_irq(hw->pdev->irq, csio_fcoe_isr,
-   (hw->intr_mode == CSIO_IM_MSI) ?
-   0 : IRQF_SHARED,
-   KBUILD_MODNAME, hw);
+   rv = request_irq(pci_irq_vector(pdev, 0), csio_fcoe_isr,
+   hw->intr_mode == CSIO_IM_MSI ? 0 : IRQF_SHARED,
+   KBUILD_MODNAME, hw);
if (rv) {
-   if (hw->intr_mode == CSIO_IM_MSI)
-   pci_disable_msi(hw->pdev);
csio_err(hw, "Failed to allocate interrupt line.\n");
-   return -EINVAL;
+   goto out_free_irqs;
}
 
goto out;
@@ -402,22 +400,22 @@ csio_request_irqs(struct csio_hw *hw)
/* Add the MSIX vector descriptions */
csio_add_msix_desc(hw);
 
-   rv = request_irq(entryp[k].vector, csio_nondata_isr, 0,
+   rv = request_irq(pci_irq_vector(pdev, k), csio_nondata_isr, 0,
 entryp[k].desc, hw);
if (rv) {
csio_err(hw, "IRQ request failed for vec %d err:%d\n",
-entryp[k].vector, rv);
-   goto err;
+pci_irq_vector(pdev, k), rv);
+   goto out_free_irqs;
}
 
-   entryp[k++].dev_id = (void *)hw;
+   entryp[k++].dev_id = hw;
 
-   rv = request_irq(entryp[k].vector, csio_fwevt_isr, 0,
+   rv = request_irq(pci_irq_vector(pdev, k), csio_fwevt_isr, 0,
 entryp[k].desc, hw);
if (rv) {
csio_err(hw, "IRQ request failed for vec %d err:%d\n",
-entryp[k].vector, rv);
-   goto err;
+

75 matches

Mail list logo