[PATCH v7 00/15] Replace PCI pool by DMA pool API

2017-04-07 Thread Romain Perier
The current PCI pool API is a set of simple macros that expand directly to
the corresponding DMA pool functions. The prototypes are almost identical
and the semantics are very similar. I propose to use the DMA pool API
directly and get rid of the old API.

This set of patches replaces the old API with the DMA pool API
and removes the defines.
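For reference, the defines being removed look roughly like this in include/linux/pci.h (quoted from memory of the pre-removal header, so the exact whitespace and ordering may differ between kernel versions):

```c
/* include/linux/pci.h: the old PCI pool API, a thin wrapper over dma_pool */
#define	pci_pool dma_pool
#define pci_pool_create(name, pdev, size, align, allocation) \
		dma_pool_create(name, &pdev->dev, size, align, allocation)
#define	pci_pool_alloc(pool, flags, handle)	dma_pool_alloc(pool, flags, handle)
#define	pci_pool_zalloc(pool, flags, handle)	dma_pool_zalloc(pool, flags, handle)
#define	pci_pool_free(pool, vaddr, addr)	dma_pool_free(pool, vaddr, addr)
#define	pci_pool_destroy(pool)			dma_pool_destroy(pool)
```

So each conversion is mechanical: pass &pdev->dev instead of pdev to the create function, and rename the remaining calls one-for-one.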

Changes in v7:
- Rebased series onto next-20170416
- Added Acked-by, Tested-by and Reviewed-by tags

Changes in v6:
- Fixed an issue reported by kbuild test robot about changes in DAC960
- Removed patches 15/19,16/19,17/19,18/19. They have been merged by Greg
- Added Acked-by Tags

Changes in v5:
- Re-worded the cover letter (remove sentence about checkpatch.pl)
- Rebased series onto next-20170308
- Fix typos in commit message
- Added Acked-by Tags

Changes in v4:
- Rebased series onto next-20170301
- Removed patch 20/20: checks done by checkpatch.pl, no longer required.
  Thanks to Peter and Joe for their feedback.
- Added Reviewed-by tags

Changes in v3:
- Rebased series onto next-20170224
- Fix checkpatch.pl reports for patch 11/20 and patch 12/20
- Remove prefix RFC
Changes in v2:
- Introduced patch 18/20
- Fixed cosmetic issues: spaces before braces, lines over 80 characters
- Removed some of the check for NULL pointers before calling dma_pool_destroy
- Improved the regexp in checkpatch for pci_pool, thanks to Joe Perches
- Added Tested-by and Acked-by tags

Romain Perier (15):
  block: DAC960: Replace PCI pool old API
  dmaengine: pch_dma: Replace PCI pool old API
  IB/mthca: Replace PCI pool old API
  net: e100: Replace PCI pool old API
  mlx4: Replace PCI pool old API
  mlx5: Replace PCI pool old API
  wireless: ipw2200: Replace PCI pool old API
  scsi: be2iscsi: Replace PCI pool old API
  scsi: csiostor: Replace PCI pool old API
  scsi: lpfc: Replace PCI pool old API
  scsi: megaraid: Replace PCI pool old API
  scsi: mpt3sas: Replace PCI pool old API
  scsi: mvsas: Replace PCI pool old API
  scsi: pmcraid: Replace PCI pool old API
  PCI: Remove PCI pool macro functions

 drivers/block/DAC960.c|  38 +
 drivers/block/DAC960.h|   4 +-
 drivers/dma/pch_dma.c |  12 +--
 drivers/infiniband/hw/mthca/mthca_av.c|  10 +--
 drivers/infiniband/hw/mthca/mthca_cmd.c   |   8 +-
 drivers/infiniband/hw/mthca/mthca_dev.h   |   4 +-
 drivers/net/ethernet/intel/e100.c |  12 +--
 drivers/net/ethernet/mellanox/mlx4/cmd.c  |  10 +--
 drivers/net/ethernet/mellanox/mlx4/mlx4.h |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/cmd.c |  11 +--
 drivers/net/wireless/intel/ipw2x00/ipw2200.c  |  13 ++--
 drivers/scsi/be2iscsi/be_iscsi.c  |   6 +-
 drivers/scsi/be2iscsi/be_main.c   |   6 +-
 drivers/scsi/be2iscsi/be_main.h   |   2 +-
 drivers/scsi/csiostor/csio_hw.h   |   2 +-
 drivers/scsi/csiostor/csio_init.c |  11 +--
 drivers/scsi/csiostor/csio_scsi.c |   6 +-
 drivers/scsi/lpfc/lpfc.h  |  14 ++--
 drivers/scsi/lpfc/lpfc_init.c |  16 ++--
 drivers/scsi/lpfc/lpfc_mem.c  | 106 +-
 drivers/scsi/lpfc/lpfc_nvme.c |   6 +-
 drivers/scsi/lpfc/lpfc_nvmet.c|   4 +-
 drivers/scsi/lpfc/lpfc_scsi.c |  12 +--
 drivers/scsi/megaraid/megaraid_mbox.c |  33 
 drivers/scsi/megaraid/megaraid_mm.c   |  32 
 drivers/scsi/megaraid/megaraid_sas_base.c |  29 +++
 drivers/scsi/megaraid/megaraid_sas_fusion.c   |  66 
 drivers/scsi/mpt3sas/mpt3sas_base.c   |  73 +-
 drivers/scsi/mvsas/mv_init.c  |   6 +-
 drivers/scsi/mvsas/mv_sas.c   |   6 +-
 drivers/scsi/pmcraid.c|  10 +--
 drivers/scsi/pmcraid.h|   2 +-
 include/linux/mlx5/driver.h   |   2 +-
 include/linux/pci.h   |   9 ---
 34 files changed, 280 insertions(+), 303 deletions(-)

-- 
2.9.3



[PATCH v7 07/15] wireless: ipw2200: Replace PCI pool old API

2017-04-07 Thread Romain Perier
The PCI pool API is deprecated. This commit replaces the old PCI pool
API with the appropriate DMA pool API functions.

Signed-off-by: Romain Perier 
Reviewed-by: Peter Senna Tschudin 
---
 drivers/net/wireless/intel/ipw2x00/ipw2200.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/net/wireless/intel/ipw2x00/ipw2200.c b/drivers/net/wireless/intel/ipw2x00/ipw2200.c
index bbc579b..0ca2e04 100644
--- a/drivers/net/wireless/intel/ipw2x00/ipw2200.c
+++ b/drivers/net/wireless/intel/ipw2x00/ipw2200.c
@@ -3211,7 +3211,7 @@ static int ipw_load_firmware(struct ipw_priv *priv, u8 * data, size_t len)
struct fw_chunk *chunk;
int total_nr = 0;
int i;
-   struct pci_pool *pool;
+   struct dma_pool *pool;
void **virts;
dma_addr_t *phys;
 
@@ -3228,9 +3228,10 @@ static int ipw_load_firmware(struct ipw_priv *priv, u8 * data, size_t len)
kfree(virts);
return -ENOMEM;
}
-   pool = pci_pool_create("ipw2200", priv->pci_dev, CB_MAX_LENGTH, 0, 0);
+   pool = dma_pool_create("ipw2200", &priv->pci_dev->dev, CB_MAX_LENGTH, 0,
+  0);
if (!pool) {
-   IPW_ERROR("pci_pool_create failed\n");
+   IPW_ERROR("dma_pool_create failed\n");
kfree(phys);
kfree(virts);
return -ENOMEM;
@@ -3255,7 +3256,7 @@ static int ipw_load_firmware(struct ipw_priv *priv, u8 * data, size_t len)
 
nr = (chunk_len + CB_MAX_LENGTH - 1) / CB_MAX_LENGTH;
for (i = 0; i < nr; i++) {
-   virts[total_nr] = pci_pool_alloc(pool, GFP_KERNEL,
+   virts[total_nr] = dma_pool_alloc(pool, GFP_KERNEL,
  &phys[total_nr]);
if (!virts[total_nr]) {
ret = -ENOMEM;
@@ -3299,9 +3300,9 @@ static int ipw_load_firmware(struct ipw_priv *priv, u8 * data, size_t len)
}
  out:
for (i = 0; i < total_nr; i++)
-   pci_pool_free(pool, virts[i], phys[i]);
+   dma_pool_free(pool, virts[i], phys[i]);
 
-   pci_pool_destroy(pool);
+   dma_pool_destroy(pool);
kfree(phys);
kfree(virts);
 
-- 
2.9.3



[PATCH v7 09/15] scsi: csiostor: Replace PCI pool old API

2017-04-07 Thread Romain Perier
The PCI pool API is deprecated. This commit replaces the old PCI pool
API with the appropriate DMA pool API functions. It also renames some
variables and updates comments accordingly.

Signed-off-by: Romain Perier 
Reviewed-by: Peter Senna Tschudin 
---
 drivers/scsi/csiostor/csio_hw.h   |  2 +-
 drivers/scsi/csiostor/csio_init.c | 11 ++-
 drivers/scsi/csiostor/csio_scsi.c |  6 +++---
 3 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/drivers/scsi/csiostor/csio_hw.h b/drivers/scsi/csiostor/csio_hw.h
index 029bef8..55b04fc 100644
--- a/drivers/scsi/csiostor/csio_hw.h
+++ b/drivers/scsi/csiostor/csio_hw.h
@@ -465,7 +465,7 @@ struct csio_hw {
struct csio_pport   pport[CSIO_MAX_PPORTS]; /* Ports (XGMACs) */
struct csio_hw_params   params; /* Hw parameters */
 
-   struct pci_pool *scsi_pci_pool; /* PCI pool for SCSI */
+   struct dma_pool *scsi_dma_pool; /* DMA pool for SCSI */
mempool_t   *mb_mempool;/* Mailbox memory pool*/
mempool_t   *rnode_mempool; /* rnode memory pool */
 
diff --git a/drivers/scsi/csiostor/csio_init.c b/drivers/scsi/csiostor/csio_init.c
index dbe416f..292964c 100644
--- a/drivers/scsi/csiostor/csio_init.c
+++ b/drivers/scsi/csiostor/csio_init.c
@@ -485,9 +485,10 @@ csio_resource_alloc(struct csio_hw *hw)
if (!hw->rnode_mempool)
goto err_free_mb_mempool;
 
-   hw->scsi_pci_pool = pci_pool_create("csio_scsi_pci_pool", hw->pdev,
-   CSIO_SCSI_RSP_LEN, 8, 0);
-   if (!hw->scsi_pci_pool)
+   hw->scsi_dma_pool = dma_pool_create("csio_scsi_dma_pool",
+   &hw->pdev->dev, CSIO_SCSI_RSP_LEN,
+   8, 0);
+   if (!hw->scsi_dma_pool)
goto err_free_rn_pool;
 
return 0;
@@ -505,8 +506,8 @@ csio_resource_alloc(struct csio_hw *hw)
 static void
 csio_resource_free(struct csio_hw *hw)
 {
-   pci_pool_destroy(hw->scsi_pci_pool);
-   hw->scsi_pci_pool = NULL;
+   dma_pool_destroy(hw->scsi_dma_pool);
+   hw->scsi_dma_pool = NULL;
mempool_destroy(hw->rnode_mempool);
hw->rnode_mempool = NULL;
mempool_destroy(hw->mb_mempool);
diff --git a/drivers/scsi/csiostor/csio_scsi.c b/drivers/scsi/csiostor/csio_scsi.c
index a1ff75f..dab0d3f 100644
--- a/drivers/scsi/csiostor/csio_scsi.c
+++ b/drivers/scsi/csiostor/csio_scsi.c
@@ -2445,7 +2445,7 @@ csio_scsim_init(struct csio_scsim *scm, struct csio_hw *hw)
 
/* Allocate Dma buffers for Response Payload */
dma_buf = &scm->dma_buf;
-   dma_buf->vaddr = pci_pool_alloc(hw->scsi_pci_pool, GFP_KERNEL,
+   dma_buf->vaddr = dma_pool_alloc(hw->scsi_dma_pool, GFP_KERNEL,
&dma_buf->paddr);
if (!dma_buf->vaddr) {
csio_err(hw,
@@ -2485,7 +2485,7 @@ csio_scsim_init(struct csio_scsim *scm, struct csio_hw *hw)
ioreq = (struct csio_ioreq *)tmp;
 
dma_buf = &ioreq->dma_buf;
-   pci_pool_free(hw->scsi_pci_pool, dma_buf->vaddr,
+   dma_pool_free(hw->scsi_dma_pool, dma_buf->vaddr,
  dma_buf->paddr);
 
kfree(ioreq);
@@ -2516,7 +2516,7 @@ csio_scsim_exit(struct csio_scsim *scm)
ioreq = (struct csio_ioreq *)tmp;
 
dma_buf = &ioreq->dma_buf;
-   pci_pool_free(scm->hw->scsi_pci_pool, dma_buf->vaddr,
+   dma_pool_free(scm->hw->scsi_dma_pool, dma_buf->vaddr,
  dma_buf->paddr);
 
kfree(ioreq);
-- 
2.9.3



[PATCH v7 06/15] mlx5: Replace PCI pool old API

2017-04-07 Thread Romain Perier
The PCI pool API is deprecated. This commit replaces the old PCI pool
API with the appropriate DMA pool API functions.

Signed-off-by: Romain Perier 
Reviewed-by: Peter Senna Tschudin 
Acked-by: Doug Ledford 
Tested-by: Doug Ledford 
---
 drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 11 ++-
 include/linux/mlx5/driver.h   |  2 +-
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
index 5bdaf3d..aae7d92 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
@@ -1069,7 +1069,7 @@ static struct mlx5_cmd_mailbox *alloc_cmd_box(struct mlx5_core_dev *dev,
if (!mailbox)
return ERR_PTR(-ENOMEM);
 
-   mailbox->buf = pci_pool_zalloc(dev->cmd.pool, flags,
+   mailbox->buf = dma_pool_zalloc(dev->cmd.pool, flags,
   &mailbox->dma);
if (!mailbox->buf) {
mlx5_core_dbg(dev, "failed allocation\n");
@@ -1084,7 +1084,7 @@ static struct mlx5_cmd_mailbox *alloc_cmd_box(struct mlx5_core_dev *dev,
 static void free_cmd_box(struct mlx5_core_dev *dev,
 struct mlx5_cmd_mailbox *mailbox)
 {
-   pci_pool_free(dev->cmd.pool, mailbox->buf, mailbox->dma);
+   dma_pool_free(dev->cmd.pool, mailbox->buf, mailbox->dma);
kfree(mailbox);
 }
 
@@ -1704,7 +1704,8 @@ int mlx5_cmd_init(struct mlx5_core_dev *dev)
return -EINVAL;
}
 
-   cmd->pool = pci_pool_create("mlx5_cmd", dev->pdev, size, align, 0);
+   cmd->pool = dma_pool_create("mlx5_cmd", &dev->pdev->dev, size, align,
+   0);
if (!cmd->pool)
return -ENOMEM;
 
@@ -1794,7 +1795,7 @@ int mlx5_cmd_init(struct mlx5_core_dev *dev)
free_cmd_page(dev, cmd);
 
 err_free_pool:
-   pci_pool_destroy(cmd->pool);
+   dma_pool_destroy(cmd->pool);
 
return err;
 }
@@ -1808,6 +1809,6 @@ void mlx5_cmd_cleanup(struct mlx5_core_dev *dev)
destroy_workqueue(cmd->wq);
destroy_msg_cache(dev);
free_cmd_page(dev, cmd);
-   pci_pool_destroy(cmd->pool);
+   dma_pool_destroy(cmd->pool);
 }
 EXPORT_SYMBOL(mlx5_cmd_cleanup);
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 6cd000f..7ea19ae 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -284,7 +284,7 @@ struct mlx5_cmd {
struct semaphore pages_sem;
int mode;
struct mlx5_cmd_work_ent *ent_arr[MLX5_MAX_COMMANDS];
-   struct pci_pool *pool;
+   struct dma_pool *pool;
struct mlx5_cmd_debug dbg;
struct cmd_msg_cache cache[MLX5_NUM_COMMAND_CACHES];
int checksum_disabled;
-- 
2.9.3



Re: [PATCH 2/4] mm: introduce memalloc_noreclaim_{save,restore}

2017-04-07 Thread Hillf Danton

On April 05, 2017 3:47 PM Vlastimil Babka wrote: 
> 
> The previous patch has shown that simply setting and clearing PF_MEMALLOC in
> current->flags can result in wrongly clearing a pre-existing PF_MEMALLOC flag
> and potentially lead to recursive reclaim. Let's introduce helpers that support
> proper nesting by saving the previous state of the flag, similar to the existing
> memalloc_noio_* and memalloc_nofs_* helpers. Convert existing setting/clearing
> of PF_MEMALLOC within mm to the new helpers.
> 
> There are no known issues with the converted code, but the change makes it more
> robust.
> 
> Suggested-by: Michal Hocko 
> Signed-off-by: Vlastimil Babka 
> ---

Acked-by: Hillf Danton 



[PATCHv4 6/6] sg: close race condition in sg_remove_sfp_usercontext()

2017-04-07 Thread Hannes Reinecke
sg_remove_sfp_usercontext() is clearing any sg requests,
but needs to take 'rq_list_lock' when modifying the list.

Reported-by: Christoph Hellwig 
Signed-off-by: Hannes Reinecke 
Reviewed-by: Johannes Thumshirn 
Tested-by: Johannes Thumshirn 
---
 drivers/scsi/sg.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 5c9f1b9..8147147 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -524,6 +524,7 @@ static int sg_allow_access(struct file *filp, unsigned char *cmd)
} else
count = (old_hdr->result == 0) ? 0 : -EIO;
sg_finish_rem_req(srp);
+   sg_remove_request(sfp, srp);
retval = count;
 free_old_hdr:
kfree(old_hdr);
@@ -564,6 +565,7 @@ static int sg_allow_access(struct file *filp, unsigned char *cmd)
}
 err_out:
err2 = sg_finish_rem_req(srp);
+   sg_remove_request(sfp, srp);
return err ? : err2 ? : count;
 }
 
@@ -800,6 +802,7 @@ static bool sg_is_valid_dxfer(sg_io_hdr_t *hp)
SCSI_LOG_TIMEOUT(1, sg_printk(KERN_INFO, sfp->parentdp,
"sg_common_write: start_req err=%d\n", k));
sg_finish_rem_req(srp);
+   sg_remove_request(sfp, srp);
return k;   /* probably out of space --> ENOMEM */
}
if (atomic_read(&sdp->detaching)) {
@@ -810,6 +813,7 @@ static bool sg_is_valid_dxfer(sg_io_hdr_t *hp)
}
 
sg_finish_rem_req(srp);
+   sg_remove_request(sfp, srp);
return -ENODEV;
}
 
@@ -1285,6 +1289,7 @@ static long sg_compat_ioctl(struct file *filp, unsigned int cmd_in, unsigned long
struct sg_fd *sfp = srp->parentfp;
 
sg_finish_rem_req(srp);
+   sg_remove_request(sfp, srp);
kref_put(&sfp->f_ref, sg_remove_sfp);
 }
 
@@ -1831,8 +1836,6 @@ static long sg_compat_ioctl(struct file *filp, unsigned int cmd_in, unsigned long
else
sg_remove_scat(sfp, req_schp);
 
-   sg_remove_request(sfp, srp);
-
return ret;
 }
 
@@ -2177,12 +2180,17 @@ static long sg_compat_ioctl(struct file *filp, unsigned int cmd_in, unsigned long
struct sg_fd *sfp = container_of(work, struct sg_fd, ew.work);
struct sg_device *sdp = sfp->parentdp;
Sg_request *srp;
+   unsigned long iflags;
 
/* Cleanup any responses which were never read(). */
+   write_lock_irqsave(&sfp->rq_list_lock, iflags);
while (!list_empty(&sfp->rq_list)) {
srp = list_first_entry(&sfp->rq_list, Sg_request, entry);
sg_finish_rem_req(srp);
+   list_del(&srp->entry);
+   srp->parentfp = NULL;
}
+   write_unlock_irqrestore(&sfp->rq_list_lock, iflags);
 
if (sfp->reserve.bufflen > 0) {
SCSI_LOG_TIMEOUT(6, sg_printk(KERN_INFO, sdp,
-- 
1.8.5.6



[PATCHv4 3/6] sg: protect accesses to 'reserved' page array

2017-04-07 Thread Hannes Reinecke
The 'reserved' page array is used as a short-cut for mapping
data, saving us from having to allocate pages per request.
However, the 'reserved' array is only capable of holding one
request, so this patch introduces a mutex to protect
'sg_fd' against concurrent accesses.

Signed-off-by: Hannes Reinecke 
Reviewed-by: Johannes Thumshirn 
Tested-by: Johannes Thumshirn 
---
 drivers/scsi/sg.c | 45 +
 1 file changed, 25 insertions(+), 20 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 92cc658..ddc1808 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -142,6 +142,7 @@
struct sg_device *parentdp; /* owning device */
wait_queue_head_t read_wait;/* queue read until command done */
rwlock_t rq_list_lock;  /* protect access to list in req_arr */
+   struct mutex f_mutex;   /* protect against changes in this fd */
int timeout;/* defaults to SG_DEFAULT_TIMEOUT  */
int timeout_user;   /* defaults to SG_DEFAULT_TIMEOUT_USER */
Sg_scatter_hold reserve;/* buffer held for this file descriptor */
@@ -153,6 +154,7 @@
unsigned char next_cmd_len; /* 0: automatic, >0: use on next write() */
char keep_orphan;   /* 0 -> drop orphan (def), 1 -> keep for read() */
char mmap_called;   /* 0 -> mmap() never called on this fd */
+   char res_in_use;/* 1 -> 'reserve' array in use */
struct kref f_ref;
struct execute_work ew;
 } Sg_fd;
@@ -196,7 +198,6 @@ static int sg_common_write(Sg_fd * sfp, Sg_request * srp,
 static Sg_request *sg_get_rq_mark(Sg_fd * sfp, int pack_id);
 static Sg_request *sg_add_request(Sg_fd * sfp);
 static int sg_remove_request(Sg_fd * sfp, Sg_request * srp);
-static int sg_res_in_use(Sg_fd * sfp);
 static Sg_device *sg_get_dev(int dev);
 static void sg_device_destroy(struct kref *kref);
 
@@ -612,6 +613,7 @@ static int sg_allow_access(struct file *filp, unsigned char *cmd)
}
buf += SZ_SG_HEADER;
__get_user(opcode, buf);
+   mutex_lock(&sfp->f_mutex);
if (sfp->next_cmd_len > 0) {
cmd_size = sfp->next_cmd_len;
sfp->next_cmd_len = 0;  /* reset so only this write() effected */
@@ -620,6 +622,7 @@ static int sg_allow_access(struct file *filp, unsigned char *cmd)
if ((opcode >= 0xc0) && old_hdr.twelve_byte)
cmd_size = 12;
}
+   mutex_unlock(&sfp->f_mutex);
SCSI_LOG_TIMEOUT(4, sg_printk(KERN_INFO, sdp,
"sg_write:   scsi opcode=0x%02x, cmd_size=%d\n", (int) opcode, cmd_size));
 /* Determine buffer size.  */
@@ -719,7 +722,7 @@ static int sg_allow_access(struct file *filp, unsigned char *cmd)
sg_remove_request(sfp, srp);
return -EINVAL; /* either MMAP_IO or DIRECT_IO (not both) */
}
-   if (sg_res_in_use(sfp)) {
+   if (sfp->res_in_use) {
sg_remove_request(sfp, srp);
return -EBUSY;  /* reserve buffer already being used */
}
@@ -953,12 +956,18 @@ static int max_sectors_bytes(struct request_queue *q)
 return -EINVAL;
val = min_t(int, val,
max_sectors_bytes(sdp->device->request_queue));
+   mutex_lock(&sfp->f_mutex);
if (val != sfp->reserve.bufflen) {
-   if (sg_res_in_use(sfp) || sfp->mmap_called)
+   if (sfp->mmap_called ||
+   sfp->res_in_use) {
+   mutex_unlock(&sfp->f_mutex);
return -EBUSY;
+   }
+
sg_remove_scat(sfp, &sfp->reserve);
sg_build_reserve(sfp, val);
}
+   mutex_unlock(&sfp->f_mutex);
return 0;
case SG_GET_RESERVED_SIZE:
val = min_t(int, sfp->reserve.bufflen,
@@ -1718,13 +1727,22 @@ static long sg_compat_ioctl(struct file *filp, unsigned int cmd_in, unsigned long
md = &map_data;
 
if (md) {
-   if (!sg_res_in_use(sfp) && dxfer_len <= rsv_schp->bufflen)
+   mutex_lock(&sfp->f_mutex);
+   if (dxfer_len <= rsv_schp->bufflen &&
+   !sfp->res_in_use) {
+   sfp->res_in_use = 1;
sg_link_reserve(sfp, srp, dxfer_len);
-   else {
+   } else if ((hp->flags & SG_FLAG_MMAP_IO) && sfp->res_in_use) {
+   mutex_unlock(&sfp->f_mutex);
+   return -EBUSY;
+   } else {
res = sg_build_indirect(req_schp, sfp, dxfer_len);
-   if (res)
+   if (res) {
+   mutex_unlock(&sfp->f_mutex);
 

[PATCHv4 4/6] sg: check for valid direction before starting the request

2017-04-07 Thread Hannes Reinecke
From: Johannes Thumshirn 

Check for a valid direction before starting the request; otherwise we risk
running into an assertion in the SCSI midlayer that checks for valid requests.

Signed-off-by: Johannes Thumshirn 
Link: http://www.spinics.net/lists/linux-scsi/msg104400.html
Reported-by: Dmitry Vyukov 
Signed-off-by: Hannes Reinecke 
Tested-by: Johannes Thumshirn 
---
 drivers/scsi/sg.c | 46 ++
 1 file changed, 34 insertions(+), 12 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index ddc1808..5f12273 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -663,18 +663,14 @@ static int sg_allow_access(struct file *filp, unsigned char *cmd)
 * is a non-zero input_size, so emit a warning.
 */
if (hp->dxfer_direction == SG_DXFER_TO_FROM_DEV) {
-   static char cmd[TASK_COMM_LEN];
-   if (strcmp(current->comm, cmd)) {
-   printk_ratelimited(KERN_WARNING
-  "sg_write: data in/out %d/%d bytes "
-  "for SCSI command 0x%x-- guessing "
-  "data in;\n   program %s not setting "
-  "count and/or reply_len properly\n",
-  old_hdr.reply_len - (int)SZ_SG_HEADER,
-  input_size, (unsigned int) cmnd[0],
-  current->comm);
-   strcpy(cmd, current->comm);
-   }
+   printk_ratelimited(KERN_WARNING
+  "sg_write: data in/out %d/%d bytes "
+  "for SCSI command 0x%x-- guessing "
+  "data in;\n   program %s not setting "
+  "count and/or reply_len properly\n",
+  old_hdr.reply_len - (int)SZ_SG_HEADER,
+  input_size, (unsigned int) cmnd[0],
+  current->comm);
}
k = sg_common_write(sfp, srp, cmnd, sfp->timeout, blocking);
return (k < 0) ? k : count;
@@ -753,6 +749,29 @@ static int sg_allow_access(struct file *filp, unsigned char *cmd)
return count;
 }
 
+static bool sg_is_valid_dxfer(sg_io_hdr_t *hp)
+{
+   switch (hp->dxfer_direction) {
+   case SG_DXFER_NONE:
+   if (hp->dxferp || hp->dxfer_len > 0)
+   return false;
+   return true;
+   case SG_DXFER_TO_DEV:
+   case SG_DXFER_FROM_DEV:
+   case SG_DXFER_TO_FROM_DEV:
+   if (!hp->dxferp || hp->dxfer_len == 0)
+   return false;
+   return true;
+   case SG_DXFER_UNKNOWN:
+   if ((!hp->dxferp && hp->dxfer_len) ||
+   (hp->dxferp && hp->dxfer_len == 0))
+   return false;
+   return true;
+   default:
+   return false;
+   }
+}
+
 static int
 sg_common_write(Sg_fd * sfp, Sg_request * srp,
unsigned char *cmnd, int timeout, int blocking)
@@ -773,6 +792,9 @@ static int sg_allow_access(struct file *filp, unsigned char *cmd)
"sg_common_write:  scsi opcode=0x%02x, cmd_size=%d\n",
(int) cmnd[0], (int) hp->cmd_len));
 
+   if (!sg_is_valid_dxfer(hp))
+   return -EINVAL;
+
k = sg_start_req(srp, cmnd);
if (k) {
SCSI_LOG_TIMEOUT(1, sg_printk(KERN_INFO, sfp->parentdp,
-- 
1.8.5.6



[PATCHv4 5/6] sg: use standard lists for sg_requests

2017-04-07 Thread Hannes Reinecke
'Sg_request' uses a private list implementation; convert it
to standard lists.

Signed-off-by: Hannes Reinecke 
Reviewed-by: Johannes Thumshirn 
Tested-by: Johannes Thumshirn 
---
 drivers/scsi/sg.c | 147 ++
 1 file changed, 61 insertions(+), 86 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 5f12273..5c9f1b9 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -122,7 +122,7 @@
 struct sg_fd;
 
 typedef struct sg_request {/* SG_MAX_QUEUE requests outstanding per file */
-   struct sg_request *nextrp;  /* NULL -> tail request (slist) */
+   struct list_head entry; /* list entry */
struct sg_fd *parentfp; /* NULL -> not in use */
Sg_scatter_hold data;   /* hold buffer, perhaps scatter list */
sg_io_hdr_t header; /* scsi command+info, see  */
@@ -146,7 +146,7 @@
int timeout;/* defaults to SG_DEFAULT_TIMEOUT  */
int timeout_user;   /* defaults to SG_DEFAULT_TIMEOUT_USER */
Sg_scatter_hold reserve;/* buffer held for this file descriptor */
-   Sg_request *headrp; /* head of request slist, NULL->empty */
+   struct list_head rq_list; /* head of request list */
struct fasync_struct *async_qp; /* used by asynchronous notification */
Sg_request req_arr[SG_MAX_QUEUE];   /* used as singly-linked list */
char force_packid;  /* 1 -> pack_id input to read(), 0 -> ignored */
@@ -949,7 +949,7 @@ static int max_sectors_bytes(struct request_queue *q)
if (!access_ok(VERIFY_WRITE, ip, sizeof (int)))
return -EFAULT;
read_lock_irqsave(&sfp->rq_list_lock, iflags);
-   for (srp = sfp->headrp; srp; srp = srp->nextrp) {
list_for_each_entry(srp, &sfp->rq_list, entry) {
if ((1 == srp->done) && (!srp->sg_io_owned)) {
read_unlock_irqrestore(&sfp->rq_list_lock,
   iflags);
@@ -962,7 +962,8 @@ static int max_sectors_bytes(struct request_queue *q)
return 0;
case SG_GET_NUM_WAITING:
read_lock_irqsave(&sfp->rq_list_lock, iflags);
-   for (val = 0, srp = sfp->headrp; srp; srp = srp->nextrp) {
+   val = 0;
list_for_each_entry(srp, &sfp->rq_list, entry) {
if ((1 == srp->done) && (!srp->sg_io_owned))
++val;
}
@@ -1035,35 +1036,33 @@ static int max_sectors_bytes(struct request_queue *q)
if (!rinfo)
return -ENOMEM;
read_lock_irqsave(&sfp->rq_list_lock, iflags);
-   for (srp = sfp->headrp, val = 0; val < SG_MAX_QUEUE;
-        ++val, srp = srp ? srp->nextrp : srp) {
+   val = 0;
list_for_each_entry(srp, &sfp->rq_list, entry) {
+   if (val > SG_MAX_QUEUE)
+   break;
memset([val], 0, SZ_SG_REQ_INFO);
-   if (srp) {
-   rinfo[val].req_state = srp->done + 1;
-   rinfo[val].problem =
-   srp->header.masked_status & 
-   srp->header.host_status & 
-   srp->header.driver_status;
-   if (srp->done)
-   rinfo[val].duration =
-   srp->header.duration;
-   else {
-   ms = jiffies_to_msecs(jiffies);
-   rinfo[val].duration =
-   (ms > srp->header.duration) ?
-   (ms - srp->header.duration) : 0;
-   }
-   rinfo[val].orphan = srp->orphan;
-   rinfo[val].sg_io_owned =
-   srp->sg_io_owned;
-   rinfo[val].pack_id =
-   srp->header.pack_id;
-   rinfo[val].usr_ptr =
-   srp->header.usr_ptr;
+   rinfo[val].req_state = srp->done + 1;
+   rinfo[val].problem =
+   srp->header.masked_status &
+   

[PATCHv4 2/6] sg: remove 'save_scat_len'

2017-04-07 Thread Hannes Reinecke
Unused.

Signed-off-by: Hannes Reinecke 
Reviewed-by: Johannes Thumshirn 
Tested-by: Johannes Thumshirn 
Reviewed-by: Christoph Hellwig 
---
 drivers/scsi/sg.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 11ca00d..92cc658 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -145,7 +145,6 @@
int timeout;/* defaults to SG_DEFAULT_TIMEOUT  */
int timeout_user;   /* defaults to SG_DEFAULT_TIMEOUT_USER */
Sg_scatter_hold reserve;/* buffer held for this file descriptor */
-   unsigned save_scat_len; /* original length of trunc. scat. element */
Sg_request *headrp; /* head of request slist, NULL->empty */
struct fasync_struct *async_qp; /* used by asynchronous notification */
Sg_request req_arr[SG_MAX_QUEUE];   /* used as singly-linked list */
@@ -2014,7 +2013,6 @@ static long sg_compat_ioctl(struct file *filp, unsigned int cmd_in, unsigned long
req_schp->pages = NULL;
req_schp->page_order = 0;
req_schp->sglist_len = 0;
-   sfp->save_scat_len = 0;
srp->res_used = 0;
 }
 
-- 
1.8.5.6



Re: [PATCH 1/4] mm: prevent potential recursive reclaim due to clearing PF_MEMALLOC

2017-04-07 Thread Hillf Danton
On April 05, 2017 3:47 PM Vlastimil Babka wrote: 
> 
> The function __alloc_pages_direct_compact() sets PF_MEMALLOC to prevent
> deadlock during page migration by lock_page() (see the comment in
> __unmap_and_move()). Then it unconditionally clears the flag, which can clear a
> pre-existing PF_MEMALLOC flag and result in recursive reclaim. This was not a
> problem until commit a8161d1ed609 ("mm, page_alloc: restructure direct
> compaction handling in slowpath"), because direct compaction was called only
> after direct reclaim, which was skipped when PF_MEMALLOC flag was set.
> 
> Even now it's only a theoretical issue, as the new callsite of
> __alloc_pages_direct_compact() is reached only for costly orders and when
> gfp_pfmemalloc_allowed() is true, which means either __GFP_NOMEMALLOC is in
> gfp_flags or in_interrupt() is true. There is no such known context, but let's
> play it safe and make __alloc_pages_direct_compact() robust for cases where
> PF_MEMALLOC is already set.
> 
> Fixes: a8161d1ed609 ("mm, page_alloc: restructure direct compaction handling in slowpath")
> Reported-by: Andrey Ryabinin 
> Signed-off-by: Vlastimil Babka 
> Cc: 
> ---
Acked-by: Hillf Danton 

>  mm/page_alloc.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 3589f8be53be..b84e6ffbe756 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3288,6 +3288,7 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
>   enum compact_priority prio, enum compact_result *compact_result)
>  {
>   struct page *page;
> + unsigned int noreclaim_flag = current->flags & PF_MEMALLOC;
> 
>   if (!order)
>   return NULL;
> @@ -3295,7 +3296,7 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
>   current->flags |= PF_MEMALLOC;
>   *compact_result = try_to_compact_pages(gfp_mask, order, alloc_flags, ac,
>   prio);
> - current->flags &= ~PF_MEMALLOC;
> + current->flags = (current->flags & ~PF_MEMALLOC) | noreclaim_flag;
> 
>   if (*compact_result <= COMPACT_INACTIVE)
>   return NULL;
> --
> 2.12.2



[PATCH v7 11/15] scsi: megaraid: Replace PCI pool old API

2017-04-07 Thread Romain Perier
The PCI pool API is deprecated. This commit replaces the old PCI pool
API with the appropriate DMA pool API functions.

Signed-off-by: Romain Perier 
Reviewed-by: Peter Senna Tschudin 
Acked-by: Sumit Saxena 
---
 drivers/scsi/megaraid/megaraid_mbox.c   | 33 +++
 drivers/scsi/megaraid/megaraid_mm.c | 32 +++---
 drivers/scsi/megaraid/megaraid_sas_base.c   | 29 +++--
 drivers/scsi/megaraid/megaraid_sas_fusion.c | 66 +
 4 files changed, 77 insertions(+), 83 deletions(-)

diff --git a/drivers/scsi/megaraid/megaraid_mbox.c b/drivers/scsi/megaraid/megaraid_mbox.c
index f0987f2..7dfc2e2 100644
--- a/drivers/scsi/megaraid/megaraid_mbox.c
+++ b/drivers/scsi/megaraid/megaraid_mbox.c
@@ -1153,8 +1153,8 @@ megaraid_mbox_setup_dma_pools(adapter_t *adapter)
 
 
// Allocate memory for 16-bytes aligned mailboxes
-   raid_dev->mbox_pool_handle = pci_pool_create("megaraid mbox pool",
-   adapter->pdev,
+   raid_dev->mbox_pool_handle = dma_pool_create("megaraid mbox pool",
+   &adapter->pdev->dev,
sizeof(mbox64_t) + 16,
16, 0);
 
@@ -1164,7 +1164,7 @@ megaraid_mbox_setup_dma_pools(adapter_t *adapter)
 
mbox_pci_blk = raid_dev->mbox_pool;
for (i = 0; i < MBOX_MAX_SCSI_CMDS; i++) {
-   mbox_pci_blk[i].vaddr = pci_pool_alloc(
+   mbox_pci_blk[i].vaddr = dma_pool_alloc(
raid_dev->mbox_pool_handle,
GFP_KERNEL,
&mbox_pci_blk[i].dma_addr);
@@ -1181,8 +1181,8 @@ megaraid_mbox_setup_dma_pools(adapter_t *adapter)
 * share common memory pool. Passthru structures piggyback on memory
 * allocted to extended passthru since passthru is smaller of the two
 */
-   raid_dev->epthru_pool_handle = pci_pool_create("megaraid mbox pthru",
-   adapter->pdev, sizeof(mraid_epassthru_t), 128, 0);
+   raid_dev->epthru_pool_handle = dma_pool_create("megaraid mbox pthru",
+   &adapter->pdev->dev, sizeof(mraid_epassthru_t), 128, 0);
 
if (raid_dev->epthru_pool_handle == NULL) {
goto fail_setup_dma_pool;
@@ -1190,7 +1190,7 @@ megaraid_mbox_setup_dma_pools(adapter_t *adapter)
 
epthru_pci_blk = raid_dev->epthru_pool;
for (i = 0; i < MBOX_MAX_SCSI_CMDS; i++) {
-   epthru_pci_blk[i].vaddr = pci_pool_alloc(
+   epthru_pci_blk[i].vaddr = dma_pool_alloc(
raid_dev->epthru_pool_handle,
GFP_KERNEL,
&epthru_pci_blk[i].dma_addr);
@@ -1202,8 +1202,8 @@ megaraid_mbox_setup_dma_pools(adapter_t *adapter)
 
// Allocate memory for each scatter-gather list. Request for 512 bytes
// alignment for each sg list
-   raid_dev->sg_pool_handle = pci_pool_create("megaraid mbox sg",
-   adapter->pdev,
+   raid_dev->sg_pool_handle = dma_pool_create("megaraid mbox sg",
+   &adapter->pdev->dev,
sizeof(mbox_sgl64) * MBOX_MAX_SG_SIZE,
512, 0);
 
@@ -1213,7 +1213,7 @@ megaraid_mbox_setup_dma_pools(adapter_t *adapter)
 
sg_pci_blk = raid_dev->sg_pool;
for (i = 0; i < MBOX_MAX_SCSI_CMDS; i++) {
-   sg_pci_blk[i].vaddr = pci_pool_alloc(
+   sg_pci_blk[i].vaddr = dma_pool_alloc(
raid_dev->sg_pool_handle,
GFP_KERNEL,
&sg_pci_blk[i].dma_addr);
@@ -1249,29 +1249,26 @@ megaraid_mbox_teardown_dma_pools(adapter_t *adapter)
 
sg_pci_blk = raid_dev->sg_pool;
for (i = 0; i < MBOX_MAX_SCSI_CMDS && sg_pci_blk[i].vaddr; i++) {
-   pci_pool_free(raid_dev->sg_pool_handle, sg_pci_blk[i].vaddr,
+   dma_pool_free(raid_dev->sg_pool_handle, sg_pci_blk[i].vaddr,
sg_pci_blk[i].dma_addr);
}
-   if (raid_dev->sg_pool_handle)
-   pci_pool_destroy(raid_dev->sg_pool_handle);
+   dma_pool_destroy(raid_dev->sg_pool_handle);
 
 
epthru_pci_blk = raid_dev->epthru_pool;
for (i = 0; i < MBOX_MAX_SCSI_CMDS && epthru_pci_blk[i].vaddr; i++) {
-   pci_pool_free(raid_dev->epthru_pool_handle,
+   dma_pool_free(raid_dev->epthru_pool_handle,
epthru_pci_blk[i].vaddr, epthru_pci_blk[i].dma_addr);
}
-   if (raid_dev->epthru_pool_handle)
-  

[PATCH v7 14/15] scsi: pmcraid: Replace PCI pool old API

2017-04-07 Thread Romain Perier
The PCI pool API is deprecated. This commit replaces the old PCI pool
API with the appropriate DMA pool API functions.

Signed-off-by: Romain Perier 
Acked-by: Peter Senna Tschudin 
Tested-by: Peter Senna Tschudin 
---
 drivers/scsi/pmcraid.c | 10 +-
 drivers/scsi/pmcraid.h |  2 +-
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/scsi/pmcraid.c b/drivers/scsi/pmcraid.c
index 49e70a3..0f893c4 100644
--- a/drivers/scsi/pmcraid.c
+++ b/drivers/scsi/pmcraid.c
@@ -4699,13 +4699,13 @@ pmcraid_release_control_blocks(
return;
 
for (i = 0; i < max_index; i++) {
-   pci_pool_free(pinstance->control_pool,
+   dma_pool_free(pinstance->control_pool,
  pinstance->cmd_list[i]->ioa_cb,
  pinstance->cmd_list[i]->ioa_cb_bus_addr);
pinstance->cmd_list[i]->ioa_cb = NULL;
pinstance->cmd_list[i]->ioa_cb_bus_addr = 0;
}
-   pci_pool_destroy(pinstance->control_pool);
+   dma_pool_destroy(pinstance->control_pool);
pinstance->control_pool = NULL;
 }
 
@@ -4762,8 +4762,8 @@ static int pmcraid_allocate_control_blocks(struct 
pmcraid_instance *pinstance)
pinstance->host->unique_id);
 
pinstance->control_pool =
-   pci_pool_create(pinstance->ctl_pool_name,
-   pinstance->pdev,
+   dma_pool_create(pinstance->ctl_pool_name,
+   &pinstance->pdev->dev,
sizeof(struct pmcraid_control_block),
PMCRAID_IOARCB_ALIGNMENT, 0);
 
@@ -4772,7 +4772,7 @@ static int pmcraid_allocate_control_blocks(struct 
pmcraid_instance *pinstance)
 
for (i = 0; i < PMCRAID_MAX_CMD; i++) {
pinstance->cmd_list[i]->ioa_cb =
-   pci_pool_alloc(
+   dma_pool_alloc(
pinstance->control_pool,
GFP_KERNEL,
&(pinstance->cmd_list[i]->ioa_cb_bus_addr));
diff --git a/drivers/scsi/pmcraid.h b/drivers/scsi/pmcraid.h
index 568b18a..acf5a7b 100644
--- a/drivers/scsi/pmcraid.h
+++ b/drivers/scsi/pmcraid.h
@@ -755,7 +755,7 @@ struct pmcraid_instance {
 
/* structures related to command blocks */
struct kmem_cache *cmd_cachep;  /* cache for cmd blocks */
-   struct pci_pool *control_pool;  /* pool for control blocks */
+   struct dma_pool *control_pool;  /* pool for control blocks */
char   cmd_pool_name[64];   /* name of cmd cache */
char   ctl_pool_name[64];   /* name of control cache */
 
-- 
2.9.3



[PATCH v7 12/15] scsi: mpt3sas: Replace PCI pool old API

2017-04-07 Thread Romain Perier
The PCI pool API is deprecated. This commit replaces the old PCI pool
API with the appropriate DMA pool API functions.

Signed-off-by: Romain Perier 
Reviewed-by: Peter Senna Tschudin 
---
 drivers/scsi/mpt3sas/mpt3sas_base.c | 73 +
 1 file changed, 34 insertions(+), 39 deletions(-)

diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c 
b/drivers/scsi/mpt3sas/mpt3sas_base.c
index 5b7aec5..5ae1c23 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
@@ -3200,9 +3200,8 @@ _base_release_memory_pools(struct MPT3SAS_ADAPTER *ioc)
}
 
if (ioc->sense) {
-   pci_pool_free(ioc->sense_dma_pool, ioc->sense, ioc->sense_dma);
-   if (ioc->sense_dma_pool)
-   pci_pool_destroy(ioc->sense_dma_pool);
+   dma_pool_free(ioc->sense_dma_pool, ioc->sense, ioc->sense_dma);
+   dma_pool_destroy(ioc->sense_dma_pool);
dexitprintk(ioc, pr_info(MPT3SAS_FMT
"sense_pool(0x%p): free\n",
ioc->name, ioc->sense));
@@ -3210,9 +3209,8 @@ _base_release_memory_pools(struct MPT3SAS_ADAPTER *ioc)
}
 
if (ioc->reply) {
-   pci_pool_free(ioc->reply_dma_pool, ioc->reply, ioc->reply_dma);
-   if (ioc->reply_dma_pool)
-   pci_pool_destroy(ioc->reply_dma_pool);
+   dma_pool_free(ioc->reply_dma_pool, ioc->reply, ioc->reply_dma);
+   dma_pool_destroy(ioc->reply_dma_pool);
dexitprintk(ioc, pr_info(MPT3SAS_FMT
"reply_pool(0x%p): free\n",
ioc->name, ioc->reply));
@@ -3220,10 +3218,9 @@ _base_release_memory_pools(struct MPT3SAS_ADAPTER *ioc)
}
 
if (ioc->reply_free) {
-   pci_pool_free(ioc->reply_free_dma_pool, ioc->reply_free,
+   dma_pool_free(ioc->reply_free_dma_pool, ioc->reply_free,
ioc->reply_free_dma);
-   if (ioc->reply_free_dma_pool)
-   pci_pool_destroy(ioc->reply_free_dma_pool);
+   dma_pool_destroy(ioc->reply_free_dma_pool);
dexitprintk(ioc, pr_info(MPT3SAS_FMT
"reply_free_pool(0x%p): free\n",
ioc->name, ioc->reply_free));
@@ -3234,7 +3231,7 @@ _base_release_memory_pools(struct MPT3SAS_ADAPTER *ioc)
do {
rps = &ioc->reply_post[i];
if (rps->reply_post_free) {
-   pci_pool_free(
+   dma_pool_free(
ioc->reply_post_free_dma_pool,
rps->reply_post_free,
rps->reply_post_free_dma);
@@ -3246,8 +3243,7 @@ _base_release_memory_pools(struct MPT3SAS_ADAPTER *ioc)
} while (ioc->rdpq_array_enable &&
   (++i < ioc->reply_queue_count));
 
-   if (ioc->reply_post_free_dma_pool)
-   pci_pool_destroy(ioc->reply_post_free_dma_pool);
+   dma_pool_destroy(ioc->reply_post_free_dma_pool);
kfree(ioc->reply_post);
}
 
@@ -3268,12 +3264,11 @@ _base_release_memory_pools(struct MPT3SAS_ADAPTER *ioc)
if (ioc->chain_lookup) {
for (i = 0; i < ioc->chain_depth; i++) {
if (ioc->chain_lookup[i].chain_buffer)
-   pci_pool_free(ioc->chain_dma_pool,
+   dma_pool_free(ioc->chain_dma_pool,
ioc->chain_lookup[i].chain_buffer,
ioc->chain_lookup[i].chain_buffer_dma);
}
-   if (ioc->chain_dma_pool)
-   pci_pool_destroy(ioc->chain_dma_pool);
+   dma_pool_destroy(ioc->chain_dma_pool);
free_pages((ulong)ioc->chain_lookup, ioc->chain_pages);
ioc->chain_lookup = NULL;
}
@@ -3448,23 +3443,23 @@ _base_allocate_memory_pools(struct MPT3SAS_ADAPTER *ioc)
ioc->name);
goto out;
}
-   ioc->reply_post_free_dma_pool = pci_pool_create("reply_post_free pool",
-   ioc->pdev, sz, 16, 0);
+   ioc->reply_post_free_dma_pool = dma_pool_create("reply_post_free pool",
+   &ioc->pdev->dev, sz, 16, 0);
if (!ioc->reply_post_free_dma_pool) {
pr_err(MPT3SAS_FMT
-"reply_post_free pool: pci_pool_create failed\n",
+"reply_post_free pool: dma_pool_create failed\n",
 ioc->name);
goto out;
}
i = 0;
do {
ioc->reply_post[i].reply_post_free =
-   pci_pool_alloc(ioc->reply_post_free_dma_pool,
+   

[PATCH v7 03/15] IB/mthca: Replace PCI pool old API

2017-04-07 Thread Romain Perier
The PCI pool API is deprecated. This commit replaces the old PCI pool
API with the appropriate DMA pool API functions.

Signed-off-by: Romain Perier 
Acked-by: Peter Senna Tschudin 
Tested-by: Peter Senna Tschudin 
Acked-by: Doug Ledford 
Tested-by: Doug Ledford 
---
 drivers/infiniband/hw/mthca/mthca_av.c  | 10 +-
 drivers/infiniband/hw/mthca/mthca_cmd.c |  8 
 drivers/infiniband/hw/mthca/mthca_dev.h |  4 ++--
 3 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/drivers/infiniband/hw/mthca/mthca_av.c 
b/drivers/infiniband/hw/mthca/mthca_av.c
index c9f0f36..9d041b6 100644
--- a/drivers/infiniband/hw/mthca/mthca_av.c
+++ b/drivers/infiniband/hw/mthca/mthca_av.c
@@ -186,7 +186,7 @@ int mthca_create_ah(struct mthca_dev *dev,
 
 on_hca_fail:
if (ah->type == MTHCA_AH_PCI_POOL) {
-   ah->av = pci_pool_zalloc(dev->av_table.pool,
+   ah->av = dma_pool_zalloc(dev->av_table.pool,
GFP_ATOMIC, &ah->avdma);
if (!ah->av)
return -ENOMEM;
@@ -245,7 +245,7 @@ int mthca_destroy_ah(struct mthca_dev *dev, struct mthca_ah 
*ah)
break;
 
case MTHCA_AH_PCI_POOL:
-   pci_pool_free(dev->av_table.pool, ah->av, ah->avdma);
+   dma_pool_free(dev->av_table.pool, ah->av, ah->avdma);
break;
 
case MTHCA_AH_KMALLOC:
@@ -333,7 +333,7 @@ int mthca_init_av_table(struct mthca_dev *dev)
if (err)
return err;
 
-   dev->av_table.pool = pci_pool_create("mthca_av", dev->pdev,
+   dev->av_table.pool = dma_pool_create("mthca_av", &dev->pdev->dev,
 MTHCA_AV_SIZE,
 MTHCA_AV_SIZE, 0);
if (!dev->av_table.pool)
@@ -353,7 +353,7 @@ int mthca_init_av_table(struct mthca_dev *dev)
return 0;
 
  out_free_pool:
-   pci_pool_destroy(dev->av_table.pool);
+   dma_pool_destroy(dev->av_table.pool);
 
  out_free_alloc:
mthca_alloc_cleanup(&dev->av_table.alloc);
@@ -367,6 +367,6 @@ void mthca_cleanup_av_table(struct mthca_dev *dev)
 
if (dev->av_table.av_map)
iounmap(dev->av_table.av_map);
-   pci_pool_destroy(dev->av_table.pool);
+   dma_pool_destroy(dev->av_table.pool);
mthca_alloc_cleanup(&dev->av_table.alloc);
 }
diff --git a/drivers/infiniband/hw/mthca/mthca_cmd.c 
b/drivers/infiniband/hw/mthca/mthca_cmd.c
index c7f49bb..7f219c8 100644
--- a/drivers/infiniband/hw/mthca/mthca_cmd.c
+++ b/drivers/infiniband/hw/mthca/mthca_cmd.c
@@ -530,7 +530,7 @@ int mthca_cmd_init(struct mthca_dev *dev)
return -ENOMEM;
}
 
-   dev->cmd.pool = pci_pool_create("mthca_cmd", dev->pdev,
+   dev->cmd.pool = dma_pool_create("mthca_cmd", &dev->pdev->dev,
MTHCA_MAILBOX_SIZE,
MTHCA_MAILBOX_SIZE, 0);
if (!dev->cmd.pool) {
@@ -543,7 +543,7 @@ int mthca_cmd_init(struct mthca_dev *dev)
 
 void mthca_cmd_cleanup(struct mthca_dev *dev)
 {
-   pci_pool_destroy(dev->cmd.pool);
+   dma_pool_destroy(dev->cmd.pool);
iounmap(dev->hcr);
if (dev->cmd.flags & MTHCA_CMD_POST_DOORBELLS)
iounmap(dev->cmd.dbell_map);
@@ -613,7 +613,7 @@ struct mthca_mailbox *mthca_alloc_mailbox(struct mthca_dev 
*dev,
if (!mailbox)
return ERR_PTR(-ENOMEM);
 
-   mailbox->buf = pci_pool_alloc(dev->cmd.pool, gfp_mask, &mailbox->dma);
+   mailbox->buf = dma_pool_alloc(dev->cmd.pool, gfp_mask, &mailbox->dma);
if (!mailbox->buf) {
kfree(mailbox);
return ERR_PTR(-ENOMEM);
@@ -627,7 +627,7 @@ void mthca_free_mailbox(struct mthca_dev *dev, struct 
mthca_mailbox *mailbox)
if (!mailbox)
return;
 
-   pci_pool_free(dev->cmd.pool, mailbox->buf, mailbox->dma);
+   dma_pool_free(dev->cmd.pool, mailbox->buf, mailbox->dma);
kfree(mailbox);
 }
 
diff --git a/drivers/infiniband/hw/mthca/mthca_dev.h 
b/drivers/infiniband/hw/mthca/mthca_dev.h
index 4393a02..8c3f6ed 100644
--- a/drivers/infiniband/hw/mthca/mthca_dev.h
+++ b/drivers/infiniband/hw/mthca/mthca_dev.h
@@ -118,7 +118,7 @@ enum {
 };
 
 struct mthca_cmd {
-   struct pci_pool  *pool;
+   struct dma_pool  *pool;
struct mutex  hcr_mutex;
struct semaphore  poll_sem;
struct semaphore  event_sem;
@@ -263,7 +263,7 @@ struct mthca_qp_table {
 };
 
 struct mthca_av_table {
-   struct pci_pool   *pool;
+   struct dma_pool   *pool;
intnum_ddr_avs;
u64ddr_av_base;
void __iomem  *av_map;
-- 
2.9.3



[PATCH v7 01/15] block: DAC960: Replace PCI pool old API

2017-04-07 Thread Romain Perier
The PCI pool API is deprecated. This commit replaces the old PCI pool
API with the appropriate DMA pool API functions.

Signed-off-by: Romain Perier 
Acked-by: Peter Senna Tschudin 
Tested-by: Peter Senna Tschudin 
---
 drivers/block/DAC960.c | 38 ++
 drivers/block/DAC960.h |  4 ++--
 2 files changed, 20 insertions(+), 22 deletions(-)

diff --git a/drivers/block/DAC960.c b/drivers/block/DAC960.c
index 26a51be..a0883ad 100644
--- a/drivers/block/DAC960.c
+++ b/drivers/block/DAC960.c
@@ -268,17 +268,17 @@ static bool 
DAC960_CreateAuxiliaryStructures(DAC960_Controller_T *Controller)
   void *AllocationPointer = NULL;
   void *ScatterGatherCPU = NULL;
   dma_addr_t ScatterGatherDMA;
-  struct pci_pool *ScatterGatherPool;
+  struct dma_pool *ScatterGatherPool;
   void *RequestSenseCPU = NULL;
   dma_addr_t RequestSenseDMA;
-  struct pci_pool *RequestSensePool = NULL;
+  struct dma_pool *RequestSensePool = NULL;
 
   if (Controller->FirmwareType == DAC960_V1_Controller)
 {
   CommandAllocationLength = offsetof(DAC960_Command_T, V1.EndMarker);
   CommandAllocationGroupSize = DAC960_V1_CommandAllocationGroupSize;
-  ScatterGatherPool = pci_pool_create("DAC960_V1_ScatterGather",
-   Controller->PCIDevice,
+  ScatterGatherPool = dma_pool_create("DAC960_V1_ScatterGather",
+   &Controller->PCIDevice->dev,
DAC960_V1_ScatterGatherLimit * sizeof(DAC960_V1_ScatterGatherSegment_T),
sizeof(DAC960_V1_ScatterGatherSegment_T), 0);
   if (ScatterGatherPool == NULL)
@@ -290,18 +290,18 @@ static bool 
DAC960_CreateAuxiliaryStructures(DAC960_Controller_T *Controller)
 {
   CommandAllocationLength = offsetof(DAC960_Command_T, V2.EndMarker);
   CommandAllocationGroupSize = DAC960_V2_CommandAllocationGroupSize;
-  ScatterGatherPool = pci_pool_create("DAC960_V2_ScatterGather",
-   Controller->PCIDevice,
+  ScatterGatherPool = dma_pool_create("DAC960_V2_ScatterGather",
+   &Controller->PCIDevice->dev,
DAC960_V2_ScatterGatherLimit * sizeof(DAC960_V2_ScatterGatherSegment_T),
sizeof(DAC960_V2_ScatterGatherSegment_T), 0);
   if (ScatterGatherPool == NULL)
return DAC960_Failure(Controller,
"AUXILIARY STRUCTURE CREATION (SG)");
-  RequestSensePool = pci_pool_create("DAC960_V2_RequestSense",
-   Controller->PCIDevice, sizeof(DAC960_SCSI_RequestSense_T),
+  RequestSensePool = dma_pool_create("DAC960_V2_RequestSense",
+   &Controller->PCIDevice->dev, sizeof(DAC960_SCSI_RequestSense_T),
sizeof(int), 0);
   if (RequestSensePool == NULL) {
-   pci_pool_destroy(ScatterGatherPool);
+   dma_pool_destroy(ScatterGatherPool);
return DAC960_Failure(Controller,
"AUXILIARY STRUCTURE CREATION (SG)");
   }
@@ -335,16 +335,16 @@ static bool 
DAC960_CreateAuxiliaryStructures(DAC960_Controller_T *Controller)
   Command->Next = Controller->FreeCommands;
   Controller->FreeCommands = Command;
   Controller->Commands[CommandIdentifier-1] = Command;
-  ScatterGatherCPU = pci_pool_alloc(ScatterGatherPool, GFP_ATOMIC,
+  ScatterGatherCPU = dma_pool_alloc(ScatterGatherPool, GFP_ATOMIC,
&ScatterGatherDMA);
   if (ScatterGatherCPU == NULL)
  return DAC960_Failure(Controller, "AUXILIARY STRUCTURE CREATION");
 
   if (RequestSensePool != NULL) {
- RequestSenseCPU = pci_pool_alloc(RequestSensePool, GFP_ATOMIC,
+ RequestSenseCPU = dma_pool_alloc(RequestSensePool, GFP_ATOMIC,
&RequestSenseDMA);
  if (RequestSenseCPU == NULL) {
-pci_pool_free(ScatterGatherPool, ScatterGatherCPU,
+dma_pool_free(ScatterGatherPool, ScatterGatherCPU,
 ScatterGatherDMA);
return DAC960_Failure(Controller,
"AUXILIARY STRUCTURE CREATION");
@@ -379,8 +379,8 @@ static bool 
DAC960_CreateAuxiliaryStructures(DAC960_Controller_T *Controller)
 static void DAC960_DestroyAuxiliaryStructures(DAC960_Controller_T *Controller)
 {
   int i;
-  struct pci_pool *ScatterGatherPool = Controller->ScatterGatherPool;
-  struct pci_pool *RequestSensePool = NULL;
+  struct dma_pool *ScatterGatherPool = Controller->ScatterGatherPool;
+  struct dma_pool *RequestSensePool = NULL;
   void *ScatterGatherCPU;
   dma_addr_t ScatterGatherDMA;
   void *RequestSenseCPU;
@@ -411,9 +411,9 @@ static void 
DAC960_DestroyAuxiliaryStructures(DAC960_Controller_T *Controller)
  RequestSenseDMA = Command->V2.RequestSenseDMA;
   }
   if (ScatterGatherCPU != NULL)
-  pci_pool_free(ScatterGatherPool, ScatterGatherCPU, ScatterGatherDMA);
+  dma_pool_free(ScatterGatherPool, ScatterGatherCPU, ScatterGatherDMA);
  

[PATCH v7 02/15] dmaengine: pch_dma: Replace PCI pool old API

2017-04-07 Thread Romain Perier
The PCI pool API is deprecated. This commit replaces the old PCI pool
API with the appropriate DMA pool API functions.

Signed-off-by: Romain Perier 
Acked-by: Peter Senna Tschudin 
Tested-by: Peter Senna Tschudin 
---
 drivers/dma/pch_dma.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/dma/pch_dma.c b/drivers/dma/pch_dma.c
index f9028e9..afd8f27 100644
--- a/drivers/dma/pch_dma.c
+++ b/drivers/dma/pch_dma.c
@@ -123,7 +123,7 @@ struct pch_dma_chan {
 struct pch_dma {
struct dma_device   dma;
void __iomem *membase;
-   struct pci_pool *pool;
+   struct dma_pool *pool;
struct pch_dma_regs regs;
struct pch_dma_desc_regs ch_regs[MAX_CHAN_NR];
struct pch_dma_chan channels[MAX_CHAN_NR];
@@ -437,7 +437,7 @@ static struct pch_dma_desc *pdc_alloc_desc(struct dma_chan 
*chan, gfp_t flags)
struct pch_dma *pd = to_pd(chan->device);
dma_addr_t addr;
 
-   desc = pci_pool_zalloc(pd->pool, flags, &addr);
+   desc = dma_pool_zalloc(pd->pool, flags, &addr);
if (desc) {
INIT_LIST_HEAD(>tx_list);
dma_async_tx_descriptor_init(>txd, chan);
@@ -549,7 +549,7 @@ static void pd_free_chan_resources(struct dma_chan *chan)
spin_unlock_irq(&pd_chan->lock);
 
list_for_each_entry_safe(desc, _d, &tmp_list, desc_node)
-   pci_pool_free(pd->pool, desc, desc->txd.phys);
+   dma_pool_free(pd->pool, desc, desc->txd.phys);
 
pdc_enable_irq(chan, 0);
 }
@@ -880,7 +880,7 @@ static int pch_dma_probe(struct pci_dev *pdev,
goto err_iounmap;
}
 
-   pd->pool = pci_pool_create("pch_dma_desc_pool", pdev,
+   pd->pool = dma_pool_create("pch_dma_desc_pool", &pdev->dev,
   sizeof(struct pch_dma_desc), 4, 0);
if (!pd->pool) {
dev_err(>dev, "Failed to alloc DMA descriptors\n");
@@ -931,7 +931,7 @@ static int pch_dma_probe(struct pci_dev *pdev,
return 0;
 
 err_free_pool:
-   pci_pool_destroy(pd->pool);
+   dma_pool_destroy(pd->pool);
 err_free_irq:
free_irq(pdev->irq, pd);
 err_iounmap:
@@ -963,7 +963,7 @@ static void pch_dma_remove(struct pci_dev *pdev)
tasklet_kill(&pd_chan->tasklet);
}
 
-   pci_pool_destroy(pd->pool);
+   dma_pool_destroy(pd->pool);
pci_iounmap(pdev, pd->membase);
pci_release_regions(pdev);
pci_disable_device(pdev);
-- 
2.9.3



[PATCH v7 04/15] net: e100: Replace PCI pool old API

2017-04-07 Thread Romain Perier
The PCI pool API is deprecated. This commit replaces the old PCI pool
API with the appropriate DMA pool API functions.

Signed-off-by: Romain Perier 
Acked-by: Peter Senna Tschudin 
Acked-by: Jeff Kirsher 
Tested-by: Peter Senna Tschudin 
---
 drivers/net/ethernet/intel/e100.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/intel/e100.c 
b/drivers/net/ethernet/intel/e100.c
index 2b7323d..d1002c2 100644
--- a/drivers/net/ethernet/intel/e100.c
+++ b/drivers/net/ethernet/intel/e100.c
@@ -607,7 +607,7 @@ struct nic {
struct mem *mem;
dma_addr_t dma_addr;
 
-   struct pci_pool *cbs_pool;
+   struct dma_pool *cbs_pool;
dma_addr_t cbs_dma_addr;
u8 adaptive_ifs;
u8 tx_threshold;
@@ -1892,7 +1892,7 @@ static void e100_clean_cbs(struct nic *nic)
nic->cb_to_clean = nic->cb_to_clean->next;
nic->cbs_avail++;
}
-   pci_pool_free(nic->cbs_pool, nic->cbs, nic->cbs_dma_addr);
+   dma_pool_free(nic->cbs_pool, nic->cbs, nic->cbs_dma_addr);
nic->cbs = NULL;
nic->cbs_avail = 0;
}
@@ -1910,7 +1910,7 @@ static int e100_alloc_cbs(struct nic *nic)
nic->cb_to_use = nic->cb_to_send = nic->cb_to_clean = NULL;
nic->cbs_avail = 0;
 
-   nic->cbs = pci_pool_alloc(nic->cbs_pool, GFP_KERNEL,
+   nic->cbs = dma_pool_alloc(nic->cbs_pool, GFP_KERNEL,
&nic->cbs_dma_addr);
if (!nic->cbs)
return -ENOMEM;
@@ -2958,8 +2958,8 @@ static int e100_probe(struct pci_dev *pdev, const struct 
pci_device_id *ent)
netif_err(nic, probe, nic->netdev, "Cannot register net device, 
aborting\n");
goto err_out_free;
}
-   nic->cbs_pool = pci_pool_create(netdev->name,
-  nic->pdev,
+   nic->cbs_pool = dma_pool_create(netdev->name,
+  &nic->pdev->dev,
   nic->params.cbs.max * sizeof(struct cb),
   sizeof(u32),
   0);
@@ -2999,7 +2999,7 @@ static void e100_remove(struct pci_dev *pdev)
unregister_netdev(netdev);
e100_free(nic);
pci_iounmap(pdev, nic->csr);
-   pci_pool_destroy(nic->cbs_pool);
+   dma_pool_destroy(nic->cbs_pool);
free_netdev(netdev);
pci_release_regions(pdev);
pci_disable_device(pdev);
-- 
2.9.3



[PATCH v7 05/15] mlx4: Replace PCI pool old API

2017-04-07 Thread Romain Perier
The PCI pool API is deprecated. This commit replaces the old PCI pool
API with the appropriate DMA pool API functions.

Signed-off-by: Romain Perier 
Acked-by: Peter Senna Tschudin 
Tested-by: Peter Senna Tschudin 
Reviewed-by: Leon Romanovsky 
Acked-by: Doug Ledford 
Tested-by: Doug Ledford 
---
 drivers/net/ethernet/mellanox/mlx4/cmd.c  | 10 +-
 drivers/net/ethernet/mellanox/mlx4/mlx4.h |  2 +-
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/cmd.c 
b/drivers/net/ethernet/mellanox/mlx4/cmd.c
index 0e0fa70..2d6ef79 100644
--- a/drivers/net/ethernet/mellanox/mlx4/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx4/cmd.c
@@ -2527,8 +2527,8 @@ int mlx4_cmd_init(struct mlx4_dev *dev)
}
 
if (!priv->cmd.pool) {
-   priv->cmd.pool = pci_pool_create("mlx4_cmd",
-dev->persist->pdev,
+   priv->cmd.pool = dma_pool_create("mlx4_cmd",
+&dev->persist->pdev->dev,
 MLX4_MAILBOX_SIZE,
 MLX4_MAILBOX_SIZE, 0);
if (!priv->cmd.pool)
@@ -2599,7 +2599,7 @@ void mlx4_cmd_cleanup(struct mlx4_dev *dev, int 
cleanup_mask)
struct mlx4_priv *priv = mlx4_priv(dev);
 
if (priv->cmd.pool && (cleanup_mask & MLX4_CMD_CLEANUP_POOL)) {
-   pci_pool_destroy(priv->cmd.pool);
+   dma_pool_destroy(priv->cmd.pool);
priv->cmd.pool = NULL;
}
 
@@ -2691,7 +2691,7 @@ struct mlx4_cmd_mailbox *mlx4_alloc_cmd_mailbox(struct 
mlx4_dev *dev)
if (!mailbox)
return ERR_PTR(-ENOMEM);
 
-   mailbox->buf = pci_pool_zalloc(mlx4_priv(dev)->cmd.pool, GFP_KERNEL,
+   mailbox->buf = dma_pool_zalloc(mlx4_priv(dev)->cmd.pool, GFP_KERNEL,
&mailbox->dma);
if (!mailbox->buf) {
kfree(mailbox);
@@ -2708,7 +2708,7 @@ void mlx4_free_cmd_mailbox(struct mlx4_dev *dev,
if (!mailbox)
return;
 
-   pci_pool_free(mlx4_priv(dev)->cmd.pool, mailbox->buf, mailbox->dma);
+   dma_pool_free(mlx4_priv(dev)->cmd.pool, mailbox->buf, mailbox->dma);
kfree(mailbox);
 }
 EXPORT_SYMBOL_GPL(mlx4_free_cmd_mailbox);
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4.h 
b/drivers/net/ethernet/mellanox/mlx4/mlx4.h
index b4f1bc5..69c8764 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4.h
@@ -628,7 +628,7 @@ struct mlx4_mgm {
 };
 
 struct mlx4_cmd {
-   struct pci_pool*pool;
+   struct dma_pool*pool;
void __iomem   *hcr;
struct mutexslave_cmd_mutex;
struct semaphorepoll_sem;
-- 
2.9.3



Re: [PATCH 1/4] mm: prevent potential recursive reclaim due to clearing PF_MEMALLOC

2017-04-07 Thread Vlastimil Babka
On 04/05/2017 01:40 PM, Andrey Ryabinin wrote:
> On 04/05/2017 10:46 AM, Vlastimil Babka wrote:
>> The function __alloc_pages_direct_compact() sets PF_MEMALLOC to prevent
>> deadlock during page migration by lock_page() (see the comment in
>> __unmap_and_move()). Then it unconditionally clears the flag, which can 
>> clear a
>> pre-existing PF_MEMALLOC flag and result in recursive reclaim. This was not a
>> problem until commit a8161d1ed609 ("mm, page_alloc: restructure direct
>> compaction handling in slowpath"), because direct compation was called only
>> after direct reclaim, which was skipped when PF_MEMALLOC flag was set.
>> 
>> Even now it's only a theoretical issue, as the new callsite of
>> __alloc_pages_direct_compact() is reached only for costly orders and when
>> gfp_pfmemalloc_allowed() is true, which means either __GFP_NOMEMALLOC is in
>is false   
> 
>> gfp_flags or in_interrupt() is true. There is no such known context, but 
>> let's
>> play it safe and make __alloc_pages_direct_compact() robust for cases where
>> PF_MEMALLOC is already set.
>> 
>> Fixes: a8161d1ed609 ("mm, page_alloc: restructure direct compaction handling 
>> in slowpath")
>> Reported-by: Andrey Ryabinin 
>> Signed-off-by: Vlastimil Babka 
>> Cc: 
>> ---
>>  mm/page_alloc.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>> 
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 3589f8be53be..b84e6ffbe756 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -3288,6 +3288,7 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned 
>> int order,
>>  enum compact_priority prio, enum compact_result *compact_result)
>>  {
>>  struct page *page;
>> +unsigned int noreclaim_flag = current->flags & PF_MEMALLOC;
>>  
>>  if (!order)
>>  return NULL;
>> @@ -3295,7 +3296,7 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned 
>> int order,
>>  current->flags |= PF_MEMALLOC;
>>  *compact_result = try_to_compact_pages(gfp_mask, order, alloc_flags, ac,
>>  prio);
>> -current->flags &= ~PF_MEMALLOC;
>> +current->flags = (current->flags & ~PF_MEMALLOC) | noreclaim_flag;
> 
> Perhaps this would look better:
> 
>   tsk_restore_flags(current, noreclaim_flag, PF_MEMALLOC);
> 
> ?

Well, I didn't care much considering this is for stable only, and patch 2/4
rewrites this to the new API.

>>  if (*compact_result <= COMPACT_INACTIVE)
>>  return NULL;
>> 
> 



[PATCHv4 0/6] sanitize sg

2017-04-07 Thread Hannes Reinecke
Hi all,

the infamous syzkaller uncovered some more issues in the sg driver.
This patchset fixes those two issues (and adds a fix for yet another
potential issue: checking for a NULL dxferp when dxfer_len is not 0).
It also removes handling of the SET_FORCE_LOW_DMA ioctl, which has
never worked since the initial git check-in, and does some code cleanup
by removing the private list implementation in favour of standard lists.

As usual, comments and reviews are welcome.

Changes to v1:
- Include reviews from Christoph
- Add patch to close race condition in sg_remove_sfp_usercontext()
- Remove stale variable 'save_scat_len'

Changes to v2:
- Move misplaced hunk
- Add Reviewed-by: and Tested-by: tags

Changes to v3:
- Review locking in 'sg: protect accesses ...'
- Reshuffle hunks to keep kbuild robot happy

Hannes Reinecke (5):
  sg: disable SET_FORCE_LOW_DMA
  sg: remove 'save_scat_len'
  sg: protect accesses to 'reserved' page array
  sg: use standard lists for sg_requests
  sg: close race condition in sg_remove_sfp_usercontext()

Johannes Thumshirn (1):
  sg: check for valid direction before starting the request

 drivers/scsi/sg.c | 282 +++---
 include/scsi/sg.h |   1 -
 2 files changed, 139 insertions(+), 144 deletions(-)

-- 
1.8.5.6



[PATCHv4 1/6] sg: disable SET_FORCE_LOW_DMA

2017-04-07 Thread Hannes Reinecke
The ioctl SET_FORCE_LOW_DMA has never worked since the initial
git check-in, and the respective setting is nowadays handled
correctly.
So disable it entirely.

Signed-off-by: Hannes Reinecke 
Reviewed-by: Johannes Thumshirn 
Tested-by: Johannes Thumshirn 
Reviewed-by: Christoph Hellwig 
---
 drivers/scsi/sg.c | 30 +-
 include/scsi/sg.h |  1 -
 2 files changed, 9 insertions(+), 22 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 29b8650..11ca00d 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -149,7 +149,6 @@
Sg_request *headrp; /* head of request slist, NULL->empty */
struct fasync_struct *async_qp; /* used by asynchronous notification */
Sg_request req_arr[SG_MAX_QUEUE];   /* used as singly-linked list */
-   char low_dma;   /* as in parent but possibly overridden to 1 */
char force_packid;  /* 1 -> pack_id input to read(), 0 -> ignored */
char cmd_q; /* 1 -> allow command queuing, 0 -> don't */
unsigned char next_cmd_len; /* 0: automatic, >0: use on next write() */
@@ -885,24 +884,14 @@ static int max_sectors_bytes(struct request_queue *q)
/* strange ..., for backward compatibility */
return sfp->timeout_user;
case SG_SET_FORCE_LOW_DMA:
-   result = get_user(val, ip);
-   if (result)
-   return result;
-   if (val) {
-   sfp->low_dma = 1;
-   if ((0 == sfp->low_dma) && (0 == sg_res_in_use(sfp))) {
-   val = (int) sfp->reserve.bufflen;
sg_remove_scat(sfp, &sfp->reserve);
-   sg_build_reserve(sfp, val);
-   }
-   } else {
if (atomic_read(&sdp->detaching))
-   return -ENODEV;
-   sfp->low_dma = sdp->device->host->unchecked_isa_dma;
-   }
+   /*
+* N.B. This ioctl never worked properly, but failed to
+* return an error value. So return '0' to keep compatibility
+* with legacy applications.
+*/
return 0;
case SG_GET_LOW_DMA:
-   return put_user((int) sfp->low_dma, ip);
+   return put_user((int) sdp->device->host->unchecked_isa_dma, ip);
case SG_GET_SCSI_ID:
if (!access_ok(VERIFY_WRITE, p, sizeof (sg_scsi_id_t)))
return -EFAULT;
@@ -1829,6 +1818,7 @@ static long sg_compat_ioctl(struct file *filp, unsigned 
int cmd_in, unsigned lon
int sg_tablesize = sfp->parentdp->sg_tablesize;
int blk_size = buff_size, order;
gfp_t gfp_mask = GFP_ATOMIC | __GFP_COMP | __GFP_NOWARN;
+   struct sg_device *sdp = sfp->parentdp;
 
if (blk_size < 0)
return -EFAULT;
@@ -1854,7 +1844,7 @@ static long sg_compat_ioctl(struct file *filp, unsigned 
int cmd_in, unsigned lon
scatter_elem_sz_prev = num;
}
 
-   if (sfp->low_dma)
+   if (sdp->device->host->unchecked_isa_dma)
gfp_mask |= GFP_DMA;
 
if (!capable(CAP_SYS_ADMIN) || !capable(CAP_SYS_RAWIO))
@@ -2140,8 +2130,6 @@ static long sg_compat_ioctl(struct file *filp, unsigned 
int cmd_in, unsigned lon
sfp->timeout = SG_DEFAULT_TIMEOUT;
sfp->timeout_user = SG_DEFAULT_TIMEOUT_USER;
sfp->force_packid = SG_DEF_FORCE_PACK_ID;
-   sfp->low_dma = (SG_DEF_FORCE_LOW_DMA == 0) ?
-   sdp->device->host->unchecked_isa_dma : 1;
sfp->cmd_q = SG_DEF_COMMAND_Q;
sfp->keep_orphan = SG_DEF_KEEP_ORPHAN;
sfp->parentdp = sdp;
@@ -2611,7 +2599,7 @@ static void sg_proc_debug_helper(struct seq_file *s, 
Sg_device * sdp)
   jiffies_to_msecs(fp->timeout),
   fp->reserve.bufflen,
   (int) fp->reserve.k_use_sg,
-  (int) fp->low_dma);
+  (int) sdp->device->host->unchecked_isa_dma);
seq_printf(s, "   cmd_q=%d f_packid=%d k_orphan=%d closed=0\n",
   (int) fp->cmd_q, (int) fp->force_packid,
   (int) fp->keep_orphan);
diff --git a/include/scsi/sg.h b/include/scsi/sg.h
index 3afec70..20bc71c 100644
--- a/include/scsi/sg.h
+++ b/include/scsi/sg.h
@@ -197,7 +197,6 @@
 #define SG_DEFAULT_RETRIES 0
 
 /* Defaults, commented if they differ from original sg driver */
-#define SG_DEF_FORCE_LOW_DMA 0  /* was 1 -> memory below 16MB on i386 */
 #define SG_DEF_FORCE_PACK_ID 0
 #define SG_DEF_KEEP_ORPHAN 0
 #define SG_DEF_RESERVED_SIZE SG_SCATTER_SZ /* load time option */
-- 
1.8.5.6



Re: [RFC 6/8] nvmet: Be careful about using iomem accesses when dealing with p2pmem

2017-04-07 Thread Stephen Bates
On 2017-04-06, 6:33 AM, "Sagi Grimberg"  wrote:

> Say it's connected via 2 legs, the bar is accessed from leg A and the
> data from the disk comes via leg B. In this case, the data is heading
> towards the p2p device via leg B (might be congested), the completion
> goes directly to the RC, and then the host issues a read from the
> bar via leg A. I don't understand what can guarantee ordering here.

> Stephen told me that this still guarantees ordering, but I honestly
> can't understand how, perhaps someone can explain to me in a simple
> way that I can understand.

Sagi

As long as legA, legB and the RC are all connected to the same switch then 
ordering will be preserved (I think many other topologies also work). Here is 
how it would work for the problem case you are concerned about (which is a read 
from the NVMe drive).

1. Disk device DMAs out the data to the p2pmem device via a string of PCIe 
MemWr TLPs.
2. Disk device writes to the completion queue (in system memory) via a MemWr 
TLP.
3. The last of the MemWrs from step 1 might have got stalled in the PCIe switch 
due to congestion but if so they are stalled in the egress path of the switch 
for the p2pmem port.
4. The RC determines the IO is complete when the TLP associated with step 2 
updates the memory associated with the CQ. It issues some operation to read the 
p2pmem.
5. Regardless of whether the MemRd TLP comes from the RC or another device 
connected to the switch it is queued in the egress queue for the p2pmem FIO 
behind the last DMA TLP (from step 1). PCIe ordering ensures that this MemRd 
cannot overtake the MemWr (Reads can never pass writes). Therefore the MemRd 
can never get to the p2pmem device until after the last DMA MemWr has.

I hope this helps!

Stephen




[PATCH v7 10/15] scsi: lpfc: Replace PCI pool old API

2017-04-07 Thread Romain Perier
The PCI pool API is deprecated. This commit replaces the old PCI pool
API with the appropriate DMA pool API functions. It also updates some
comments accordingly.

Signed-off-by: Romain Perier 
Reviewed-by: Peter Senna Tschudin 
---
 drivers/scsi/lpfc/lpfc.h   |  14 +++---
 drivers/scsi/lpfc/lpfc_init.c  |  16 +++
 drivers/scsi/lpfc/lpfc_mem.c   | 106 -
 drivers/scsi/lpfc/lpfc_nvme.c  |   6 +--
 drivers/scsi/lpfc/lpfc_nvmet.c |   4 +-
 drivers/scsi/lpfc/lpfc_scsi.c  |  12 ++---
 6 files changed, 77 insertions(+), 81 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc.h b/drivers/scsi/lpfc/lpfc.h
index 257bbdd..c6f82db 100644
--- a/drivers/scsi/lpfc/lpfc.h
+++ b/drivers/scsi/lpfc/lpfc.h
@@ -935,13 +935,13 @@ struct lpfc_hba {
struct list_head active_rrq_list;
spinlock_t hbalock;
 
-   /* pci_mem_pools */
-   struct pci_pool *lpfc_sg_dma_buf_pool;
-   struct pci_pool *lpfc_mbuf_pool;
-   struct pci_pool *lpfc_hrb_pool; /* header receive buffer pool */
-   struct pci_pool *lpfc_drb_pool; /* data receive buffer pool */
-   struct pci_pool *lpfc_hbq_pool; /* SLI3 hbq buffer pool */
-   struct pci_pool *txrdy_payload_pool;
+   /* dma_mem_pools */
+   struct dma_pool *lpfc_sg_dma_buf_pool;
+   struct dma_pool *lpfc_mbuf_pool;
+   struct dma_pool *lpfc_hrb_pool; /* header receive buffer pool */
+   struct dma_pool *lpfc_drb_pool; /* data receive buffer pool */
+   struct dma_pool *lpfc_hbq_pool; /* SLI3 hbq buffer pool */
+   struct dma_pool *txrdy_payload_pool;
struct lpfc_dma_pool lpfc_mbuf_safety_pool;
 
mempool_t *mbox_mem_pool;
diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
index 6cc561b..c50b69a 100644
--- a/drivers/scsi/lpfc/lpfc_init.c
+++ b/drivers/scsi/lpfc/lpfc_init.c
@@ -3151,7 +3151,7 @@ lpfc_scsi_free(struct lpfc_hba *phba)
list_for_each_entry_safe(sb, sb_next, &phba->lpfc_scsi_buf_list_put,
 list) {
list_del(&sb->list);
-   pci_pool_free(phba->lpfc_sg_dma_buf_pool, sb->data,
+   dma_pool_free(phba->lpfc_sg_dma_buf_pool, sb->data,
  sb->dma_handle);
kfree(sb);
phba->total_scsi_bufs--;
@@ -3162,7 +3162,7 @@ lpfc_scsi_free(struct lpfc_hba *phba)
list_for_each_entry_safe(sb, sb_next, &phba->lpfc_scsi_buf_list_get,
 list) {
list_del(&sb->list);
-   pci_pool_free(phba->lpfc_sg_dma_buf_pool, sb->data,
+   dma_pool_free(phba->lpfc_sg_dma_buf_pool, sb->data,
  sb->dma_handle);
kfree(sb);
phba->total_scsi_bufs--;
@@ -3193,7 +3193,7 @@ lpfc_nvme_free(struct lpfc_hba *phba)
list_for_each_entry_safe(lpfc_ncmd, lpfc_ncmd_next,
 &phba->lpfc_nvme_buf_list_put, list) {
list_del(&lpfc_ncmd->list);
-   pci_pool_free(phba->lpfc_sg_dma_buf_pool, lpfc_ncmd->data,
+   dma_pool_free(phba->lpfc_sg_dma_buf_pool, lpfc_ncmd->data,
  lpfc_ncmd->dma_handle);
kfree(lpfc_ncmd);
phba->total_nvme_bufs--;
@@ -3204,7 +3204,7 @@ lpfc_nvme_free(struct lpfc_hba *phba)
list_for_each_entry_safe(lpfc_ncmd, lpfc_ncmd_next,
 &phba->lpfc_nvme_buf_list_get, list) {
list_del(&lpfc_ncmd->list);
-   pci_pool_free(phba->lpfc_sg_dma_buf_pool, lpfc_ncmd->data,
+   dma_pool_free(phba->lpfc_sg_dma_buf_pool, lpfc_ncmd->data,
  lpfc_ncmd->dma_handle);
kfree(lpfc_ncmd);
phba->total_nvme_bufs--;
@@ -3517,7 +3517,7 @@ lpfc_sli4_scsi_sgl_update(struct lpfc_hba *phba)
list_remove_head(&scsi_sgl_list, psb,
 struct lpfc_scsi_buf, list);
if (psb) {
-   pci_pool_free(phba->lpfc_sg_dma_buf_pool,
+   dma_pool_free(phba->lpfc_sg_dma_buf_pool,
  psb->data, psb->dma_handle);
kfree(psb);
}
@@ -3652,7 +3652,7 @@ lpfc_sli4_nvme_sgl_update(struct lpfc_hba *phba)
list_remove_head(&nvme_sgl_list, lpfc_ncmd,
 struct lpfc_nvme_buf, list);
if (lpfc_ncmd) {
-   pci_pool_free(phba->lpfc_sg_dma_buf_pool,
+   dma_pool_free(phba->lpfc_sg_dma_buf_pool,
  lpfc_ncmd->data,
  lpfc_ncmd->dma_handle);
kfree(lpfc_ncmd);
@@ -6680,8 +6680,8 @@ lpfc_create_shost(struct 

[PATCH v7 15/15] PCI: Remove PCI pool macro functions

2017-04-07 Thread Romain Perier
Now that all drivers use the DMA pool API, we can remove the PCI pool
macro functions.

Signed-off-by: Romain Perier 
Reviewed-by: Peter Senna Tschudin 
---
 include/linux/pci.h | 9 -
 1 file changed, 9 deletions(-)

diff --git a/include/linux/pci.h b/include/linux/pci.h
index 2dbd6d2..25d6d4e 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1264,15 +1264,6 @@ int pci_set_vga_state(struct pci_dev *pdev, bool decode,
 #include 
 #include 
 
-#define	pci_pool dma_pool
-#define pci_pool_create(name, pdev, size, align, allocation) \
-		dma_pool_create(name, &pdev->dev, size, align, allocation)
-#define	pci_pool_destroy(pool) dma_pool_destroy(pool)
-#define	pci_pool_alloc(pool, flags, handle) dma_pool_alloc(pool, flags, handle)
-#define	pci_pool_zalloc(pool, flags, handle) \
-		dma_pool_zalloc(pool, flags, handle)
-#define	pci_pool_free(pool, vaddr, addr) dma_pool_free(pool, vaddr, addr)
-
 struct msix_entry {
u32 vector; /* kernel uses to write allocated vector */
u16 entry;  /* driver uses to specify entry, OS writes */
-- 
2.9.3



[PATCH v7 13/15] scsi: mvsas: Replace PCI pool old API

2017-04-07 Thread Romain Perier
The PCI pool API is deprecated. This commit replaces the old PCI pool
API with the appropriate DMA pool API functions.

Signed-off-by: Romain Perier 
Reviewed-by: Peter Senna Tschudin 
---
 drivers/scsi/mvsas/mv_init.c | 6 +++---
 drivers/scsi/mvsas/mv_sas.c  | 6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/scsi/mvsas/mv_init.c b/drivers/scsi/mvsas/mv_init.c
index 8280046..41d2276 100644
--- a/drivers/scsi/mvsas/mv_init.c
+++ b/drivers/scsi/mvsas/mv_init.c
@@ -125,8 +125,7 @@ static void mvs_free(struct mvs_info *mvi)
else
slot_nr = MVS_CHIP_SLOT_SZ;
 
-   if (mvi->dma_pool)
-   pci_pool_destroy(mvi->dma_pool);
+   dma_pool_destroy(mvi->dma_pool);
 
if (mvi->tx)
dma_free_coherent(mvi->dev,
@@ -296,7 +295,8 @@ static int mvs_alloc(struct mvs_info *mvi, struct Scsi_Host 
*shost)
goto err_out;
 
sprintf(pool_name, "%s%d", "mvs_dma_pool", mvi->id);
-   mvi->dma_pool = pci_pool_create(pool_name, mvi->pdev, MVS_SLOT_BUF_SZ, 
16, 0);
+   mvi->dma_pool = dma_pool_create(pool_name, &mvi->pdev->dev,
+   MVS_SLOT_BUF_SZ, 16, 0);
if (!mvi->dma_pool) {
printk(KERN_DEBUG "failed to create dma pool %s.\n", 
pool_name);
goto err_out;
diff --git a/drivers/scsi/mvsas/mv_sas.c b/drivers/scsi/mvsas/mv_sas.c
index c7cc803..ee81d10 100644
--- a/drivers/scsi/mvsas/mv_sas.c
+++ b/drivers/scsi/mvsas/mv_sas.c
@@ -790,7 +790,7 @@ static int mvs_task_prep(struct sas_task *task, struct 
mvs_info *mvi, int is_tmf
slot->n_elem = n_elem;
slot->slot_tag = tag;
 
-   slot->buf = pci_pool_alloc(mvi->dma_pool, GFP_ATOMIC, &slot->buf_dma);
+   slot->buf = dma_pool_alloc(mvi->dma_pool, GFP_ATOMIC, &slot->buf_dma);
if (!slot->buf) {
rc = -ENOMEM;
goto err_out_tag;
@@ -840,7 +840,7 @@ static int mvs_task_prep(struct sas_task *task, struct 
mvs_info *mvi, int is_tmf
return rc;
 
 err_out_slot_buf:
-   pci_pool_free(mvi->dma_pool, slot->buf, slot->buf_dma);
+   dma_pool_free(mvi->dma_pool, slot->buf, slot->buf_dma);
 err_out_tag:
mvs_tag_free(mvi, tag);
 err_out:
@@ -918,7 +918,7 @@ static void mvs_slot_task_free(struct mvs_info *mvi, struct 
sas_task *task,
}
 
if (slot->buf) {
-   pci_pool_free(mvi->dma_pool, slot->buf, slot->buf_dma);
+   dma_pool_free(mvi->dma_pool, slot->buf, slot->buf_dma);
slot->buf = NULL;
}
list_del_init(&slot->entry);
-- 
2.9.3



[PATCH v7 08/15] scsi: be2iscsi: Replace PCI pool old API

2017-04-07 Thread Romain Perier
The PCI pool API is deprecated. This commit replaces the old PCI pool
API with the appropriate DMA pool API functions.

Signed-off-by: Romain Perier 
Acked-by: Peter Senna Tschudin 
Tested-by: Peter Senna Tschudin 
---
 drivers/scsi/be2iscsi/be_iscsi.c | 6 +++---
 drivers/scsi/be2iscsi/be_main.c  | 6 +++---
 drivers/scsi/be2iscsi/be_main.h  | 2 +-
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/scsi/be2iscsi/be_iscsi.c b/drivers/scsi/be2iscsi/be_iscsi.c
index 97dca46..43a80ce 100644
--- a/drivers/scsi/be2iscsi/be_iscsi.c
+++ b/drivers/scsi/be2iscsi/be_iscsi.c
@@ -82,8 +82,8 @@ struct iscsi_cls_session *beiscsi_session_create(struct 
iscsi_endpoint *ep,
return NULL;
sess = cls_session->dd_data;
beiscsi_sess = sess->dd_data;
-   beiscsi_sess->bhs_pool =  pci_pool_create("beiscsi_bhs_pool",
-  phba->pcidev,
+   beiscsi_sess->bhs_pool =  dma_pool_create("beiscsi_bhs_pool",
+  &phba->pcidev->dev,
   sizeof(struct be_cmd_bhs),
   64, 0);
if (!beiscsi_sess->bhs_pool)
@@ -108,7 +108,7 @@ void beiscsi_session_destroy(struct iscsi_cls_session 
*cls_session)
struct beiscsi_session *beiscsi_sess = sess->dd_data;
 
printk(KERN_INFO "In beiscsi_session_destroy\n");
-   pci_pool_destroy(beiscsi_sess->bhs_pool);
+   dma_pool_destroy(beiscsi_sess->bhs_pool);
iscsi_session_teardown(cls_session);
 }
 
diff --git a/drivers/scsi/be2iscsi/be_main.c b/drivers/scsi/be2iscsi/be_main.c
index f862332..b4542e7 100644
--- a/drivers/scsi/be2iscsi/be_main.c
+++ b/drivers/scsi/be2iscsi/be_main.c
@@ -4257,7 +4257,7 @@ static void beiscsi_cleanup_task(struct iscsi_task *task)
pwrb_context = &phwi_ctrlr->wrb_context[cri_index];
 
if (io_task->cmd_bhs) {
-   pci_pool_free(beiscsi_sess->bhs_pool, io_task->cmd_bhs,
+   dma_pool_free(beiscsi_sess->bhs_pool, io_task->cmd_bhs,
  io_task->bhs_pa.u.a64.address);
io_task->cmd_bhs = NULL;
task->hdr = NULL;
@@ -4374,7 +4374,7 @@ static int beiscsi_alloc_pdu(struct iscsi_task *task, 
uint8_t opcode)
struct beiscsi_session *beiscsi_sess = beiscsi_conn->beiscsi_sess;
dma_addr_t paddr;
 
-   io_task->cmd_bhs = pci_pool_alloc(beiscsi_sess->bhs_pool,
+   io_task->cmd_bhs = dma_pool_alloc(beiscsi_sess->bhs_pool,
  GFP_ATOMIC, &paddr);
if (!io_task->cmd_bhs)
return -ENOMEM;
@@ -4501,7 +4501,7 @@ static int beiscsi_alloc_pdu(struct iscsi_task *task, 
uint8_t opcode)
if (io_task->pwrb_handle)
free_wrb_handle(phba, pwrb_context, io_task->pwrb_handle);
io_task->pwrb_handle = NULL;
-   pci_pool_free(beiscsi_sess->bhs_pool, io_task->cmd_bhs,
+   dma_pool_free(beiscsi_sess->bhs_pool, io_task->cmd_bhs,
  io_task->bhs_pa.u.a64.address);
io_task->cmd_bhs = NULL;
return -ENOMEM;
diff --git a/drivers/scsi/be2iscsi/be_main.h b/drivers/scsi/be2iscsi/be_main.h
index 338dbe0..81ce3ff 100644
--- a/drivers/scsi/be2iscsi/be_main.h
+++ b/drivers/scsi/be2iscsi/be_main.h
@@ -438,7 +438,7 @@ struct beiscsi_hba {
 test_bit(BEISCSI_HBA_ONLINE, &phba->state))
 
 struct beiscsi_session {
-   struct pci_pool *bhs_pool;
+   struct dma_pool *bhs_pool;
 };
 
 /**
-- 
2.9.3



[PATCH] scsi: smartpqi: remove writeq/readq function definitions

2017-04-07 Thread Corentin Labbe
Instead of reimplementing writeq/readq, use linux/io-64-nonatomic-lo-hi.h,
which already provides them.

Signed-off-by: Corentin Labbe 
---
 drivers/scsi/smartpqi/smartpqi.h | 31 ++-
 1 file changed, 2 insertions(+), 29 deletions(-)

diff --git a/drivers/scsi/smartpqi/smartpqi.h b/drivers/scsi/smartpqi/smartpqi.h
index b673825..2850ece 100644
--- a/drivers/scsi/smartpqi/smartpqi.h
+++ b/drivers/scsi/smartpqi/smartpqi.h
@@ -19,6 +19,8 @@
 #if !defined(_SMARTPQI_H)
 #define _SMARTPQI_H
 
+#include 
+
 #pragma pack(1)
 
 #define PQI_DEVICE_SIGNATURE   "PQI DREG"
@@ -1102,33 +1104,4 @@ struct pqi_scsi_dev *pqi_find_device_by_sas_rphy(
 
 extern struct sas_function_template pqi_sas_transport_functions;
 
-#if !defined(readq)
-#define readq readq
-static inline u64 readq(const volatile void __iomem *addr)
-{
-   u32 lower32;
-   u32 upper32;
-
-   lower32 = readl(addr);
-   upper32 = readl(addr + 4);
-
-   return ((u64)upper32 << 32) | lower32;
-}
-#endif
-
-#if !defined(writeq)
-#define writeq writeq
-static inline void writeq(u64 value, volatile void __iomem *addr)
-{
-   u32 lower32;
-   u32 upper32;
-
-   lower32 = lower_32_bits(value);
-   upper32 = upper_32_bits(value);
-
-   writel(lower32, addr);
-   writel(upper32, addr + 4);
-}
-#endif
-
 #endif /* _SMARTPQI_H */
-- 
2.10.2



[PATCH] scsi: libfc: directly call ELS request handlers

2017-04-07 Thread Johannes Thumshirn
Directly call ELS request handler functions in fc_lport_recv_els_req
instead of saving the pointer to the handler's receive function and then
later dereferencing this pointer.

This makes the code a bit more obvious.

Signed-off-by: Johannes Thumshirn 
---
 drivers/scsi/libfc/fc_lport.c | 20 +---
 1 file changed, 9 insertions(+), 11 deletions(-)

diff --git a/drivers/scsi/libfc/fc_lport.c b/drivers/scsi/libfc/fc_lport.c
index aa76f36..2fd0ec6 100644
--- a/drivers/scsi/libfc/fc_lport.c
+++ b/drivers/scsi/libfc/fc_lport.c
@@ -887,8 +887,6 @@ static void fc_lport_recv_flogi_req(struct fc_lport *lport,
 static void fc_lport_recv_els_req(struct fc_lport *lport,
  struct fc_frame *fp)
 {
-   void (*recv)(struct fc_lport *, struct fc_frame *);
-
mutex_lock(&lport->lp_mutex);
 
/*
@@ -902,31 +900,31 @@ static void fc_lport_recv_els_req(struct fc_lport *lport,
/*
 * Check opcode.
 */
-   recv = fc_rport_recv_req;
switch (fc_frame_payload_op(fp)) {
case ELS_FLOGI:
if (!lport->point_to_multipoint)
-   recv = fc_lport_recv_flogi_req;
+   fc_lport_recv_flogi_req(lport, fp);
break;
case ELS_LOGO:
if (fc_frame_sid(fp) == FC_FID_FLOGI)
-   recv = fc_lport_recv_logo_req;
+   fc_lport_recv_logo_req(lport, fp);
break;
case ELS_RSCN:
-   recv = lport->tt.disc_recv_req;
+   lport->tt.disc_recv_req(lport, fp);
break;
case ELS_ECHO:
-   recv = fc_lport_recv_echo_req;
+   fc_lport_recv_echo_req(lport, fp);
break;
case ELS_RLIR:
-   recv = fc_lport_recv_rlir_req;
+   fc_lport_recv_rlir_req(lport, fp);
break;
case ELS_RNID:
-   recv = fc_lport_recv_rnid_req;
+   fc_lport_recv_rnid_req(lport, fp);
+   break;
+   default:
+   fc_rport_recv_req(lport, fp);
break;
}
-
-   recv(lport, fp);
}
mutex_unlock(&lport->lp_mutex);
 }
-- 
1.8.5.6



[PATCH 03/12] hpsa: update reset handler

2017-04-07 Thread Don Brace
Use the return from TUR as a check for the
device state.

Reviewed-by: Scott Benesh 
Reviewed-by: Scott Teel 
Reviewed-by: Kevin Barnett 
Signed-off-by: Don Brace 
---
 drivers/scsi/hpsa.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
index 8e22aed..9fb30c4 100644
--- a/drivers/scsi/hpsa.c
+++ b/drivers/scsi/hpsa.c
@@ -3090,7 +3090,7 @@ static int hpsa_do_reset(struct ctlr_info *h, struct 
hpsa_scsi_dev_t *dev,
if (unlikely(rc))
atomic_set(&dev->reset_cmds_out, 0);
else
-   wait_for_device_to_become_ready(h, scsi3addr, 0);
+   rc = wait_for_device_to_become_ready(h, scsi3addr, 0);
 
mutex_unlock(&h->reset_mutex);
return rc;



[PATCH 05/12] hpsa: rescan later if reset in progress

2017-04-07 Thread Don Brace
 - schedule another scan.
 - mark current scan as completed

Reviewed-by: Scott Benesh 
Reviewed-by: Scott Teel 
Reviewed-by: Kevin Barnett 
Signed-off-by: Don Brace 
---
 drivers/scsi/hpsa.c |1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
index 2990897..53a4f34 100644
--- a/drivers/scsi/hpsa.c
+++ b/drivers/scsi/hpsa.c
@@ -5620,6 +5620,7 @@ static void hpsa_scan_start(struct Scsi_Host *sh)
 */
if (h->reset_in_progress) {
h->drv_req_rescan = 1;
+   hpsa_scan_complete(h);
return;
}
 



[PATCH 04/12] hpsa: do not reset enclosures

2017-04-07 Thread Don Brace
Prevent enclosure resets.

Reviewed-by: Scott Benesh 
Reviewed-by: Scott Teel 
Reviewed-by: Kevin Barnett 
Signed-off-by: Don Brace 
---
 drivers/scsi/hpsa.c |3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
index 9fb30c4..2990897 100644
--- a/drivers/scsi/hpsa.c
+++ b/drivers/scsi/hpsa.c
@@ -5853,6 +5853,9 @@ static int hpsa_eh_device_reset_handler(struct scsi_cmnd 
*scsicmd)
return FAILED;
}
 
+   if (dev->devtype == TYPE_ENCLOSURE)
+   return SUCCESS;
+
/* if controller locked up, we can guarantee command won't complete */
if (lockup_detected(h)) {
snprintf(msg, sizeof(msg),



[PATCH 00/12] hpsa updates

2017-04-07 Thread Don Brace
These patches are based on Linus's tree

These patches are for:
 - Multipath failover support in general.

The changes are:
 - update identify physical device structure
   - align with FW
 - stop getting enclosure info for externals
   - no BMIC support
 - update reset handler
   - update to match out of box driver
 - do not reset enclosures
   - reset can sometimes hang
 - rescan later if reset in progress
   - wait for devices to settle.
 - correct resets on retried commands
   - was not calling scsi_done on retried completion
 - correct queue depth for externals
   - Code not in correct function
 - separate monitor events from heartbeat worker
   - allows driver to check for changes more frequently
 without affecting controller lockup detection.
 - send ioaccel requests with 0 length down raid path
   - avoid hang issues for customers running older FW.
 - remove abort handler
   - align driver with our out of box driver
 - bump driver version
   - align version with out of box driver for multi-path changes

---

Don Brace (11):
  hpsa: update identify physical device structure
  hpsa: do not get enclosure info for external devices
  hpsa: update reset handler
  hpsa: do not reset enclosures
  hpsa: rescan later if reset in progress
  hpsa: correct resets on retried commands
  hpsa: cleanup reset handler
  hpsa: correct queue depth for externals
  hpsa: send ioaccel requests with 0 length down raid path
  hpsa: remove abort handler
  hpsa: bump driver version

Scott Teel (1):
  hpsa: separate monitor events from heartbeat worker


 drivers/scsi/hpsa.c |  790 +--
 drivers/scsi/hpsa.h |3 
 drivers/scsi/hpsa_cmd.h |   20 +
 3 files changed, 164 insertions(+), 649 deletions(-)

--
Signature


[Bug 195285] New: qla2xxx FW immediatly crashing after target start

2017-04-07 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=195285

Bug ID: 195285
   Summary: qla2xxx FW immediatly crashing after target start
   Product: SCSI Drivers
   Version: 2.5
Kernel Version: 4.9.10-200.fc25.x86_64
  Hardware: x86-64
OS: Linux
  Tree: Mainline
Status: NEW
  Severity: high
  Priority: P1
 Component: QLOGIC QLA2XXX
  Assignee: scsi_drivers-qla2...@kernel-bugs.osdl.org
  Reporter: anthony.blood...@gmail.com
Regression: No

The system always becomes unresponsive after target start, with these messages:

qla2xxx [:07:00.0]-00fb:1: QLogic QLE2564 - PCI-Express Quad Channel 8Gb
Fibre Channel HBA.
qla2xxx [:07:00.0]-00fc:1: ISP2532: PCIe (5.0GT/s x8) @ :07:00.0 hdma+
host#=1 fw=8.06.00 (90d5).
qla2xxx [:07:00.1]-001a: : MSI-X vector count: 32.
qla2xxx [:07:00.1]-001d: : Found an ISP2532 irq 103 iobase
0xb830c62d5000.
qla2xxx [:07:00.1]-504b:2: RISC paused -- HCCR=40, Dumping firmware.
qla2xxx [:07:00.1]-8033:2: Unable to reinitialize FCE (258).
qla2xxx [:07:00.1]-8034:2: Unable to reinitialize EFT (258).
qla2xxx [:07:00.1]-00af:2: Performing ISP error recovery -
ha=88a2624e.
qla2xxx [:07:00.1]-504b:2: RISC paused -- HCCR=40, Dumping firmware.

trying - kernel 4.11-rc5

Apr 07 23:39:58 : [ cut here ]
Apr 07 23:39:58 : WARNING: CPU: 0 PID: 1468 at lib/dma-debug.c:519
add_dma_entry+0x176/0x180
Apr 07 23:39:58 : DMA-API: exceeded 7 overlapping mappings of cacheline
0x13e77000
Apr 07 23:39:58 : Modules linked in: vhost_net vhost tap tun ebtable_filter
ebtables ip6table_filter ip6_tables tcm_qla2xxx target_core_user uio targ
Apr 07 23:39:58 :  nvme_core scsi_transport_sas
Apr 07 23:39:58 : CPU: 0 PID: 1468 Comm: qemu-system-x86 Tainted: GW I 
   4.11.0-0.rc5.git3.1.fc27.x86_64 #1
Apr 07 23:39:58 : Hardware name: HP ProLiant DL180 G6  , BIOS O20 07/01/2013
Apr 07 23:39:58 : Call Trace:
Apr 07 23:39:58 :  dump_stack+0x8e/0xd1
Apr 07 23:39:58 :  __warn+0xcb/0xf0
Apr 07 23:39:58 :  warn_slowpath_fmt+0x5a/0x80
Apr 07 23:39:58 :  ? active_cacheline_read_overlap+0x2e/0x60
Apr 07 23:39:58 :  add_dma_entry+0x176/0x180
Apr 07 23:39:58 :  debug_dma_map_sg+0x11a/0x170
Apr 07 23:39:58 :  nvme_queue_rq+0x513/0x950 [nvme]
Apr 07 23:39:58 :  blk_mq_try_issue_directly+0xbb/0x110
Apr 07 23:39:58 :  blk_mq_make_request+0x3a9/0xa70
Apr 07 23:39:58 :  ? blk_queue_enter+0xa3/0x2c0
Apr 07 23:39:58 :  ? blk_queue_enter+0x39/0x2c0
Apr 07 23:39:58 :  ? generic_make_request+0xf9/0x3b0
Apr 07 23:39:58 :  generic_make_request+0x126/0x3b0
Apr 07 23:39:58 :  ? iov_iter_get_pages+0xc9/0x330
Apr 07 23:39:58 :  submit_bio+0x73/0x150
Apr 07 23:39:58 :  ? submit_bio+0x73/0x150
Apr 07 23:39:58 :  ? bio_iov_iter_get_pages+0xe0/0x120
Apr 07 23:39:58 :  blkdev_direct_IO+0x1f7/0x3e0
Apr 07 23:39:58 :  ? SYSC_io_destroy+0x1d0/0x1d0
Apr 07 23:39:58 :  ? __atime_needs_update+0x7f/0x1a0
Apr 07 23:39:58 :  generic_file_read_iter+0x2e5/0xad0
Apr 07 23:39:58 :  ? generic_file_read_iter+0x2e5/0xad0
Apr 07 23:39:58 :  ? rw_copy_check_uvector+0x8a/0x180
Apr 07 23:39:58 :  blkdev_read_iter+0x35/0x40
Apr 07 23:39:58 :  aio_read+0xeb/0x150
Apr 07 23:39:58 :  ? sched_clock+0x9/0x10
Apr 07 23:39:58 :  ? sched_clock_cpu+0x11/0xc0
Apr 07 23:39:58 :  ? __might_fault+0x3e/0x90
Apr 07 23:39:58 :  ? __might_fault+0x3e/0x90
Apr 07 23:39:58 :  do_io_submit+0x5f8/0x920
Apr 07 23:39:58 :  ? do_io_submit+0x5f8/0x920
Apr 07 23:39:58 :  SyS_io_submit+0x10/0x20
Apr 07 23:39:58 :  ? SyS_io_submit+0x10/0x20
Apr 07 23:39:58 :  entry_SYSCALL_64_fastpath+0x1f/0xc2
Apr 07 23:39:58 : RIP: 0033:0x7f73766216a7
Apr 07 23:39:58 : RSP: 002b:7ffc9aac6108 EFLAGS: 0246 ORIG_RAX:
00d1
Apr 07 23:39:58 : RAX: ffda RBX: 55617d90b900 RCX:
7f73766216a7
Apr 07 23:39:58 : RDX: 7ffc9aac6120 RSI: 0002 RDI:
7f737780
Apr 07 23:39:58 : RBP: 0258 R08: 7ffc9aac6440 R09:
55617d9a2000
Apr 07 23:39:58 : R10: 556188f93cf0 R11: 0246 R12:
0280
Apr 07 23:39:58 : R13: 0130 R14: 0001 R15:
0011
Apr 07 23:39:58 : ---[ end trace 81f169903702b67d ]---

-- 
You are receiving this mail because:
You are watching the assignee of the bug.


Re: [PATCH 1/2] block: Implement global tagset

2017-04-07 Thread Arun Easi
On Thu, 6 Apr 2017, 1:49am, Hannes Reinecke wrote:

> On 04/06/2017 08:27 AM, Arun Easi wrote:
> > Hi Hannes,
> > 
> > Thanks for taking a crack at the issue. My comments below..
> > 
> > On Tue, 4 Apr 2017, 5:07am, Hannes Reinecke wrote:
> > 
> >> Most legacy HBAs have a tagset per HBA, not per queue. To map
> >> these devices onto block-mq this patch implements a new tagset
> >> flag BLK_MQ_F_GLOBAL_TAGS, which will cause the tag allocator
> >> to use just one tagset for all hardware queues.
> >>
> >> Signed-off-by: Hannes Reinecke 
> >> ---
> >>  block/blk-mq-tag.c | 12 
> >>  block/blk-mq.c | 10 --
> >>  include/linux/blk-mq.h |  1 +
> >>  3 files changed, 17 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> >> index e48bc2c..a14e76c 100644
> >> --- a/block/blk-mq-tag.c
> >> +++ b/block/blk-mq-tag.c
> >> @@ -276,9 +276,11 @@ static void blk_mq_all_tag_busy_iter(struct 
> >> blk_mq_tags *tags,
> >>  void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,
> >>busy_tag_iter_fn *fn, void *priv)
> >>  {
> >> -  int i;
> >> +  int i, lim = tagset->nr_hw_queues;
> >>  
> >> -  for (i = 0; i < tagset->nr_hw_queues; i++) {
> >> +  if (tagset->flags & BLK_MQ_F_GLOBAL_TAGS)
> >> +  lim = 1;
> >> +  for (i = 0; i < lim; i++) {
> >>if (tagset->tags && tagset->tags[i])
> >>blk_mq_all_tag_busy_iter(tagset->tags[i], fn, priv);
> >>}
> >> @@ -287,12 +289,14 @@ void blk_mq_tagset_busy_iter(struct blk_mq_tag_set 
> >> *tagset,
> >>  
> >>  int blk_mq_reinit_tagset(struct blk_mq_tag_set *set)
> >>  {
> >> -  int i, j, ret = 0;
> >> +  int i, j, ret = 0, lim = set->nr_hw_queues;
> >>  
> >>if (!set->ops->reinit_request)
> >>goto out;
> >>  
> >> -  for (i = 0; i < set->nr_hw_queues; i++) {
> >> +  if (set->flags & BLK_MQ_F_GLOBAL_TAGS)
> >> +  lim = 1;
> >> +  for (i = 0; i < lim; i++) {
> >>struct blk_mq_tags *tags = set->tags[i];
> >>  
> >>for (j = 0; j < tags->nr_tags; j++) {
> >> diff --git a/block/blk-mq.c b/block/blk-mq.c
> >> index 159187a..db96ed0 100644
> >> --- a/block/blk-mq.c
> >> +++ b/block/blk-mq.c
> >> @@ -2061,6 +2061,10 @@ static bool __blk_mq_alloc_rq_map(struct 
> >> blk_mq_tag_set *set, int hctx_idx)
> >>  {
> >>int ret = 0;
> >>  
> >> +  if ((set->flags & BLK_MQ_F_GLOBAL_TAGS) && hctx_idx != 0) {
> >> +  set->tags[hctx_idx] = set->tags[0];
> >> +  return true;
> >> +  }
> > 
:
> 
> > BTW, if you would like me to try out this patch on my setup, please let me 
> > know.
> > 
> Oh, yes. Please do.
> 

Ran the tests on my setup (Dell R730, 2 nodes). This change did not drop 
any IOPS (got ~2M 512b). The cache-miss percentage varied depending on 
whether the tests were running on one node or both (the latter performed 
worse). All interrupts were directed to only one node. Interestingly, the 
cache-miss percentage was lowest when MQ was off.

I hit an fdisk hang (open path), by the way; not sure if it has anything 
to do with this change, though.

Notes and hang stack attached.

Let me know if you are interested in any specific perf event/command-line.

Regards,
-Arun

perf stat, ran on a short 10 second load.

---1port-1node-new-mq
 Performance counter stats for 'CPU(s) 2':

 188,642,696  LLC-loads(66.66%)
   3,615,142  LLC-load-misses  #1.92% of all LL-cache hits (66.67%)
  86,488,341  LLC-stores   (33.34%)
  10,820,977  LLC-store-misses (33.33%)
 391,370,104  cache-references (49.99%)
  14,498,491  cache-misses #3.705 % of all cache refs  (66.66%)

---1port-1node-mq---
 Performance counter stats for 'CPU(s) 2':

 145,025,999  LLC-loads(66.67%)
   3,793,427  LLC-load-misses  #2.62% of all LL-cache hits (66.67%)
  60,878,939  LLC-stores   (33.33%)
   8,044,714  LLC-store-misses (33.33%)
 294,713,070  cache-references (50.00%)
  11,923,354  cache-misses #4.046 % of all cache refs  (66.66%)

---1port-1node-nomq---
 Performance counter stats for 'CPU(s) 2':

 157,375,709  LLC-loads(66.66%)
 476,117  LLC-load-misses  #0.30% of all LL-cache hits (66.66%)
  76,046,098  LLC-stores   (33.34%)
 840,756  LLC-store-misses (33.34%)
 326,230,969  cache-references (50.00%)
   1,332,398  cache-misses #0.408 % of all cache refs  (66.67%)

==

--2port-allnodes-new-mq--
 Performance counter stats for 

Re: kill req->errors

2017-04-07 Thread Christoph Hellwig
On Thu, Apr 06, 2017 at 04:00:24PM -0400, Konrad Rzeszutek Wilk wrote:
> You wouldn't have a git tree to easily test it? Thanks.

git://git.infradead.org/users/hch/block.git request-errors

Gitweb:

http://git.infradead.org/users/hch/block.git/shortlog/refs/heads/request-errors


Re: [PATCH] scsi: libfc: directly call ELS request handlers

2017-04-07 Thread Chad Dupuis


On Fri, 7 Apr 2017, 1:42pm -, Johannes Thumshirn wrote:

> Directly call ELS request handler functions in fc_lport_recv_els_req
> instead of saving the pointer to the handler's receive function and then
> later dereferencing this pointer.
> 
> This makes the code a bit more obvious.
> 
> Signed-off-by: Johannes Thumshirn 
> ---
>  drivers/scsi/libfc/fc_lport.c | 20 +---
>  1 file changed, 9 insertions(+), 11 deletions(-)
> 

A reasonable refactoring.

Reviewed-by: Chad Dupuis 


[PATCH] scsi: mpt3sas: remove redundant wmb on arm/arm64

2017-04-07 Thread Sinan Kaya
Due to relaxed ordering requirements on multiple architectures,
drivers are required to use wmb/rmb/mb combinations when they
need to guarantee observability between the memory and the HW.

The mpt3sas driver is already using wmb() for this purpose. However, it
issues a writel() following the wmb(). The writel() function on arm/arm64
architectures has an embedded wmb() call inside.

This results in unnecessary performance loss and code duplication.

The kernel now supports the relaxed read/write API across all
architectures.

The right fix is to call either __raw_writel()/__raw_readl() or
writel_relaxed()/readl_relaxed() for multi-arch compatibility.

Signed-off-by: Sinan Kaya 
---
 drivers/scsi/mpt3sas/mpt3sas_base.c | 21 +++--
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c 
b/drivers/scsi/mpt3sas/mpt3sas_base.c
index 5b7aec5..6e42036 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
@@ -1026,8 +1026,8 @@ static int mpt3sas_remove_dead_ioc_func(void *arg)
ioc->reply_free[ioc->reply_free_host_index] =
cpu_to_le32(reply);
wmb();
-   writel(ioc->reply_free_host_index,
-   &ioc->chip->ReplyFreeHostIndex);
+   writel_relaxed(ioc->reply_free_host_index,
+  &ioc->chip->ReplyFreeHostIndex);
}
}
 
@@ -1076,8 +1076,8 @@ static int mpt3sas_remove_dead_ioc_func(void *arg)
 
wmb();
if (ioc->is_warpdrive) {
-   writel(reply_q->reply_post_host_index,
-   ioc->reply_post_host_index[msix_index]);
+   writel_relaxed(reply_q->reply_post_host_index,
+  ioc->reply_post_host_index[msix_index]);
atomic_dec(&reply_q->busy);
return IRQ_HANDLED;
}
@@ -1098,13 +1098,14 @@ static int mpt3sas_remove_dead_ioc_func(void *arg)
 * value in MSIxIndex field.
 */
if (ioc->combined_reply_queue)
-   writel(reply_q->reply_post_host_index | ((msix_index  & 7) <<
-   MPI2_RPHI_MSIX_INDEX_SHIFT),
-   ioc->replyPostRegisterIndex[msix_index/8]);
+   writel_relaxed(reply_q->reply_post_host_index |
+  ((msix_index  & 7) <<
+  MPI2_RPHI_MSIX_INDEX_SHIFT),
+  ioc->replyPostRegisterIndex[msix_index/8]);
else
-   writel(reply_q->reply_post_host_index | (msix_index <<
-   MPI2_RPHI_MSIX_INDEX_SHIFT),
-   &ioc->chip->ReplyPostHostIndex);
+   writel_relaxed(reply_q->reply_post_host_index |
+  (msix_index << MPI2_RPHI_MSIX_INDEX_SHIFT),
+  &ioc->chip->ReplyPostHostIndex);
atomic_dec(&reply_q->busy);
return IRQ_HANDLED;
 }
-- 
1.9.1



Re: linux-next: manual merge of the scsi-mkp tree with the char-misc tree

2017-04-07 Thread Bart Van Assche
On 04/06/2017 10:33 PM, Stephen Rothwell wrote:
> Hi Martin,
> 
> Today's linux-next merge of the scsi-mkp tree got a conflict in:
> 
>   drivers/scsi/osd/osd_uld.c
> 
> between commit:
> 
>   ac1ddc584e98 ("scsi: utilize new cdev_device_add helper function")
> 
> from the char-misc tree and commit:
> 
>   c02465fa13b6 ("scsi: osd_uld: Check scsi_device_get() return value")
> 
> from the scsi-mkp tree.
> 
> I am not sure how to resolve this, so I have just effectively reverted
> the latter commit for today.  Better suggestions welcome.
> 
> I fixed it up and can carry the fix as necessary. This is now fixed as
> far as linux-next is concerned, but any non trivial conflicts should be
> mentioned to your upstream maintainer when your tree is submitted for
> merging.  You may also want to consider cooperating with the maintainer
> of the conflicting tree to minimise any particularly complex conflicts.

(+linux-scsi)

Hello Martin,

Sorry that I had not yet noticed Logan's patch series. Should my two
patches that conflict with Logan's patch series be dropped and reworked
after Logan's patches are upstream?

Thanks,

Bart.


[PATCH] ibmvscsis: Do not send aborted task response

2017-04-07 Thread Bryant G. Ly
The driver is sending a response for the aborted op in addition to the TMR
response that LIO sends. The SCSI spec says that the initiator that sends
the ABORT TASK TM never gets a response to the aborted op, but with the
current code it does. This fix removes that response when the op is
aborted.

Cc:  # v4.8+
Signed-off-by: Bryant G. Ly 
---
 drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c | 60 +---
 drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.h |  1 +
 2 files changed, 40 insertions(+), 21 deletions(-)

diff --git a/drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c 
b/drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c
index 4bb5635..8e2733f 100644
--- a/drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c
+++ b/drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c
@@ -1169,6 +1169,7 @@ static struct ibmvscsis_cmd 
*ibmvscsis_get_free_cmd(struct scsi_info *vscsi)
cmd = list_first_entry_or_null(&vscsi->free_cmd,
   struct ibmvscsis_cmd, list);
if (cmd) {
+   cmd->flags &= ~(CMD_ABORTED);
list_del(&cmd->list);
cmd->iue = iue;
cmd->type = UNSET_TYPE;
@@ -1758,33 +1759,41 @@ static void ibmvscsis_send_messages(struct scsi_info 
*vscsi)
 
if (!(vscsi->flags & RESPONSE_Q_DOWN)) {
list_for_each_entry_safe(cmd, nxt, &vscsi->waiting_rsp, list) {
-   iue = cmd->iue;
+   /*
+* If an Abort flag is set then dont send response
+*/
+   if (cmd->flags & CMD_ABORTED) {
+   list_del(&cmd->list);
+   ibmvscsis_free_cmd_resources(vscsi, cmd);
+   } else {
+   iue = cmd->iue;
 
-   crq->valid = VALID_CMD_RESP_EL;
-   crq->format = cmd->rsp.format;
+   crq->valid = VALID_CMD_RESP_EL;
+   crq->format = cmd->rsp.format;
 
-   if (cmd->flags & CMD_FAST_FAIL)
-   crq->status = VIOSRP_ADAPTER_FAIL;
+   if (cmd->flags & CMD_FAST_FAIL)
+   crq->status = VIOSRP_ADAPTER_FAIL;
 
-   crq->IU_length = cpu_to_be16(cmd->rsp.len);
+   crq->IU_length = cpu_to_be16(cmd->rsp.len);
 
-   rc = h_send_crq(vscsi->dma_dev->unit_address,
-   be64_to_cpu(msg_hi),
-   be64_to_cpu(cmd->rsp.tag));
+   rc = h_send_crq(vscsi->dma_dev->unit_address,
+   be64_to_cpu(msg_hi),
+   be64_to_cpu(cmd->rsp.tag));
 
-   pr_debug("send_messages: cmd %p, tag 0x%llx, rc %ld\n",
-cmd, be64_to_cpu(cmd->rsp.tag), rc);
+   pr_debug("send_messages: cmd %p, tag 0x%llx, rc 
%ld\n",
+cmd, be64_to_cpu(cmd->rsp.tag), rc);
 
-   /* if all ok free up the command element resources */
-   if (rc == H_SUCCESS) {
-   /* some movement has occurred */
-   vscsi->rsp_q_timer.timer_pops = 0;
-   list_del(&cmd->list);
+   /* if all ok free up the command element 
resources */
+   if (rc == H_SUCCESS) {
+   /* some movement has occurred */
+   vscsi->rsp_q_timer.timer_pops = 0;
+   list_del(&cmd->list);
 
-   ibmvscsis_free_cmd_resources(vscsi, cmd);
-   } else {
-   srp_snd_msg_failed(vscsi, rc);
-   break;
+   ibmvscsis_free_cmd_resources(vscsi, 
cmd);
+   } else {
+   srp_snd_msg_failed(vscsi, rc);
+   break;
+   }
}
}
 
@@ -3581,9 +3590,15 @@ static int ibmvscsis_write_pending(struct se_cmd *se_cmd)
 {
struct ibmvscsis_cmd *cmd = container_of(se_cmd, struct ibmvscsis_cmd,
 se_cmd);
+   struct scsi_info *vscsi = cmd->adapter;
struct iu_entry *iue = cmd->iue;
int rc;
 
+   if ((vscsi->flags & (CLIENT_FAILED | RESPONSE_Q_DOWN))) {
+   pr_err("write_pending failed since: %d\n", vscsi->flags);
+   return -EIO;
+   }
+
  

Re: [PATCH] ibmvscsis: Do not send aborted task response

2017-04-07 Thread Bart Van Assche
On Fri, 2017-04-07 at 11:12 -0500, Bryant G. Ly wrote:
> So from this stack trace it looks like the ibmvscsis driver is sending an
> extra response through send_messages even though an abort was issued.  
> IBMi reported this issue internally when they were testing the driver,
> because their client didn't like getting a response back for the aborted op.
> They are only expecting a TMR from the abort request, NOT the aborted op. 

Hello Bryant,

What is the root cause of this behavior? Why is it that the behavior of
the ibmvscsi_tgt contradicts what has been implemented in the LIO core?
Sorry but the patch at the start of this thread looks to me like an
attempt to paper over the problem instead of addressing the root cause.

Thanks,

Bart.

Re: [PATCH] scsi: mpt3sas: remove redundant wmb on arm/arm64

2017-04-07 Thread Sinan Kaya
On 4/7/2017 12:41 PM, Sinan Kaya wrote:
> The right thing was to either call __raw_writel/__raw_readl or
> write_relaxed/read_relaxed for multi-arch compatibility.

One can also argue to get rid of wmb(). I can go either way based
on the recommendation.

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.


Re: [PATCH v4 0/6] Avoid that scsi-mq and dm-mq queue processing stalls sporadically

2017-04-07 Thread Bart Van Assche
On Fri, 2017-04-07 at 12:23 -0600, Jens Axboe wrote:
> On 04/07/2017 12:16 PM, Bart Van Assche wrote:
> > Hello Jens,
> > 
> > The six patches in this patch series fix the queue lockup I reported
> > recently on the linux-block mailing list. Please consider these patches
> > for inclusion in the upstream kernel.
> 
> Some of this we need in 4.11, but not all of it. I can't be applying patches
> that "improve scalability" at this point.
> 
> 4-6 looks like what we want for 4.11, I'll see if those apply directly. Then
> we can put 1-3 on top in 4.12, with the others pulled in first.

Hello Jens,

Please note that patch 2/6 is a bug fix. The current implementation of
blk_mq_sched_restart_queues() only considers hardware queues associated with
the same request queue as the hardware queue that has been passed as an
argument. If a tag set is shared across request queues - as is the case for
SCSI - then all request queues that share a tag set with the hctx argument
must be considered.

Bart.

Re: [PATCH v4 0/6] Avoid that scsi-mq and dm-mq queue processing stalls sporadically

2017-04-07 Thread Bart Van Assche
On Fri, 2017-04-07 at 11:33 -0700, Bart Van Assche wrote:
> On Fri, 2017-04-07 at 12:23 -0600, Jens Axboe wrote:
> > On 04/07/2017 12:16 PM, Bart Van Assche wrote:
> > > Hello Jens,
> > > 
> > > The six patches in this patch series fix the queue lockup I reported
> > > recently on the linux-block mailing list. Please consider these patches
> > > for inclusion in the upstream kernel.
> > 
> > Some of this we need in 4.11, but not all of it. I can't be applying patches
> > that "improve scalability" at this point.
> > 
> > 4-6 looks like what we want for 4.11, I'll see if those apply directly. Then
> > we can put 1-3 on top in 4.12, with the others pulled in first.
> 
> Hello Jens,
> 
> Please note that patch 2/6 is a bug fix. The current implementation of
> blk_mq_sched_restart_queues() only considers hardware queues associated with
> the same request queue as the hardware queue that has been passed as an
> argument. If a tag set is shared across request queues - as is the case for
> SCSI - then all request queues that share a tag set with the hctx argument
> must be considered.

(replying to my own e-mail)

Hello Jens,

If you want I can split that patch into two patches - one that runs all hardware
queues with which the tag set is shared and one that switches from rerunning
all hardware queues to one hardware queue.

Bart.

Re: [PATCH v4 0/6] Avoid that scsi-mq and dm-mq queue processing stalls sporadically

2017-04-07 Thread Jens Axboe
On 04/07/2017 12:39 PM, Bart Van Assche wrote:
> On Fri, 2017-04-07 at 11:33 -0700, Bart Van Assche wrote:
>> On Fri, 2017-04-07 at 12:23 -0600, Jens Axboe wrote:
>>> On 04/07/2017 12:16 PM, Bart Van Assche wrote:
 Hello Jens,

 The six patches in this patch series fix the queue lockup I reported
 recently on the linux-block mailing list. Please consider these patches
 for inclusion in the upstream kernel.
>>>
>>> Some of this we need in 4.11, but not all of it. I can't be applying patches
>>> that "improve scalability" at this point.
>>>
>>> 4-6 looks like what we want for 4.11, I'll see if those apply directly. Then
>>> we can put 1-3 on top in 4.12, with the others pulled in first.
>>
>> Hello Jens,
>>
>> Please note that patch 2/6 is a bug fix. The current implementation of
>> blk_mq_sched_restart_queues() only considers hardware queues associated with
>> the same request queue as the hardware queue that has been passed as an
>> argument. If a tag set is shared across request queues - as is the case for
>> SCSI - then all request queues that share a tag set with the hctx argument
>> must be considered.
> 
> (replying to my own e-mail)
> 
> Hello Jens,
> 
> If you want I can split that patch into two patches - one that runs all 
> hardware
> queues with which the tag set is shared and one that switches from rerunning
> all hardware queues to one hardware queue.

I already put it in, but this is getting very awkward. We're at -rc5 time;
patches going into mainline should be TINY. And now I'm sitting on this,
which I have to justify:

 15 files changed, 281 insertions(+), 164 deletions(-)

and where one of the patches reads like it's a performance improvement, when
in reality it's fixing a hang. So yes, the patch should have been split in
two, and the series should have been ordered so that the first patches could
go into 4.11, and the rest on top of that in 4.12. Did we really need a
patch clarifying comments in that series? Probably not.

-- 
Jens Axboe



Re: [PATCH] scsi: mpt3sas: remove redundant wmb on arm/arm64

2017-04-07 Thread Sinan Kaya
On 4/7/2017 1:25 PM, James Bottomley wrote:
>> The right thing was to either call __raw_writel/__raw_readl or
>> write_relaxed/read_relaxed for multi-arch compatibility.
> writeX_relaxed and thus your patch is definitely wrong.  The reason is
> that we have two ordering domains: the CPU and the Bus.  wmb forces
> ordering in the CPU domain but not the bus domain.  writeX originally
> forced ordering in the bus domain but not the CPU domain, but since the
> raw primitives I think it now orders in both and writeX_relaxed orders
> in neither domain, so your patch would currently eliminate the bus
> ordering.

Yeah, that's why I recommended removing the wmb() in a follow-up
instead of using the relaxed accessors in a follow-up.

writel() already guarantees ordering for both the CPU and the bus; we
don't need an additional wmb().

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.


[PATCH v4 2/6] blk-mq: Restart a single queue if tag sets are shared

2017-04-07 Thread Bart Van Assche
To improve scalability, if hardware queues are shared, restart
a single hardware queue in round-robin fashion. Rename
blk_mq_sched_restart_queues() to reflect the new semantics.
Remove blk_mq_sched_mark_restart_queue() because this function
has no callers. Remove flag QUEUE_FLAG_RESTART because this
patch removes the code that uses this flag.

Signed-off-by: Bart Van Assche 
Cc: Christoph Hellwig 
Cc: Hannes Reinecke 
---
 block/blk-mq-sched.c   | 63 ++
 block/blk-mq-sched.h   | 16 +
 block/blk-mq.c |  2 +-
 include/linux/blkdev.h |  1 -
 4 files changed, 55 insertions(+), 27 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 09af8ff18719..a5c683a6429c 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -317,25 +317,68 @@ static bool blk_mq_sched_bypass_insert(struct blk_mq_hw_ctx *hctx,
 	return true;
 }
 
-static void blk_mq_sched_restart_hctx(struct blk_mq_hw_ctx *hctx)
+static bool blk_mq_sched_restart_hctx(struct blk_mq_hw_ctx *hctx)
 {
 	if (test_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state)) {
 		clear_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state);
-		if (blk_mq_hctx_has_pending(hctx))
+		if (blk_mq_hctx_has_pending(hctx)) {
 			blk_mq_run_hw_queue(hctx, true);
+			return true;
+		}
 	}
+	return false;
 }
 
-void blk_mq_sched_restart_queues(struct blk_mq_hw_ctx *hctx)
-{
-   struct request_queue *q = hctx->queue;
-   unsigned int i;
+/**
+ * list_for_each_entry_rcu_rr - iterate in a round-robin fashion over rcu list
+ * @pos:    loop cursor.
+ * @skip:   the list element that will not be examined. Iteration starts at
+ *  @skip->next.
+ * @head:   head of the list to examine. This list must have at least one
+ *  element, namely @skip.
+ * @member: name of the list_head structure within typeof(*pos).
+ */
+#define list_for_each_entry_rcu_rr(pos, skip, head, member)\
+   for ((pos) = (skip);\
+(pos = (pos)->member.next != (head) ? list_entry_rcu(  \
+   (pos)->member.next, typeof(*pos), member) : \
+ list_entry_rcu((pos)->member.next->next, typeof(*pos), member)), \
+(pos) != (skip); )
 
-	if (test_bit(QUEUE_FLAG_RESTART, &q->queue_flags)) {
-		if (test_and_clear_bit(QUEUE_FLAG_RESTART, &q->queue_flags)) {
-   queue_for_each_hw_ctx(q, hctx, i)
-   blk_mq_sched_restart_hctx(hctx);
+/*
+ * Called after a driver tag has been freed to check whether a hctx needs to
+ * be restarted. Restarts @hctx if its tag set is not shared. Restarts hardware
+ * queues in a round-robin fashion if the tag set of @hctx is shared with other
+ * hardware queues.
+ */
+void blk_mq_sched_restart(struct blk_mq_hw_ctx *const hctx)
+{
+   struct blk_mq_tags *const tags = hctx->tags;
+   struct blk_mq_tag_set *const set = hctx->queue->tag_set;
+   struct request_queue *const queue = hctx->queue, *q;
+   struct blk_mq_hw_ctx *hctx2;
+   unsigned int i, j;
+
+   if (set->flags & BLK_MQ_F_TAG_SHARED) {
+   rcu_read_lock();
+		list_for_each_entry_rcu_rr(q, queue, &set->tag_list,
+					   tag_set_list) {
+   queue_for_each_hw_ctx(q, hctx2, i)
+   if (hctx2->tags == tags &&
+   blk_mq_sched_restart_hctx(hctx2))
+   goto done;
+   }
+   j = hctx->queue_num + 1;
+   for (i = 0; i < queue->nr_hw_queues; i++, j++) {
+   if (j == queue->nr_hw_queues)
+   j = 0;
+   hctx2 = queue->queue_hw_ctx[j];
+   if (hctx2->tags == tags &&
+   blk_mq_sched_restart_hctx(hctx2))
+   break;
}
+done:
+   rcu_read_unlock();
} else {
blk_mq_sched_restart_hctx(hctx);
}
diff --git a/block/blk-mq-sched.h b/block/blk-mq-sched.h
index a75b16b123f7..4e3fc2a40207 100644
--- a/block/blk-mq-sched.h
+++ b/block/blk-mq-sched.h
@@ -19,7 +19,7 @@ bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio,
 			    struct request **merged_request);
 bool __blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio);
 bool blk_mq_sched_try_insert_merge(struct request_queue *q, struct request *rq);
-void blk_mq_sched_restart_queues(struct blk_mq_hw_ctx *hctx);
+void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx);
 
 void blk_mq_sched_insert_request(struct request *rq, bool at_head,
 bool run_queue, bool async, bool 

[PATCH v4 1/6] blk-mq: Make it safe to use RCU to iterate over blk_mq_tag_set.tag_list

2017-04-07 Thread Bart Van Assche
Since the next patch in this series will use RCU to iterate over
tag_list, make this safe. Add lockdep_assert_held() statements
in functions that iterate over tag_list to make clear that using
list_for_each_entry() instead of list_for_each_entry_rcu() is
fine in these functions.

Signed-off-by: Bart Van Assche 
Cc: Christoph Hellwig 
Cc: Hannes Reinecke 
---
 block/blk-mq.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index f7cd3208bcdf..c26464f9649a 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2076,6 +2076,8 @@ static void blk_mq_update_tag_set_depth(struct blk_mq_tag_set *set, bool shared)
 {
 	struct request_queue *q;
 
+	lockdep_assert_held(&set->tag_list_lock);
+
 	list_for_each_entry(q, &set->tag_list, tag_set_list) {
blk_mq_freeze_queue(q);
queue_set_hctx_shared(q, shared);
@@ -2088,7 +2090,8 @@ static void blk_mq_del_queue_tag_set(struct request_queue *q)
 	struct blk_mq_tag_set *set = q->tag_set;
 
 	mutex_lock(&set->tag_list_lock);
-	list_del_init(&q->tag_set_list);
+	list_del_rcu(&q->tag_set_list);
+	INIT_LIST_HEAD(&q->tag_set_list);
 	if (list_is_singular(&set->tag_list)) {
 		/* just transitioned to unshared */
 		set->flags &= ~BLK_MQ_F_TAG_SHARED;
@@ -2096,6 +2099,8 @@ static void blk_mq_del_queue_tag_set(struct request_queue *q)
 		blk_mq_update_tag_set_depth(set, false);
 	}
 	mutex_unlock(&set->tag_list_lock);
+
+   synchronize_rcu();
 }
 
 static void blk_mq_add_queue_tag_set(struct blk_mq_tag_set *set,
@@ -2113,7 +2118,7 @@ static void blk_mq_add_queue_tag_set(struct blk_mq_tag_set *set,
 	}
 	if (set->flags & BLK_MQ_F_TAG_SHARED)
 		queue_set_hctx_shared(q, true);
-	list_add_tail(&q->tag_set_list, &set->tag_list);
+	list_add_tail_rcu(&q->tag_set_list, &set->tag_list);
 
mutex_unlock(>tag_list_lock);
 }
@@ -2601,6 +2606,8 @@ void blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, int nr_hw_queues)
 {
 	struct request_queue *q;
 
+	lockdep_assert_held(&set->tag_list_lock);
+
if (nr_hw_queues > nr_cpu_ids)
nr_hw_queues = nr_cpu_ids;
if (nr_hw_queues < 1 || nr_hw_queues == set->nr_hw_queues)
-- 
2.12.0



[PATCH v4 5/6] scsi: Avoid that SCSI queues get stuck

2017-04-07 Thread Bart Van Assche
If a .queue_rq() function returns BLK_MQ_RQ_QUEUE_BUSY then the block
driver that implements that function is responsible for rerunning the
hardware queue once requests can be queued again successfully.

commit 52d7f1b5c2f3 ("blk-mq: Avoid that requeueing starts stopped
queues") removed the blk_mq_stop_hw_queue() call from scsi_queue_rq()
for the BLK_MQ_RQ_QUEUE_BUSY case. Hence change all calls to functions
that are intended to rerun a busy queue such that these examine all
hardware queues instead of only stopped queues.

Since no other functions than scsi_internal_device_block() and
scsi_internal_device_unblock() should ever stop or restart a SCSI
queue, change the blk_mq_delay_queue() call into a
blk_mq_delay_run_hw_queue() call.

Fixes: commit 52d7f1b5c2f3 ("blk-mq: Avoid that requeueing starts stopped 
queues")
Fixes: commit 7e79dadce222 ("blk-mq: stop hardware queue in 
blk_mq_delay_queue()")
Signed-off-by: Bart Van Assche 
Cc: Martin K. Petersen 
Cc: James Bottomley 
Cc: Christoph Hellwig 
Cc: Hannes Reinecke 
Cc: Sagi Grimberg 
Cc: Long Li 
Cc: K. Y. Srinivasan 
---
 drivers/scsi/scsi_lib.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 11972d1075f1..7bc4513bf4e4 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -496,7 +496,7 @@ static void scsi_run_queue(struct request_queue *q)
scsi_starved_list_run(sdev->host);
 
if (q->mq_ops)
-   blk_mq_start_stopped_hw_queues(q, false);
+   blk_mq_run_hw_queues(q, false);
else
blk_run_queue(q);
 }
@@ -667,7 +667,7 @@ static bool scsi_end_request(struct request *req, int error,
 		    !list_empty(&sdev->host->starved_list))
 			kblockd_schedule_work(&sdev->requeue_work);
 		else
-			blk_mq_start_stopped_hw_queues(q, true);
+			blk_mq_run_hw_queues(q, true);
} else {
unsigned long flags;
 
@@ -1974,7 +1974,7 @@ static int scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
 	case BLK_MQ_RQ_QUEUE_BUSY:
 		if (atomic_read(&sdev->device_busy) == 0 &&
 		    !scsi_device_blocked(sdev))
-			blk_mq_delay_queue(hctx, SCSI_QUEUE_DELAY);
+			blk_mq_delay_run_hw_queue(hctx, SCSI_QUEUE_DELAY);
break;
case BLK_MQ_RQ_QUEUE_ERROR:
/*
-- 
2.12.0



[PATCH v4 4/6] blk-mq: Introduce blk_mq_delay_run_hw_queue()

2017-04-07 Thread Bart Van Assche
Introduce a function that runs a hardware queue unconditionally
after a delay. Note: there is already a function that stops and
restarts a hardware queue after a delay, namely blk_mq_delay_queue().

This function will be used in the next patch in this series.

Signed-off-by: Bart Van Assche 
Cc: Christoph Hellwig 
Cc: Hannes Reinecke 
Cc: Long Li 
Cc: K. Y. Srinivasan 
---
 block/blk-mq.c | 32 ++--
 include/linux/blk-mq.h |  2 ++
 2 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index aff85d41cea3..836e3a17da54 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1146,7 +1146,8 @@ static int blk_mq_hctx_next_cpu(struct blk_mq_hw_ctx *hctx)
 	return hctx->next_cpu;
 }
 
-void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async)
+static void __blk_mq_delay_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async,
+   unsigned long msecs)
 {
if (unlikely(blk_mq_hctx_stopped(hctx) ||
 !blk_mq_hw_queue_mapped(hctx)))
@@ -1163,7 +1164,24 @@ void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async)
 		put_cpu();
 	}
 
-	kblockd_schedule_work_on(blk_mq_hctx_next_cpu(hctx), &hctx->run_work);
+	if (msecs == 0)
+		kblockd_schedule_work_on(blk_mq_hctx_next_cpu(hctx),
+					 &hctx->run_work);
+	else
+		kblockd_schedule_delayed_work_on(blk_mq_hctx_next_cpu(hctx),
+						 &hctx->delayed_run_work,
+						 msecs_to_jiffies(msecs));
+}
+
+void blk_mq_delay_run_hw_queue(struct blk_mq_hw_ctx *hctx, unsigned long msecs)
+{
+   __blk_mq_delay_run_hw_queue(hctx, true, msecs);
+}
+EXPORT_SYMBOL(blk_mq_delay_run_hw_queue);
+
+void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async)
+{
+   __blk_mq_delay_run_hw_queue(hctx, async, 0);
 }
 
 void blk_mq_run_hw_queues(struct request_queue *q, bool async)
@@ -1266,6 +1284,15 @@ static void blk_mq_run_work_fn(struct work_struct *work)
__blk_mq_run_hw_queue(hctx);
 }
 
+static void blk_mq_delayed_run_work_fn(struct work_struct *work)
+{
+   struct blk_mq_hw_ctx *hctx;
+
+   hctx = container_of(work, struct blk_mq_hw_ctx, delayed_run_work.work);
+
+   __blk_mq_run_hw_queue(hctx);
+}
+
 static void blk_mq_delay_work_fn(struct work_struct *work)
 {
struct blk_mq_hw_ctx *hctx;
@@ -1866,6 +1893,7 @@ static int blk_mq_init_hctx(struct request_queue *q,
 	node = hctx->numa_node = set->numa_node;
 
 	INIT_WORK(&hctx->run_work, blk_mq_run_work_fn);
+	INIT_DELAYED_WORK(&hctx->delayed_run_work, blk_mq_delayed_run_work_fn);
 	INIT_DELAYED_WORK(&hctx->delay_work, blk_mq_delay_work_fn);
 	spin_lock_init(&hctx->lock);
 	INIT_LIST_HEAD(&hctx->dispatch);
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index bdea90d75274..b90c3d5766cd 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -51,6 +51,7 @@ struct blk_mq_hw_ctx {
 
atomic_tnr_active;
 
+   struct delayed_work delayed_run_work;
struct delayed_work delay_work;
 
struct hlist_node   cpuhp_dead;
@@ -236,6 +237,7 @@ void blk_mq_stop_hw_queues(struct request_queue *q);
 void blk_mq_start_hw_queues(struct request_queue *q);
 void blk_mq_start_stopped_hw_queue(struct blk_mq_hw_ctx *hctx, bool async);
 void blk_mq_start_stopped_hw_queues(struct request_queue *q, bool async);
+void blk_mq_delay_run_hw_queue(struct blk_mq_hw_ctx *hctx, unsigned long msecs);
 void blk_mq_run_hw_queues(struct request_queue *q, bool async);
 void blk_mq_delay_queue(struct blk_mq_hw_ctx *hctx, unsigned long msecs);
 void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,
-- 
2.12.0



[PATCH v4 6/6] dm rq: Avoid that request processing stalls sporadically

2017-04-07 Thread Bart Van Assche
While running the srp-test software I noticed that request
processing stalls sporadically at the beginning of a test, namely
when mkfs is run against a dm-mpath device. Every time when that
happened the following command was sufficient to resume request
processing:

echo run >/sys/kernel/debug/block/dm-0/state

This patch avoids that such request processing stalls occur. The
test I ran is as follows:

while srp-test/run_tests -d -r 30 -t 02-mq; do :; done

Signed-off-by: Bart Van Assche 
Cc: Mike Snitzer 
Cc: dm-de...@redhat.com
---
 drivers/md/dm-rq.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index 6886bf160fb2..d19af1d21f4c 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -755,6 +755,7 @@ static int dm_mq_queue_rq(struct blk_mq_hw_ctx *hctx,
/* Undo dm_start_request() before requeuing */
rq_end_stats(md, rq);
rq_completed(md, rq_data_dir(rq), false);
+   blk_mq_delay_run_hw_queue(hctx, 100/*ms*/);
return BLK_MQ_RQ_QUEUE_BUSY;
}
 
-- 
2.12.0



[PATCH v4 3/6] blk-mq: Clarify comments in blk_mq_dispatch_rq_list()

2017-04-07 Thread Bart Van Assche
The blk_mq_dispatch_rq_list() implementation got modified several
times but the comments in that function were not updated every
time. Since it is nontrivial what is going on, update the comments
in blk_mq_dispatch_rq_list().

Signed-off-by: Bart Van Assche 
Cc: Omar Sandoval 
Cc: Christoph Hellwig 
Cc: Hannes Reinecke 
---
 block/blk-mq.c | 28 ++--
 1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index dba34eb79a08..aff85d41cea3 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1063,8 +1063,8 @@ bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *list)
 */
if (!list_empty(list)) {
/*
-* If we got a driver tag for the next request already,
-* free it again.
+* If an I/O scheduler has been configured and we got a driver
+* tag for the next request already, free it again.
 */
rq = list_first_entry(list, struct request, queuelist);
blk_mq_put_driver_tag(rq);
@@ -1074,16 +1074,24 @@ bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *list)
 	spin_unlock(&hctx->lock);
 
/*
-* the queue is expected stopped with BLK_MQ_RQ_QUEUE_BUSY, but
-* it's possible the queue is stopped and restarted again
-* before this. Queue restart will dispatch requests. And since
-* requests in rq_list aren't added into hctx->dispatch yet,
-* the requests in rq_list might get lost.
+* If SCHED_RESTART was set by the caller of this function and
+* it is no longer set that means that it was cleared by another
+* thread and hence that a queue rerun is needed.
 *
-* blk_mq_run_hw_queue() already checks the STOPPED bit
+* If TAG_WAITING is set that means that an I/O scheduler has
+* been configured and another thread is waiting for a driver
+* tag. To guarantee fairness, do not rerun this hardware queue
+* but let the other thread grab the driver tag.
 *
-* If RESTART or TAG_WAITING is set, then let completion restart
-* the queue instead of potentially looping here.
+* If no I/O scheduler has been configured it is possible that
+* the hardware queue got stopped and restarted before requests
+* were pushed back onto the dispatch list. Rerun the queue to
+* avoid starvation. Notes:
+* - blk_mq_run_hw_queue() checks whether or not a queue has
+*   been stopped before rerunning a queue.
+* - Some but not all block drivers stop a queue before
+*   returning BLK_MQ_RQ_QUEUE_BUSY. Two exceptions are scsi-mq
+*   and dm-rq.
 */
 	if (!blk_mq_sched_needs_restart(hctx) &&
 	    !test_bit(BLK_MQ_S_TAG_WAITING, &hctx->state))
-- 
2.12.0



[PATCH v4 0/6] Avoid that scsi-mq and dm-mq queue processing stalls sporadically

2017-04-07 Thread Bart Van Assche
Hello Jens,

The six patches in this patch series fix the queue lockup I reported
recently on the linux-block mailing list. Please consider these patches
for inclusion in the upstream kernel.

Thanks,

Bart.

Changes between v3 and v4:
- Addressed the review comments on version three of this series about the
  patch that makes it safe to use RCU to iterate over .tag_list and also
  about the runtime performance and use of short variable names in patch 2/5.
- Clarified the description of the patch that fixes the scsi-mq stall.
- Added a patch to fix a dm-mq queue stall.
  
Changes between v2 and v3:
- Removed the blk_mq_ops.restart_hctx function pointer again.
- Modified blk_mq_sched_restart_queues() such that only a single hardware
  queue is restarted instead of multiple if hardware queues are shared.
- Introduced a new function in the block layer, namely
  blk_mq_delay_run_hw_queue().  

Changes between v1 and v2:
- Reworked scsi_restart_queues() such that it no longer takes the SCSI
  host lock.
- Added two patches - one for exporting blk_mq_sched_restart_hctx() and
  another one to make iterating with RCU over blk_mq_tag_set.tag_list safe.

Bart Van Assche (6):
  blk-mq: Make it safe to use RCU to iterate over
blk_mq_tag_set.tag_list
  blk-mq: Restart a single queue if tag sets are shared
  blk-mq: Clarify comments in blk_mq_dispatch_rq_list()
  blk-mq: Introduce blk_mq_delay_run_hw_queue()
  scsi: Avoid that SCSI queues get stuck
  dm rq: Avoid that request processing stalls sporadically

 block/blk-mq-sched.c| 63 +++---
 block/blk-mq-sched.h| 16 +--
 block/blk-mq.c  | 73 +++--
 drivers/md/dm-rq.c  |  1 +
 drivers/scsi/scsi_lib.c |  6 ++--
 include/linux/blk-mq.h  |  2 ++
 include/linux/blkdev.h  |  1 -
 7 files changed, 118 insertions(+), 44 deletions(-)

-- 
2.12.0



[PATCH V2] scsi: mpt3sas: remove redundant wmb

2017-04-07 Thread Sinan Kaya
Due to relaxed ordering requirements on multiple architectures,
drivers are required to use wmb/rmb/mb combinations when they
need to guarantee observability between the memory and the HW.

The mpt3sas driver is already using wmb() for this purpose.
However, it issues a writel() following wmb(). The writel()
function on arm/arm64 architectures has an embedded wmb()
call inside.

This results in unnecessary performance loss and code duplication.

writel() already guarantees ordering for both the CPU and the bus, so we
don't need an additional wmb().

Signed-off-by: Sinan Kaya 
---
 drivers/scsi/mpt3sas/mpt3sas_base.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c
index 5b7aec5..18039bb 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
@@ -1025,7 +1025,6 @@ static int mpt3sas_remove_dead_ioc_func(void *arg)
0 : ioc->reply_free_host_index + 1;
ioc->reply_free[ioc->reply_free_host_index] =
cpu_to_le32(reply);
-   wmb();
 		writel(ioc->reply_free_host_index,
 		       &ioc->chip->ReplyFreeHostIndex);
}
@@ -1074,7 +1073,6 @@ static int mpt3sas_remove_dead_ioc_func(void *arg)
return IRQ_NONE;
}
 
-   wmb();
if (ioc->is_warpdrive) {
writel(reply_q->reply_post_host_index,
ioc->reply_post_host_index[msix_index]);
-- 
1.9.1



Re: [PATCH] scsi: mpt3sas: remove redundant wmb on arm/arm64

2017-04-07 Thread James Bottomley
On Fri, 2017-04-07 at 12:41 -0400, Sinan Kaya wrote:
> Due to relaxed ordering requirements on multiple architectures,
> drivers are required to use wmb/rmb/mb combinations when they
> need to guarantee observability between the memory and the HW.
> 
> The mpt3sas driver is already using wmb() for this purpose.
> However, it issues a writel following wmb(). writel() function
> on arm/arm64 arhictectures have an embedded wmb() call inside.
> 
> This results in unnecessary performance loss and code duplication.
> 
> The kernel has been updated to support relaxed read/write
> API to be supported across all architectures now.
> 
> The right thing was to either call __raw_writel/__raw_readl or
> write_relaxed/read_relaxed for multi-arch compatibility.

writeX_relaxed and thus your patch is definitely wrong.  The reason is
that we have two ordering domains: the CPU and the Bus.  wmb forces
ordering in the CPU domain but not the bus domain.  writeX originally
forced ordering in the bus domain but not the CPU domain, but since the
raw primitives I think it now orders in both and writeX_relaxed orders
in neither domain, so your patch would currently eliminate the bus
ordering.

James



Re: [PATCH 2/2] scsi: sd: Remove LBPRZ dependency for discards

2017-04-07 Thread Bart Van Assche
On Wed, 2017-04-05 at 07:41 -0400, Martin K. Petersen wrote:
> Separating discards and zeroout operations allows us to remove the LBPRZ
> block zeroing constraints from discards and honor the device preferences
> for UNMAP commands.
> 
> If supported by the device, we'll also choose UNMAP over one of the
> WRITE SAME variants for discards.

Reviewed-by: Bart Van Assche 

Re: [RFC 5/8] scatterlist: Modify SG copy functions to support io memory.

2017-04-07 Thread Logan Gunthorpe
Hi Dan,

On 03/04/17 06:07 PM, Dan Williams wrote:
> The completely agnostic part is where I get worried, but I shouldn't
> say anymore until I actually read the patch.The worry is cases where
> this agnostic enabling allows unsuspecting code paths to do the wrong
> thing. Like bypass iomem safety.

Yup, you're right the iomem safety issue is a really difficult problem.
I think replacing struct page with pfn_t in a bunch of places is
probably going to be a requirement for my work. However, this is going
to be a very large undertaking.

I've done an audit of sg_page users and there will indeed be some
difficult cases. However, I'm going to start doing some cleanup and
semantic changes to hopefully move in that direction. The first step
I've chosen to look at is to create an sg_kmap interface which replaces
about 77 (out of ~340) sg_page users. I'm hoping the new interface can
have the semantic that sg_kmap can fail (which would happen in the case
that no suitable page exists).

Eventually, I'd want to get to a place where sg_page either doesn't
exists or can fail and is always checked. At that point swapping out
pfn_t in the sgl would be manageable.

Thoughts?

Logan


Re: [PATCH v4 0/6] Avoid that scsi-mq and dm-mq queue processing stalls sporadically

2017-04-07 Thread Jens Axboe
On 04/07/2017 12:16 PM, Bart Van Assche wrote:
> Hello Jens,
> 
> The six patches in this patch series fix the queue lockup I reported
> recently on the linux-block mailing list. Please consider these patches
> for inclusion in the upstream kernel.

Some of this we need in 4.11, but not all of it. I can't be applying patches
that "improve scalability" at this point.

4-6 looks like what we want for 4.11, I'll see if those apply directly. Then
we can put 1-3 on top in 4.12, with the others pulled in first.

-- 
Jens Axboe



Re: [PATCH 1/2] scsi: sd: Separate zeroout and discard command choices

2017-04-07 Thread Bart Van Assche
On Wed, 2017-04-05 at 07:41 -0400, Martin K. Petersen wrote:
> +static const char *zeroing_mode[] = {
> + [SD_ZERO_WRITE] = "write",
> + [SD_ZERO_WS]= "writesame",
> + [SD_ZERO_WS16_UNMAP]= "writesame_16_unmap",
> + [SD_ZERO_WS10_UNMAP]= "writesame_10_unmap",
> +};
> +
> +static ssize_t
> +zeroing_mode_show(struct device *dev, struct device_attribute *attr,
> +   char *buf)
> +{
> + struct scsi_disk *sdkp = to_scsi_disk(dev);
> +
> + return snprintf(buf, 20, "%s\n", zeroing_mode[sdkp->zeroing_mode]);
> +}

Hello Martin,

If anyone would ever add a string to zeroing_mode[] that is longer than 20
characters then zeroing_mode_show() will truncate it. Since all strings in
the zeroing_mode[] array are short, have you considered to use sprintf()
instead? And if you do not want to use sprintf(), how about using
snprintf(buf, PAGE_SIZE, ...)? I'm asking this because I'm no fan of magic
constants.
 
> +static ssize_t
> +zeroing_mode_store(struct device *dev, struct device_attribute *attr,
> +const char *buf, size_t count)
> +{
> + struct scsi_disk *sdkp = to_scsi_disk(dev);
> +
> + if (!capable(CAP_SYS_ADMIN))
> + return -EACCES;
> +
> + if (!strncmp(buf, zeroing_mode[SD_ZERO_WRITE], 20))
> + sdkp->zeroing_mode = SD_ZERO_WRITE;
> + else if (!strncmp(buf, zeroing_mode[SD_ZERO_WS], 20))
> + sdkp->zeroing_mode = SD_ZERO_WS;
> + else if (!strncmp(buf, zeroing_mode[SD_ZERO_WS16_UNMAP], 20))
> + sdkp->zeroing_mode = SD_ZERO_WS16_UNMAP;
> + else if (!strncmp(buf, zeroing_mode[SD_ZERO_WS10_UNMAP], 20))
> + sdkp->zeroing_mode = SD_ZERO_WS10_UNMAP;
> + else
> + return -EINVAL;
> +
> + return count;
> +}

Since sysfs guarantees that buf is '\0'-terminated, why does the above
function call strncmp() instead of strcmp()?

Can the above chain of if-statements be replaced by a for-loop such that
zeroing_mode_store() won't have to be updated if the zeroing_mode[] array
is modified?

Thanks,

Bart.

Re: [PATCH] ibmvscsis: Do not send aborted task response

2017-04-07 Thread Bryant G. Ly



On 4/7/17 10:49 AM, Bryant G. Ly wrote:

The driver is sending a response to the aborted task response
along with LIO sending the tmr response. SCSI spec says
that the initiator that sends the abort task TM NEVER gets a
response to the aborted op and with the current code it will
send a response. Thus this fix will remove that response if the
op is aborted.

Cc:  # v4.8+
Signed-off-by: Bryant G. Ly 
---
  drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c | 60 +---
  drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.h |  1 +
  2 files changed, 40 insertions(+), 21 deletions(-)

diff --git a/drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c b/drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c
index 4bb5635..8e2733f 100644
--- a/drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c
+++ b/drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c
@@ -1169,6 +1169,7 @@ static struct ibmvscsis_cmd *ibmvscsis_get_free_cmd(struct scsi_info *vscsi)
cmd = list_first_entry_or_null(&vscsi->free_cmd,
   struct ibmvscsis_cmd, list);
if (cmd) {
+   cmd->flags &= ~(CMD_ABORTED);
list_del(&cmd->list);
cmd->iue = iue;
cmd->type = UNSET_TYPE;
@@ -1758,33 +1759,41 @@ static void ibmvscsis_send_messages(struct scsi_info *vscsi)

if (!(vscsi->flags & RESPONSE_Q_DOWN)) {
list_for_each_entry_safe(cmd, nxt, &vscsi->waiting_rsp, list) {
-   iue = cmd->iue;
+   /*
+* If an Abort flag is set then don't send response
+*/
+   if (cmd->flags & CMD_ABORTED) {
+   list_del(&cmd->list);
+   ibmvscsis_free_cmd_resources(vscsi, cmd);
+   } else {
+   iue = cmd->iue;

-   crq->valid = VALID_CMD_RESP_EL;
-   crq->format = cmd->rsp.format;
+   crq->valid = VALID_CMD_RESP_EL;
+   crq->format = cmd->rsp.format;

-   if (cmd->flags & CMD_FAST_FAIL)
-   crq->status = VIOSRP_ADAPTER_FAIL;
+   if (cmd->flags & CMD_FAST_FAIL)
+   crq->status = VIOSRP_ADAPTER_FAIL;

-   crq->IU_length = cpu_to_be16(cmd->rsp.len);
+   crq->IU_length = cpu_to_be16(cmd->rsp.len);

-   rc = h_send_crq(vscsi->dma_dev->unit_address,
-   be64_to_cpu(msg_hi),
-   be64_to_cpu(cmd->rsp.tag));
+   rc = h_send_crq(vscsi->dma_dev->unit_address,
+   be64_to_cpu(msg_hi),
+   be64_to_cpu(cmd->rsp.tag));

-   pr_debug("send_messages: cmd %p, tag 0x%llx, rc %ld\n",
-cmd, be64_to_cpu(cmd->rsp.tag), rc);
+   pr_debug("send_messages: cmd %p, tag 0x%llx, rc %ld\n",
+cmd, be64_to_cpu(cmd->rsp.tag), rc);

-   /* if all ok free up the command element resources */
-   if (rc == H_SUCCESS) {
-   /* some movement has occurred */
-   vscsi->rsp_q_timer.timer_pops = 0;
-   list_del(&cmd->list);
+   /* if all ok free up the command element resources */
+   if (rc == H_SUCCESS) {
+   /* some movement has occurred */
+   vscsi->rsp_q_timer.timer_pops = 0;
+   list_del(&cmd->list);

-   ibmvscsis_free_cmd_resources(vscsi, cmd);
-   } else {
-   srp_snd_msg_failed(vscsi, rc);
-   break;
+   ibmvscsis_free_cmd_resources(vscsi, cmd);
+   } else {
+   srp_snd_msg_failed(vscsi, rc);
+   break;
+   }
}
}

@@ -3581,9 +3590,15 @@ static int ibmvscsis_write_pending(struct se_cmd *se_cmd)
  {
struct ibmvscsis_cmd *cmd = container_of(se_cmd, struct ibmvscsis_cmd,
 se_cmd);
+   struct scsi_info *vscsi = cmd->adapter;
struct iu_entry *iue = cmd->iue;
int rc;

+   if ((vscsi->flags & (CLIENT_FAILED | RESPONSE_Q_DOWN))) {
+   pr_err("write_pending failed since: %d\n", vscsi->flags);
+

Re: [PATCH] ibmvscsis: Do not send aborted task response

2017-04-07 Thread Bryant G. Ly



On 4/7/17 11:36 AM, Bart Van Assche wrote:

On Fri, 2017-04-07 at 11:12 -0500, Bryant G. Ly wrote:

So from this stack trace it looks like the ibmvscsis driver is sending an
extra response through send_messages even though an abort was issued.
IBMi reported this issue internally when they were testing the driver,
because their client didn't like getting a response back for the aborted op.
They are only expecting a TMR from the abort request, NOT the aborted op.

Hello Bryant,

What is the root cause of this behavior? Why is it that the behavior of
the ibmvscsi_tgt contradicts what has been implemented in the LIO core?
Sorry but the patch at the start of this thread looks to me like an
attempt to paper over the problem instead of addressing the root cause.

Thanks,


IBMi clients received a response for an aborted operation, so they sent an
abort TM request. Afterwards they got a CRQ response to the op that they
aborted. That should not happen, because they are only supposed to get a
response for the TM request, NOT the aborted operation.
Looking at the code, it appears this is due to send_messages processing a
response without checking whether it was an aborted op.
This patch addresses a bug within the ibmvscsis driver: it SENT a response
to the aborted operation (which is wrong) without looking at what LIO core
had done. The driver isn't supposed to send any response to the aborted
operation, BUT only a response to the abort TM request, which LIO core
currently does.

-Bryant




Re: [PATCH] ibmvscsis: Do not send aborted task response

2017-04-07 Thread Bart Van Assche
On 04/07/2017 08:49 AM, Bryant G. Ly wrote:
> The driver is sending a response to the aborted task response

That trailing occurrence of "response" looks extraneous?

> along with LIO sending the tmr response. SCSI spec says
> that the initiator that sends the abort task TM NEVER gets a
> response to the aborted op and with the current code it will
> send a response. Thus this fix will remove that response if the
> op is aborted.

Hello Bryant,

Are you sure that a new flag is needed to prevent a command response
from being sent to the initiator that submitted the ABORT?
__target_check_io_state() only sets CMD_T_TAS if the TMF was received
from another I_T nexus than the I_T nexus through which the aborted
command was received. core_tmr_handle_tas_abort() only triggers a call
to .queue_status() if CMD_T_TAS is set. Do you agree with this analysis?

Thanks,

Bart.


Re: linux-next: manual merge of the scsi-mkp tree with the char-misc tree

2017-04-07 Thread Bart Van Assche
On Fri, 2017-04-07 at 13:29 -0600, Logan Gunthorpe wrote:
> On 07/04/17 09:49 AM, Bart Van Assche wrote:
> > Sorry that I had not yet noticed Logan's patch series. Should my two
> > patches that conflict with Logan's patch series be dropped and reworked
> > after Logan's patches are upstream?
> 
> Yeah, Greg took my patchset around a few maintainers relatively quickly.
> This is the second conflict, so sorry about that. Looks like the easiest
> thing would be to just base your change off of mine. It doesn't look too
> difficult. If you can do it before my patch hits upstream, I'd
> appreciate some testing and/or review as no one from the scsi side
> responded and that particular patch was a bit more involved than I would
> have liked.

Boaz, had you noticed Logan's osd patch? If not, can you have a look?

Thanks,

Bart.

Re: [PATCH 1/2] scsi: sd: Separate zeroout and discard command choices

2017-04-07 Thread Bart Van Assche
On Wed, 2017-04-05 at 07:41 -0400, Martin K. Petersen wrote:
> +static ssize_t
> +zeroing_mode_store(struct device *dev, struct device_attribute *attr,
> +const char *buf, size_t count)
> +{
> + struct scsi_disk *sdkp = to_scsi_disk(dev);
> +
> + if (!capable(CAP_SYS_ADMIN))
> + return -EACCES;
> +
> + if (!strncmp(buf, zeroing_mode[SD_ZERO_WRITE], 20))
> + sdkp->zeroing_mode = SD_ZERO_WRITE;
> + else if (!strncmp(buf, zeroing_mode[SD_ZERO_WS], 20))
> + sdkp->zeroing_mode = SD_ZERO_WS;
> + else if (!strncmp(buf, zeroing_mode[SD_ZERO_WS16_UNMAP], 20))
> + sdkp->zeroing_mode = SD_ZERO_WS16_UNMAP;
> + else if (!strncmp(buf, zeroing_mode[SD_ZERO_WS10_UNMAP], 20))
> + sdkp->zeroing_mode = SD_ZERO_WS10_UNMAP;
> + else
> + return -EINVAL;
> +
> + return count;
> +}

An additional question about this function: if the shell command "echo" is used
without command-line option -n to modify the "zeroing_mode" sysfs attribute then
a newline character will be present in buf. Does the above code handle newline
characters correctly?

Bart.

[PATCH 12/12] hpsa: bump driver version

2017-04-07 Thread Don Brace
Reviewed-by: Scott Benesh 
Reviewed-by: Scott Teel 
Reviewed-by: Kevin Barnett 
Signed-off-by: Don Brace 
---
 drivers/scsi/hpsa.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
index 33db581..42047f0 100644
--- a/drivers/scsi/hpsa.c
+++ b/drivers/scsi/hpsa.c
@@ -60,7 +60,7 @@
  * HPSA_DRIVER_VERSION must be 3 byte values (0-255) separated by '.'
  * with an optional trailing '-' followed by a byte value (0-255).
  */
-#define HPSA_DRIVER_VERSION "3.4.18-0"
+#define HPSA_DRIVER_VERSION "3.4.20-0"
 #define DRIVER_NAME "HP HPSA Driver (v " HPSA_DRIVER_VERSION ")"
 #define HPSA "hpsa"
 



[PATCH 11/12] hpsa: remove abort handler

2017-04-07 Thread Don Brace
 - simplify the driver
 - there are a lot of quirky racy conditions not handled
 - causes more aborts/resets when the number of commands to
   be aborted is large, such as in multi-path fail-overs.
 - has been turned off in our internal driver since 8/31/2015

Reviewed-by: Scott Benesh 
Reviewed-by: Scott Teel 
Reviewed-by: Kevin Barnett 
Signed-off-by: Don Brace 
---
 drivers/scsi/hpsa.c |  621 +--
 drivers/scsi/hpsa.h |1 
 2 files changed, 8 insertions(+), 614 deletions(-)

diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
index 68d020a..33db581 100644
--- a/drivers/scsi/hpsa.c
+++ b/drivers/scsi/hpsa.c
@@ -258,7 +258,6 @@ static int hpsa_scan_finished(struct Scsi_Host *sh,
 static int hpsa_change_queue_depth(struct scsi_device *sdev, int qdepth);
 
 static int hpsa_eh_device_reset_handler(struct scsi_cmnd *scsicmd);
-static int hpsa_eh_abort_handler(struct scsi_cmnd *scsicmd);
 static int hpsa_slave_alloc(struct scsi_device *sdev);
 static int hpsa_slave_configure(struct scsi_device *sdev);
 static void hpsa_slave_destroy(struct scsi_device *sdev);
@@ -326,7 +325,7 @@ static inline bool hpsa_is_cmd_idle(struct CommandList *c)
 
 static inline bool hpsa_is_pending_event(struct CommandList *c)
 {
-   return c->abort_pending || c->reset_pending;
+   return c->reset_pending;
 }
 
 /* extract sense key, asc, and ascq from sense data.  -1 means invalid. */
@@ -581,12 +580,6 @@ static u32 soft_unresettable_controller[] = {
0x409D0E11, /* Smart Array 6400 EM */
 };
 
-static u32 needs_abort_tags_swizzled[] = {
-   0x323D103C, /* Smart Array P700m */
-   0x324a103C, /* Smart Array P712m */
-   0x324b103C, /* SmartArray P711m */
-};
-
 static int board_id_in_array(u32 a[], int nelems, u32 board_id)
 {
int i;
@@ -615,12 +608,6 @@ static int ctlr_is_resettable(u32 board_id)
ctlr_is_soft_resettable(board_id);
 }
 
-static int ctlr_needs_abort_tags_swizzled(u32 board_id)
-{
-   return board_id_in_array(needs_abort_tags_swizzled,
-   ARRAY_SIZE(needs_abort_tags_swizzled), board_id);
-}
-
 static ssize_t host_show_resettable(struct device *dev,
struct device_attribute *attr, char *buf)
 {
@@ -928,8 +915,8 @@ static struct device_attribute *hpsa_shost_attrs[] = {
NULL,
 };
 
-#define HPSA_NRESERVED_CMDS (HPSA_CMDS_RESERVED_FOR_ABORTS + \
-   HPSA_CMDS_RESERVED_FOR_DRIVER + HPSA_MAX_CONCURRENT_PASSTHRUS)
+#define HPSA_NRESERVED_CMDS (HPSA_CMDS_RESERVED_FOR_DRIVER + \
+   HPSA_MAX_CONCURRENT_PASSTHRUS)
 
 static struct scsi_host_template hpsa_driver_template = {
.module = THIS_MODULE,
@@ -941,7 +928,6 @@ static struct scsi_host_template hpsa_driver_template = {
.change_queue_depth = hpsa_change_queue_depth,
.this_id= -1,
.use_clustering = ENABLE_CLUSTERING,
-   .eh_abort_handler   = hpsa_eh_abort_handler,
.eh_device_reset_handler = hpsa_eh_device_reset_handler,
.ioctl  = hpsa_ioctl,
.slave_alloc= hpsa_slave_alloc,
@@ -2358,26 +2344,12 @@ static void hpsa_cmd_resolve_events(struct ctlr_info *h,
bool do_wake = false;
 
/*
-* Prevent the following race in the abort handler:
-*
-* 1. LLD is requested to abort a SCSI command
-* 2. The SCSI command completes
-* 3. The struct CommandList associated with step 2 is made available
-* 4. New I/O request to LLD to another LUN re-uses struct CommandList
-* 5. Abort handler follows scsi_cmnd->host_scribble and
-*finds struct CommandList and tries to aborts it
-* Now we have aborted the wrong command.
-*
-* Reset c->scsi_cmd here so that the abort or reset handler will know
+* Reset c->scsi_cmd here so that the reset handler will know
 * this command has completed.  Then, check to see if the handler is
 * waiting for this command, and, if so, wake it.
 */
c->scsi_cmd = SCSI_CMD_IDLE;
mb();   /* Declare command idle before checking for pending events. */
-   if (c->abort_pending) {
-   do_wake = true;
-   c->abort_pending = false;
-   }
if (c->reset_pending) {
unsigned long flags;
struct hpsa_scsi_dev_t *dev;
@@ -2420,20 +2392,6 @@ static void hpsa_retry_cmd(struct ctlr_info *h, struct CommandList *c)
queue_work_on(raw_smp_processor_id(), h->resubmit_wq, &c->work);
 }
 
-static void hpsa_set_scsi_cmd_aborted(struct scsi_cmnd *cmd)
-{
-   cmd->result = DID_ABORT << 16;
-}
-
-static void hpsa_cmd_abort_and_free(struct ctlr_info *h, struct CommandList *c,
-   struct scsi_cmnd *cmd)
-{
-   

[PATCH 08/12] hpsa: correct queue depth for externals

2017-04-07 Thread Don Brace
 - queue depth assignment not in correct place, had no effect.

Reviewed-by: Scott Benesh 
Reviewed-by: Scott Teel 
Reviewed-by: Kevin Barnett 
Signed-off-by: Don Brace 
---
 drivers/scsi/hpsa.c |   22 ++
 drivers/scsi/hpsa.h |1 +
 2 files changed, 11 insertions(+), 12 deletions(-)

diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
index a6a37e0..40a87f9 100644
--- a/drivers/scsi/hpsa.c
+++ b/drivers/scsi/hpsa.c
@@ -2066,10 +2066,13 @@ static int hpsa_slave_configure(struct scsi_device *sdev)
sd = sdev->hostdata;
sdev->no_uld_attach = !sd || !sd->expose_device;
 
-   if (sd)
-   queue_depth = sd->queue_depth != 0 ?
-   sd->queue_depth : sdev->host->can_queue;
-   else
+   if (sd) {
+   if (sd->external)
+   queue_depth = EXTERNAL_QD;
+   else
+   queue_depth = sd->queue_depth != 0 ?
+   sd->queue_depth : sdev->host->can_queue;
+   } else
queue_depth = sdev->host->can_queue;
 
scsi_change_queue_depth(sdev, queue_depth);
@@ -3912,6 +3915,9 @@ static int hpsa_update_device_info(struct ctlr_info *h,
this_device->queue_depth = h->nr_cmds;
}
 
+   if (this_device->external)
+   this_device->queue_depth = EXTERNAL_QD;
+
if (is_OBDR_device) {
/* See if this is a One-Button-Disaster-Recovery device
 * by looking for "$DR-10" at offset 43 in inquiry data.
@@ -4120,14 +4126,6 @@ static void hpsa_get_ioaccel_drive_info(struct ctlr_info *h,
int rc;
struct ext_report_lun_entry *rle;
 
-   /*
-* external targets don't support BMIC
-*/
-   if (dev->external) {
-   dev->queue_depth = 7;
-   return;
-   }
-
rle = &rlep->LUN[rle_index];
 
dev->ioaccel_handle = rle->ioaccel_handle;
diff --git a/drivers/scsi/hpsa.h b/drivers/scsi/hpsa.h
index 6f04f2a..99539c0 100644
--- a/drivers/scsi/hpsa.h
+++ b/drivers/scsi/hpsa.h
@@ -57,6 +57,7 @@ struct hpsa_sas_phy {
bool added_to_port;
 };
 
+#define EXTERNAL_QD 7
 struct hpsa_scsi_dev_t {
unsigned int devtype;
int bus, target, lun;   /* as presented to the OS */



[PATCH 09/12] hpsa: separate monitor events from heartbeat worker

2017-04-07 Thread Don Brace
From: Scott Teel 

create new worker thread to monitor controller events
 - detect controller events more frequently.
 - leave heartbeat check at 30 seconds.

Reviewed-by: Scott Benesh 
Reviewed-by: Scott Teel 
Reviewed-by: Kevin Barnett 
Signed-off-by: Don Brace 
---
 drivers/scsi/hpsa.c |   32 ++--
 drivers/scsi/hpsa.h |1 +
 2 files changed, 31 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
index 40a87f9..50f7c09 100644
--- a/drivers/scsi/hpsa.c
+++ b/drivers/scsi/hpsa.c
@@ -1110,6 +1110,7 @@ static int is_firmware_flash_cmd(u8 *cdb)
  */
 #define HEARTBEAT_SAMPLE_INTERVAL_DURING_FLASH (240 * HZ)
 #define HEARTBEAT_SAMPLE_INTERVAL (30 * HZ)
+#define HPSA_EVENT_MONITOR_INTERVAL (15 * HZ)
 static void dial_down_lockup_detection_during_fw_flash(struct ctlr_info *h,
struct CommandList *c)
 {
@@ -8650,6 +8651,29 @@ static int hpsa_luns_changed(struct ctlr_info *h)
return rc;
 }
 
+/*
+ * watch for controller events
+ */
+static void hpsa_event_monitor_worker(struct work_struct *work)
+{
+   struct ctlr_info *h = container_of(to_delayed_work(work),
+   struct ctlr_info, event_monitor_work);
+
+   if (h->remove_in_progress)
+   return;
+
+   if (hpsa_ctlr_needs_rescan(h)) {
+   scsi_host_get(h->scsi_host);
+   hpsa_ack_ctlr_events(h);
+   hpsa_scan_start(h->scsi_host);
+   scsi_host_put(h->scsi_host);
+   }
+
+   if (!h->remove_in_progress)
+   schedule_delayed_work(&h->event_monitor_work,
+   HPSA_EVENT_MONITOR_INTERVAL);
+}
+
 static void hpsa_rescan_ctlr_worker(struct work_struct *work)
 {
unsigned long flags;
@@ -8668,9 +8692,9 @@ static void hpsa_rescan_ctlr_worker(struct work_struct *work)
return;
}
 
-   if (hpsa_ctlr_needs_rescan(h) || hpsa_offline_devices_ready(h)) {
+   if (h->drv_req_rescan || hpsa_offline_devices_ready(h)) {
+   h->drv_req_rescan = 0;
scsi_host_get(h->scsi_host);
-   hpsa_ack_ctlr_events(h);
hpsa_scan_start(h->scsi_host);
scsi_host_put(h->scsi_host);
} else if (h->discovery_polling) {
@@ -8949,6 +8973,9 @@ static int hpsa_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
INIT_DELAYED_WORK(&h->rescan_ctlr_work, hpsa_rescan_ctlr_worker);
queue_delayed_work(h->rescan_ctlr_wq, &h->rescan_ctlr_work,
h->heartbeat_sample_interval);
+   INIT_DELAYED_WORK(&h->event_monitor_work, hpsa_event_monitor_worker);
+   schedule_delayed_work(&h->event_monitor_work,
+   HPSA_EVENT_MONITOR_INTERVAL);
return 0;
 
 clean7: /* perf, sg, cmd, irq, shost, pci, lu, aer/h */
@@ -9117,6 +9144,7 @@ static void hpsa_remove_one(struct pci_dev *pdev)
spin_unlock_irqrestore(&h->lock, flags);
cancel_delayed_work_sync(&h->monitor_ctlr_work);
cancel_delayed_work_sync(&h->rescan_ctlr_work);
+   cancel_delayed_work_sync(&h->event_monitor_work);
destroy_workqueue(h->rescan_ctlr_wq);
destroy_workqueue(h->resubmit_wq);
 
diff --git a/drivers/scsi/hpsa.h b/drivers/scsi/hpsa.h
index 99539c0..3c22ac1 100644
--- a/drivers/scsi/hpsa.h
+++ b/drivers/scsi/hpsa.h
@@ -245,6 +245,7 @@ struct ctlr_info {
u32 __percpu *lockup_detected;
struct delayed_work monitor_ctlr_work;
struct delayed_work rescan_ctlr_work;
+   struct delayed_work event_monitor_work;
int remove_in_progress;
/* Address of h->q[x] is passed to intr handler to know which queue */
u8 q[MAX_REPLY_QUEUES];



[PATCH 02/12] hpsa: do not get enclosure info for external devices

2017-04-07 Thread Don Brace
external shelves do not support BMICs.

Reviewed-by: Scott Benesh 
Reviewed-by: Scott Teel 
Reviewed-by: Kevin Barnett 
Signed-off-by: Don Brace 
---
 drivers/scsi/hpsa.c |5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
index 73daace..8e22aed 100644
--- a/drivers/scsi/hpsa.c
+++ b/drivers/scsi/hpsa.c
@@ -3353,6 +3353,11 @@ static void hpsa_get_enclosure_info(struct ctlr_info *h,
 
bmic_device_index = GET_BMIC_DRIVE_NUMBER(&encl_dev->lunid[0]);
 
+   if (encl_dev->target == -1 || encl_dev->lun == -1) {
+   rc = IO_OK;
+   goto out;
+   }
+
if (bmic_device_index == 0xFF00 || MASKED_DEVICE(&encl_dev->lunid[0])) {
rc = IO_OK;
goto out;



[PATCH 01/12] hpsa: update identify physical device structure

2017-04-07 Thread Don Brace
 - align with latest spec.
 - added __attribute((aligned(512)))

Reviewed-by: Scott Teel 
Reviewed-by: Scott Benesh 
Reviewed-by: Kevin Barnett 
Signed-off-by: Don Brace 
---
 drivers/scsi/hpsa_cmd.h |   20 ++--
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/drivers/scsi/hpsa_cmd.h b/drivers/scsi/hpsa_cmd.h
index 5961705..078afe4 100644
--- a/drivers/scsi/hpsa_cmd.h
+++ b/drivers/scsi/hpsa_cmd.h
@@ -809,10 +809,7 @@ struct bmic_identify_physical_device {
u8 max_temperature_degreesC;
u8 logical_blocks_per_phys_block_exp; /* phyblocksize = 512*2^exp */
__le16 current_queue_depth_limit;
-   u8 switch_name[10];
-   __le16 switch_port;
-   u8 alternate_paths_switch_name[40];
-   u8 alternate_paths_switch_port[8];
+   u8 reserved_switch_stuff[60];
__le16 power_on_hours; /* valid only if gas gauge supported */
__le16 percent_endurance_used; /* valid only if gas gauge supported. */
 #define BMIC_PHYS_DRIVE_SSD_WEAROUT(idphydrv) \
@@ -828,11 +825,22 @@ struct bmic_identify_physical_device {
(idphydrv->smart_carrier_authentication == 0x01)
u8 smart_carrier_app_fw_version;
u8 smart_carrier_bootloader_fw_version;
+   u8 sanitize_support_flags;
+   u8 drive_key_flags;
u8 encryption_key_name[64];
__le32 misc_drive_flags;
__le16 dek_index;
-   u8 padding[112];
-};
+   __le16 hba_drive_encryption_flags;
+   __le16 max_overwrite_time;
+   __le16 max_block_erase_time;
+   __le16 max_crypto_erase_time;
+   u8 device_connector_info[5];
+   u8 connector_name[8][8];
+   u8 page_83_id[16];
+   u8 max_link_rate[256];
+   u8 neg_phys_link_rate[256];
+   u8 box_conn_name[8];
+} __attribute((aligned(512)));
 
 struct bmic_sense_subsystem_info {
u8  primary_slot_number;



[PATCH 06/12] hpsa: correct resets on retried commands

2017-04-07 Thread Don Brace
 - call scsi_done when the command completes.

Reviewed-by: Scott Benesh 
Reviewed-by: Scott Teel 
Reviewed-by: Kevin Barnett 
Signed-off-by: Don Brace 
---
 drivers/scsi/hpsa.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
index 53a4f34..a2852da 100644
--- a/drivers/scsi/hpsa.c
+++ b/drivers/scsi/hpsa.c
@@ -5465,7 +5465,7 @@ static void hpsa_command_resubmit_worker(struct work_struct *work)
return hpsa_cmd_free_and_done(c->h, c, cmd);
}
if (c->reset_pending)
-   return hpsa_cmd_resolve_and_free(c->h, c);
+   return hpsa_cmd_free_and_done(c->h, c, cmd);
if (c->abort_pending)
return hpsa_cmd_abort_and_free(c->h, c, cmd);
if (c->cmd_type == CMD_IOACCEL2) {



[PATCH 10/12] hpsa: send ioaccel requests with 0 length down raid path

2017-04-07 Thread Don Brace
 - Block I/O requests with 0 length transfers which go down
   the ioaccel path; these cause lockup issues down in the basecode.
   - These issues have been fixed, but there are customers who are
 experiencing the issues when running older firmware.

Reviewed-by: Scott Benesh 
Reviewed-by: Scott Teel 
Reviewed-by: Kevin Barnett 
Signed-off-by: Don Brace 
---
 drivers/scsi/hpsa.c |   62 ++-
 1 file changed, 61 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
index 50f7c09..68d020a 100644
--- a/drivers/scsi/hpsa.c
+++ b/drivers/scsi/hpsa.c
@@ -4588,7 +4588,55 @@ static int hpsa_scatter_gather(struct ctlr_info *h,
return 0;
 }
 
-#define IO_ACCEL_INELIGIBLE (1)
+#define BUFLEN 128
+static inline void warn_zero_length_transfer(struct ctlr_info *h,
+   u8 *cdb, int cdb_len,
+   const char *func)
+{
+   char buf[BUFLEN];
+   int outlen;
+   int i;
+
+   outlen = scnprintf(buf, BUFLEN,
+   "%s: Blocking zero-length request: CDB:", func);
+   for (i = 0; i < cdb_len; i++)
+   outlen += scnprintf(buf+outlen, BUFLEN - outlen,
+   "%02hhx", cdb[i]);
+   dev_warn(&h->pdev->dev, "%s\n", buf);
+}
+
+#define IO_ACCEL_INELIGIBLE 1
+/* zero-length transfers trigger hardware errors. */
+static bool is_zero_length_transfer(u8 *cdb)
+{
+   u32 block_cnt;
+
+   /* Block zero-length transfer sizes on certain commands. */
+   switch (cdb[0]) {
+   case READ_10:
+   case WRITE_10:
+   case VERIFY:/* 0x2F */
+   case WRITE_VERIFY:  /* 0x2E */
+   block_cnt = get_unaligned_be16(&cdb[7]);
+   break;
+   case READ_12:
+   case WRITE_12:
+   case VERIFY_12: /* 0xAF */
+   case WRITE_VERIFY_12:   /* 0xAE */
+   block_cnt = get_unaligned_be32(&cdb[6]);
+   break;
+   case READ_16:
+   case WRITE_16:
+   case VERIFY_16: /* 0x8F */
+   block_cnt = get_unaligned_be32(&cdb[10]);
+   break;
+   default:
+   return false;
+   }
+
+   return block_cnt == 0;
+}
+
 static int fixup_ioaccel_cdb(u8 *cdb, int *cdb_len)
 {
int is_write = 0;
@@ -4655,6 +4703,12 @@ static int hpsa_scsi_ioaccel1_queue_command(struct ctlr_info *h,
 
BUG_ON(cmd->cmd_len > IOACCEL1_IOFLAGS_CDBLEN_MAX);
 
+   if (is_zero_length_transfer(cdb)) {
+   warn_zero_length_transfer(h, cdb, cdb_len, __func__);
+   atomic_dec(&phys_disk->ioaccel_cmds_out);
+   return IO_ACCEL_INELIGIBLE;
+   }
+
if (fixup_ioaccel_cdb(cdb, _len)) {
atomic_dec(&phys_disk->ioaccel_cmds_out);
return IO_ACCEL_INELIGIBLE;
@@ -4819,6 +4873,12 @@ static int hpsa_scsi_ioaccel2_queue_command(struct ctlr_info *h,
 
BUG_ON(scsi_sg_count(cmd) > h->maxsgentries);
 
+   if (is_zero_length_transfer(cdb)) {
+   warn_zero_length_transfer(h, cdb, cdb_len, __func__);
+   atomic_dec(&phys_disk->ioaccel_cmds_out);
+   return IO_ACCEL_INELIGIBLE;
+   }
+
if (fixup_ioaccel_cdb(cdb, _len)) {
atomic_dec(&phys_disk->ioaccel_cmds_out);
return IO_ACCEL_INELIGIBLE;



[PATCH 07/12] hpsa: cleanup reset handler

2017-04-07 Thread Don Brace
 - mark device state sooner.

Reviewed-by: Scott Benesh 
Reviewed-by: Scott Teel 
Reviewed-by: Kevin Barnett 
Signed-off-by: Don Brace 
---
 drivers/scsi/hpsa.c |   44 ++--
 1 file changed, 30 insertions(+), 14 deletions(-)

diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
index a2852da..a6a37e0 100644
--- a/drivers/scsi/hpsa.c
+++ b/drivers/scsi/hpsa.c
@@ -5834,7 +5834,7 @@ static int wait_for_device_to_become_ready(struct ctlr_info *h,
  */
 static int hpsa_eh_device_reset_handler(struct scsi_cmnd *scsicmd)
 {
-   int rc;
+   int rc = SUCCESS;
struct ctlr_info *h;
struct hpsa_scsi_dev_t *dev;
u8 reset_type;
@@ -5845,17 +5845,24 @@ static int hpsa_eh_device_reset_handler(struct scsi_cmnd *scsicmd)
if (h == NULL) /* paranoia */
return FAILED;
 
-   if (lockup_detected(h))
-   return FAILED;
+   h->reset_in_progress = 1;
+
+   if (lockup_detected(h)) {
+   rc = FAILED;
+   goto return_reset_status;
+   }
 
dev = scsicmd->device->hostdata;
if (!dev) {
dev_err(&h->pdev->dev, "%s: device lookup failed\n", __func__);
-   return FAILED;
+   rc = FAILED;
+   goto return_reset_status;
}
 
-   if (dev->devtype == TYPE_ENCLOSURE)
-   return SUCCESS;
+   if (dev->devtype == TYPE_ENCLOSURE) {
+   rc = SUCCESS;
+   goto return_reset_status;
+   }
 
/* if controller locked up, we can guarantee command won't complete */
if (lockup_detected(h)) {
@@ -5863,7 +5870,8 @@ static int hpsa_eh_device_reset_handler(struct scsi_cmnd *scsicmd)
 "cmd %d RESET FAILED, lockup detected",
 hpsa_get_cmd_index(scsicmd));
hpsa_show_dev_msg(KERN_WARNING, h, dev, msg);
-   return FAILED;
+   rc = FAILED;
+   goto return_reset_status;
}
 
/* this reset request might be the result of a lockup; check */
@@ -5872,12 +5880,15 @@ static int hpsa_eh_device_reset_handler(struct scsi_cmnd *scsicmd)
 "cmd %d RESET FAILED, new lockup detected",
 hpsa_get_cmd_index(scsicmd));
hpsa_show_dev_msg(KERN_WARNING, h, dev, msg);
-   return FAILED;
+   rc = FAILED;
+   goto return_reset_status;
}
 
/* Do not attempt on controller */
-   if (is_hba_lunid(dev->scsi3addr))
-   return SUCCESS;
+   if (is_hba_lunid(dev->scsi3addr)) {
+   rc = SUCCESS;
+   goto return_reset_status;
+   }
 
if (is_logical_dev_addr_mode(dev->scsi3addr))
reset_type = HPSA_DEVICE_RESET_MSG;
@@ -5888,17 +5899,22 @@ static int hpsa_eh_device_reset_handler(struct scsi_cmnd *scsicmd)
reset_type == HPSA_DEVICE_RESET_MSG ? "logical " : "physical ");
hpsa_show_dev_msg(KERN_WARNING, h, dev, msg);
 
-   h->reset_in_progress = 1;
-
/* send a reset to the SCSI LUN which the command was sent to */
rc = hpsa_do_reset(h, dev, dev->scsi3addr, reset_type,
   DEFAULT_REPLY_QUEUE);
+   if (rc == 0)
+   rc = SUCCESS;
+   else
+   rc = FAILED;
+
sprintf(msg, "reset %s %s",
reset_type == HPSA_DEVICE_RESET_MSG ? "logical " : "physical ",
-   rc == 0 ? "completed successfully" : "failed");
+   rc == SUCCESS ? "completed successfully" : "failed");
hpsa_show_dev_msg(KERN_WARNING, h, dev, msg);
+
+return_reset_status:
h->reset_in_progress = 0;
-   return rc == 0 ? SUCCESS : FAILED;
+   return rc;
 }
 
 static void swizzle_abort_tag(u8 *tag)



[PATCH 00/12] hpsa updates

2017-04-07 Thread Don Brace
These patches are based on Linus's tree

These patches are for: 
 - Multipath failover support in general.

The changes are:
 - update identify physical device structure
   - align with FW
 - stop getting enclosure info for externals
   - no BMIC support
 - update reset handler
   - update to match out of box driver
 - do not reset enclosures
   - reset can sometimes hang
 - rescan later if reset in progress
   - wait for devices to settle.
 - correct resets on retried commands
   - was not calling scsi_done on retried completion
 - correct queue depth for externals
   - Code not in correct function
 - separate monitor events from heartbeat worker
   - allows driver to check for changes more frequently
 without affecting controller lockup detection.
 - send ioaccel requests with 0 length down raid path
   - avoid hang issues for customers running older FW.
 - remove abort handler
   - align driver with our out of box driver
 - bump driver version
   - align version with out of box driver for multi-path changes

---

Don Brace (11):
  hpsa: update identify physical device structure
  hpsa: do not get enclosure info for external devices
  hpsa: update reset handler
  hpsa: do not reset enclosures
  hpsa: rescan later if reset in progress
  hpsa: correct resets on retried commands
  hpsa: cleanup reset handler
  hpsa: correct queue depth for externals
  hpsa: send ioaccel requests with 0 length down raid path
  hpsa: remove abort handler
  hpsa: bump driver version

Scott Teel (1):
  hpsa: separate monitor events from heartbeat worker


 drivers/scsi/hpsa.c |  790 +--
 drivers/scsi/hpsa.h |3 
 drivers/scsi/hpsa_cmd.h |   20 +
 3 files changed, 164 insertions(+), 649 deletions(-)

--
Signature


Re: [PATCH] ibmvscsis: Do not send aborted task response

2017-04-07 Thread Bart Van Assche
On Fri, 2017-04-07 at 16:14 -0500, Michael Cyr wrote:
> That then caused this issue, because release_cmd is always called, even 
> if queue_status is not.  Perhaps it would be cleaner to set some sort of 
> status valid flag during queue_status instead of setting a flag in 
> aborted_task?

Hello Michael,

Thanks for the clarification. Have you already checked whether a new flag
is really needed or whether checking CMD_T_TAS would be sufficient?

Thanks,

Bart.

Re: linux-next: manual merge of the scsi-mkp tree with the char-misc tree

2017-04-07 Thread Logan Gunthorpe
Hi Bart,

On 07/04/17 09:49 AM, Bart Van Assche wrote:
> Sorry that I had not yet noticed Logan's patch series. Should my two
> patches that conflict with Logan's patch series be dropped and reworked
> after Logan's patches are upstream?

Yeah, Greg took my patchset around a few maintainers relatively quickly.
This is the second conflict, so sorry about that. Looks like the easiest
thing would be to just base your change off of mine. It doesn't look too
difficult. If you can do it before my patch hits upstream, I'd
appreciate some testing and/or review as no one from the scsi side
responded and that particular patch was a bit more involved than I would
have liked.

Thanks,

Logan


Re: [PATCH] ibmvscsis: Do not send aborted task response

2017-04-07 Thread Michael Cyr

On 4/7/17 12:01 PM, Bryant G. Ly wrote:



On 4/7/17 11:36 AM, Bart Van Assche wrote:

On Fri, 2017-04-07 at 11:12 -0500, Bryant G. Ly wrote:
So from this stack trace it looks like the ibmvscsis driver is 
sending an

extra response through send_messages even though an abort was issued.
IBMi, reported this issue internally when they were testing the driver,
because their client didn't like getting a response back for the 
aborted op.
They are only expecting a TMR from the abort request, NOT the 
aborted op.

Hello Bryant,

What is the root cause of this behavior? Why is it that the behavior of
the ibmvscsi_tgt contradicts what has been implemented in the LIO core?
Sorry but the patch at the start of this thread looks to me like an
attempt to paper over the problem instead of addressing the root cause.

Thanks,

IBMi clients received a response for an aborted operation, so they 
sent an abort tm request.
Afterwards they get a CRQ response to the op that they aborted. That 
should not happen, because they are only supposed to get a response 
for the tm request NOT the aborted operation.
Looking at the code it looks like it is due to send messages, 
processing a response without checking to see if it was an aborted op.
This patch addresses a bug within the ibmvscsis driver and the fact 
that it SENT a response to the aborted operation(which is wrong since) 
without looking at what LIO core had done.
The driver isn't supposed to send any response to the aborted 
operation, BUT only a response to the abort tm request, which LIO core 
currently does.


-Bryant

I think I can clarify the issue here: ibmvscsis_tgt does not send the 
response to the client until release_cmd time.  The reason for this is 
that if we did it at queue_status time, the client would be free 
to reuse the tag for that command, but we're still using the tag until 
the command is released at release_cmd time, so we chose to delay 
sending the response until then.


That then caused this issue, because release_cmd is always called, even 
if queue_status is not.  Perhaps it would be cleaner to set some sort of 
status valid flag during queue_status instead of setting a flag in 
aborted_task?