Re: [PATCHv2 4/5] target/user: Fix wrongly calculating of the base_command_size

2017-03-16 Thread Xiubo Li

On 2017年03月08日 16:45, lixi...@cmss.chinamobile.com wrote:

From: Xiubo Li 

The t_data_nents and t_bidi_data_nents are all the numbers of the
segments, and we couldn't be sure the size of the data area block
will equal to size of the segment.

Use the actually block number needed intead of the sum of segments.

Signed-off-by: Xiubo Li 
---
  drivers/target/target_core_user.c | 3 +--
  1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/target/target_core_user.c 
b/drivers/target/target_core_user.c
index 99cd239..117be07 100644
--- a/drivers/target/target_core_user.c
+++ b/drivers/target/target_core_user.c
@@ -650,8 +650,7 @@ static bool is_ring_space_avail(struct tcmu_dev *udev, 
struct tcmu_cmd *cmd,
 * expensive to tell how many regions are freed in the bitmap
*/
base_command_size = max(offsetof(struct tcmu_cmd_entry,
-   req.iov[se_cmd->t_bidi_data_nents +
-   se_cmd->t_data_nents]),
+   req.iov[tcmu_cmd->dbi_len]),

For the old code:

If the segment size is larger than the DATA_BLOCK_SIZE, and at the same 
time all the
data blocks won't be continues between each other(that to say each block 
will use
one iov: iov_cnt == block cnt > nents), the entry's cdb data will 
overlap with entry's

iov[] data.

Thanks,

BRs
Xiubo



sizeof(struct tcmu_cmd_entry));
command_size = base_command_size
+ round_up(scsi_command_size(se_cmd->t_task_cdb), 
TCMU_OP_ALIGN_SIZE);






Re: [ANNOUNCE]: Broadcom (Emulex) FC Target driver - efct

2017-03-16 Thread Bart Van Assche
On Mon, 2017-02-27 at 15:28 -0800, James Smart wrote:
> I'd like to announce the availability of the Broadcom (Emulex) FC Target 
> driver - efct.

Hello James,

Sorry but I have not yet had the time to do a full review. But I would like
to already share the following feedback:
* efct_io_find_tgt_io() looks up an I/O request in a list. The lookup is
  protected by a spinlock but callers are expected to increase the I/O
  request reference count after the lock has been released. This looks wrong
  to me and most likely means that there are race conditions related to I/O
  request lookup. Have you considered to move the kref_get() call into
  efct_io_find_tgt_io() such that it can occur with the spinlock held?
* If an ABTS is received then efct_process_abts() (indirectly) calls
  target_submit_tmr(). That last function may or may not decide to call
  efct_lio_queue_status(), depending on whether or not the TAS bit has been
  set. efct_lio_queue_status() sends back a SCSI response to the initiator
  (see also __transport_check_aborted_status() and
  transport_send_task_abort()). Is it allowed by the FC spec to send back a
  SCSI response after an ABTS has been received? Is that SCSI response sent
  before or after the BLS ABTS response?

Thanks,

Bart.

Re: [PATCH] [SCSI] esas2r: remove redundant NULL check on buffer

2017-03-16 Thread Martin K. Petersen
Colin King  writes:

> buffer is a pointer to the static char array event_buffer and
> therefore can never be null, so the check is redundant. Remove it.

Applied to 4.12/scsi-queue. Thank you!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [patch] check length passed to SG_NEXT_CMD_LEN

2017-03-16 Thread Martin K. Petersen
Peter Chang  writes:

Applied to 4.11/scsi-fixes.

Thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCHv2 5/5] target/user: Clean up tcmu_queue_cmd_ring

2017-03-16 Thread Bryant G. Ly



From: Xiubo Li 

Add two helpers to simplify the code, and move some code out of
the cmdr spin lock to reduce the size of critical region.

Signed-off-by: Xiubo Li 
---
  drivers/target/target_core_user.c | 54 ++-
  1 file changed, 30 insertions(+), 24 deletions(-)

diff --git a/drivers/target/target_core_user.c 
b/drivers/target/target_core_user.c
index 117be07..5205d2f 100644
--- a/drivers/target/target_core_user.c
+++ b/drivers/target/target_core_user.c






+static inline void tcmu_cmd_get_cmd_size(struct se_cmd *se_cmd,
+   size_t *base_size, size_t *cmd_size)


Should this be tcmu_cmd_get_cmd_size(struct tcmu_cmd *tcmu_cmd instead? Look at 
bottom.


+{
+   struct se_cmd *se_cmd = tcmu_cmd->se_cmd;
+
+   *base_size = max(offsetof(struct tcmu_cmd_entry,
+req.iov[tcmu_cmd->dbi_len]),
+sizeof(struct tcmu_cmd_entry));
+
+   *cmd_size = round_up(scsi_command_size(se_cmd->t_task_cdb),
+   TCMU_OP_ALIGN_SIZE) + *base_size;
+   WARN_ON(*cmd_size & (TCMU_OP_ALIGN_SIZE-1));
+}
+
  static sense_reason_t
  tcmu_queue_cmd_ring(struct tcmu_cmd *tcmu_cmd)
  {
struct tcmu_dev *udev = tcmu_cmd->tcmu_dev;
struct se_cmd *se_cmd = tcmu_cmd->se_cmd;
size_t base_command_size, command_size;
-   struct tcmu_mailbox *mb;
+   struct tcmu_mailbox *mb = udev->mb_addr;
struct tcmu_cmd_entry *entry;
struct iovec *iov;
int iov_cnt, ret;
@@ -642,6 +660,8 @@ static bool is_ring_space_avail(struct tcmu_dev *udev, 
struct tcmu_cmd *cmd,
if (test_bit(TCMU_DEV_BIT_BROKEN, >flags))
return TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE;

+   if (se_cmd->se_cmd_flags & SCF_BIDI)
+   BUG_ON(!(se_cmd->t_bidi_data_sg && se_cmd->t_bidi_data_nents));
/*
 * Must be a certain minimum size for response sense info, but
 * also may be larger if the iov array is large.
@@ -649,33 +669,19 @@ static bool is_ring_space_avail(struct tcmu_dev *udev, 
struct tcmu_cmd *cmd,
 * We prepare way too many iovs for potential uses here, because it's
 * expensive to tell how many regions are freed in the bitmap
*/
-   base_command_size = max(offsetof(struct tcmu_cmd_entry,
-   req.iov[tcmu_cmd->dbi_len]),
-   sizeof(struct tcmu_cmd_entry));
-   command_size = base_command_size
-   + round_up(scsi_command_size(se_cmd->t_task_cdb), 
TCMU_OP_ALIGN_SIZE);
-
-   WARN_ON(command_size & (TCMU_OP_ALIGN_SIZE-1));
-
-   spin_lock_irq(>cmdr_lock);
-
-   mb = udev->mb_addr;
-   cmd_head = mb->cmd_head % udev->cmdr_size; /* UAM */
-   data_length = round_up(se_cmd->data_length, DATA_BLOCK_SIZE);
-   if (se_cmd->se_cmd_flags & SCF_BIDI) {
-   BUG_ON(!(se_cmd->t_bidi_data_sg && se_cmd->t_bidi_data_nents));
-   data_length += round_up(se_cmd->t_bidi_data_sg->length,
-   DATA_BLOCK_SIZE);
-   }
+   tcmu_cmd_get_cmd_size(tcmu_cmd, _command_size, _size);




So Based upon your definition of tcmu_cmd_get_cmd_size, why did you pass in 
tcmu_cmd here? It wont compile in its current state.

-Bryant



Re: [PATCH 0/3] Unblock SCSI devices even if the LLD is being unloaded

2017-03-16 Thread Bart Van Assche
On Thu, 2017-03-16 at 15:53 -0700, James Bottomley wrote:
> On Thu, 2017-03-16 at 13:56 -0700, Bart Van Assche wrote:
> > scsi_target_unblock() must unblock SCSI devices even if this function
> > is called after unloading of the LLD that created these devices has
> > started. This is necessary to prevent that __scsi_remove_device() 
> > hangs on the SYNCHRONIZE CACHE command issued by the sd driver during
> > shutdown.
> 
> Your special get function misses the try_module_get().  But this is all
> really a bit ugly. Since the only problem is the SYNC CACHE triggered
> by device_del, isn't a better solution a new state: SDEV_CANCEL_BLOCK. 
>  This will make the device visible to scsi_get_device() and we can take
> it back from CANCEL_BLOCKED->CANCEL when the queue is unblocked.  I
> suspect we could also simply throw away the sync cache command when the
> device is blocked (the cache should destage naturally in the time it
> takes for the device to be unblocked).

Hello James,

The purpose of this patch series is to make sure that unblock also occurs
after module unloading has started. My understanding of try_module_get() is
that it fails once module unloading has started. In other words, it is on
purpose that try_module_get() is not called. From the kernel module code:

bool try_module_get(struct module *module)
{
bool ret = true;

if (module) {
preempt_disable();
/* Note: here, we can fail to get a reference */
if (likely(module_is_live(module) &&
   atomic_inc_not_zero(>refcnt) != 0))
trace_module_get(module, _RET_IP_);
else
ret = false;
preempt_enable();
}
return ret;
}

static inline int module_is_live(struct module *mod)
{
return mod->state != MODULE_STATE_GOING;
}

Regarding introducing a new device state: this is something I would like to
avoid. Any code that manipulates the SCSI device state is unnecessarily hard
to modify because multiple types of state information have been mixed up in
a single state variable (blocked / not blocked; created / running / cancel /
offline). Additionally, the SCSI device state is visible to user space.
Adding a new SCSI device state could break existing user space applications.

There is another problem with the introduction of the SDEV_CANCEL_BLOCKED
state: we do not want open() calls to succeed for devices that are in the
SDEV_DEL, SDEV_CANCEL nor for devices that are in the SDEV_CANCEL_BLOCKED
state. scsi_disk_get() in the sd driver relies on scsi_device_get() to check
the SCSI device state. If scsi_device_get() would succeed for devices in the
SDEV_CANCEL_BLOCKED state then an explicit check for that state would have
to be added in several users of scsi_device_get(). In other words, I think
adding the SDEV_CANCEL_BLOCKED state would result in a much more complex and
also harder to test patch.

Thanks,

Bart.

Re: [PATCH v2] scsi_sysfs: fix hang when removing scsi device

2017-03-16 Thread Bart Van Assche
On Mon, 2017-03-13 at 14:55 -0700, James Bottomley wrote:
> On Mon, 2017-03-13 at 20:33 +, Bart Van Assche wrote:
> > On Mon, 2017-03-13 at 12:23 -0700, James Bottomley wrote:
> > > On Mon, 2017-03-13 at 18:49 +, Bart Van Assche wrote:
> > > > diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
> > > > index 7bfbcfa7af40..b3bb49d06943 100644
> > > > --- a/drivers/scsi/scsi.c
> > > > +++ b/drivers/scsi/scsi.c
> > > > @@ -602,7 +602,7 @@ EXPORT_SYMBOL(scsi_device_get);
> > > >   */
> > > >  void scsi_device_put(struct scsi_device *sdev)
> > > >  {
> > > > -   module_put(sdev->host->hostt->module);
> > > > +   module_put(sdev->hostt->module);
> > > > put_device(>sdev_gendev);
> > > >  }
> > > >  EXPORT_SYMBOL(scsi_device_put);
> > > > diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
> > > > index 6f7128f49c30..7134487abbb1 100644
> > > > --- a/drivers/scsi/scsi_scan.c
> > > > +++ b/drivers/scsi/scsi_scan.c
> > > > @@ -227,6 +227,7 @@ static struct scsi_device
> > > > *scsi_alloc_sdev(struct
> > > > scsi_target *starget,
> > > > sdev->model = scsi_null_device_strs;
> > > > sdev->rev = scsi_null_device_strs;
> > > > sdev->host = shost;
> > > > +   sdev->hostt = shost->hostt;
> > > > sdev->queue_ramp_up_period = SCSI_DEFAULT_RAMP_UP_PERIOD;
> > > > sdev->id = starget->id;
> > > > sdev->lun = lun;
> > > > diff --git a/include/scsi/scsi_device.h
> > > > b/include/scsi/scsi_device.h
> > > > index 6f22b39f1b0c..cda620ed5922 100644
> > > > --- a/include/scsi/scsi_device.h
> > > > +++ b/include/scsi/scsi_device.h
> > > > @@ -82,6 +82,7 @@ struct scsi_event {
> > > >  
> > > >  struct scsi_device {
> > > > struct Scsi_Host *host;
> > > > +   struct scsi_host_template *hostt;
> > > > struct request_queue *request_queue;
> > > >  
> > > 
> > > The apparent assumption behind this patch is that sdev->host can be
> > > freed but the sdev will still exist?  That shouldn't be correct:
> > > the
> > > rule for struct devices is that the child always holds the parent
> > > and
> > > the host is parented (albeit not necessarily directly) to the sdev,
> > > so
> > > it looks like something has gone wrong if the host had been freed
> > > before the sdev.
> > 
> > Hello James,
> > 
> > scsi_remove_host() decreases the sdev reference count but does not 
> > wait until the sdev release work has finished. This is why the SCSI
> > host can already have disappeared before the last scsi_device_put()
> > call occurs.
> 
> This is true, but I don't see how it can cause the host to be freed
> before the sdev.  The memory for struct Scsi_Host is freed in the
> shost_gendev release routine, which should be pinned by the parent
> traversal from sdev.  So it should not be possible for
>  scsi_host_dev_release() to be called before
>  scsi_device_dev_release_usercontext() becase the latter has the final
> put of the parent device.

Hello Hannes,

The following crash only occurs with async aborts enabled:

general protection fault:  [#1] SMP
RIP: 0010:scsi_device_put+0xb/0x30
Call Trace:
 scsi_disk_put+0x2d/0x40
 sd_release+0x3d/0xb0
 __blkdev_put+0x29e/0x360
 blkdev_put+0x49/0x170
 dm_put_table_device+0x58/0xc0 [dm_mod]
 dm_put_device+0x70/0xc0 [dm_mod]
 free_priority_group+0x92/0xc0 [dm_multipath]
 free_multipath+0x70/0xc0 [dm_multipath]
 multipath_dtr+0x19/0x20 [dm_multipath]
 dm_table_destroy+0x67/0x120 [dm_mod]
 dev_suspend+0xde/0x240 [dm_mod]
 ctl_ioctl+0x1f5/0x520 [dm_mod]
 dm_ctl_ioctl+0xe/0x20 [dm_mod]
 do_vfs_ioctl+0x8f/0x700
 SyS_ioctl+0x3c/0x70
 entry_SYSCALL_64_fastpath+0x18/0xad

I hadn't seen this crash before kernel v4.11-rc1 but with kernel v4.11-rc1
and later I see it if I let the srp-test scripts run for a few minutes. The
patch I used to disable async aborts on my test setup is as follows:

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index f2cafae150bc..9075e126d6c8 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -171,6 +171,7 @@ scmd_eh_abort_handler(struct work_struct *work)
}
 }
 
+#if 0
 /**
  * scsi_abort_command - schedule a command abort
  * @scmd:  scmd to abort.
@@ -219,6 +220,12 @@ scsi_abort_command(struct scsi_cmnd *scmd)
queue_delayed_work(shost->tmf_work_q, >abort_work, HZ / 100);
return SUCCESS;
 }
+#else
+static int scsi_abort_command(struct scsi_cmnd *scmd)
+{
+   return FAILED;
+}
+#endif
 
 /**
  * scsi_eh_scmd_add - add scsi cmd to error handling.

Bart.


Re: [PATCH 0/3] Unblock SCSI devices even if the LLD is being unloaded

2017-03-16 Thread James Bottomley
On Thu, 2017-03-16 at 13:56 -0700, Bart Van Assche wrote:
> scsi_target_unblock() must unblock SCSI devices even if this function
> is called after unloading of the LLD that created these devices has
> started. This is necessary to prevent that __scsi_remove_device() 
> hangs on the SYNCHRONIZE CACHE command issued by the sd driver during
> shutdown.

Your special get function misses the try_module_get().  But this is all
really a bit ugly. Since the only problem is the SYNC CACHE triggered
by device_del, isn't a better solution a new state: SDEV_CANCEL_BLOCK. 
 This will make the device visible to scsi_get_device() and we can take
it back from CANCEL_BLOCKED->CANCEL when the queue is unblocked.  I
suspect we could also simply throw away the sync cache command when the
device is blocked (the cache should destage naturally in the time it
takes for the device to be unblocked).

James



Re: [PATCH] [SCSI] esas2r: remove redundant NULL check on buffer

2017-03-16 Thread Bradley Grove

On 03/15/2017 12:46 PM, Colin King wrote:

From: Colin Ian King 

buffer is a pointer to the static char array event_buffer and
therefore can never be null, so the check is redundant. Remove it.

Detected by CoverityScan, CID#1077556 ("Logically Dead Code")

Signed-off-by: Colin Ian King 
---
 drivers/scsi/esas2r/esas2r_log.c | 5 -
 1 file changed, 5 deletions(-)

diff --git a/drivers/scsi/esas2r/esas2r_log.c b/drivers/scsi/esas2r/esas2r_log.c
index a82030a..65fdf22 100644
--- a/drivers/scsi/esas2r/esas2r_log.c
+++ b/drivers/scsi/esas2r/esas2r_log.c
@@ -130,11 +130,6 @@ static int esas2r_log_master(const long level,

spin_lock_irqsave(_buffer_lock, flags);

-   if (buffer == NULL) {
-   spin_unlock_irqrestore(_buffer_lock, flags);
-   return -1;
-   }
-
memset(buffer, 0, buflen);

/*



Thanks,

Acked-by: Bradley Grove 



This electronic transmission and any attachments hereto are intended only for the use of the individual or entity to which it is addressed and may contain confidential information belonging to ATTO Technology, Inc. If you have reason to believe that you are not the intended recipient, you are hereby notified that any disclosure, copying, distribution or the taking of any action in reliance on the contents of this electronic transmission is strictly prohibited. If you have reason to believe that you have received this transmission in error, please notify ATTO immediately by return e-mail and delete and destroy this communication.   


Re: [RFC 1/2] scsi: Provide mechanism for SCSI layer to poll for LLDD.

2017-03-16 Thread Bart Van Assche
On Thu, 2017-03-16 at 14:40 -0700, Himanshu Madhani wrote:
> +static int
> +scsi_poll(struct blk_mq_hw_ctx *hctx, unsigned int tag)
> +
> +{
> +struct Scsi_Host *shost = hctx->driver_data;
> +struct request *req;
> +struct scsi_cmnd *cmd;
> +
> +req = blk_mq_tag_to_rq(shost->tag_set.tags[hctx->queue_num], tag);
> +if (!req)
> +return 0;
> +
> +cmd = blk_mq_rq_to_pdu(req);
> + if (!cmd)
> + return 0;
> +
> + if (shost->hostt->poll_queue)
> + return(shost->hostt->poll_queue(shost, cmd));
> + else return 0;
> +}

What will happen if the request with tag 'tag' is completed after the block
layer decided to call this function and before this function had the chance
to call blk_mq_tag_to_rq()? Will this trigger a kernel crash? Also, please
format patches properly. This is what checkpatch reports for this patch:

ERROR: code indent should use tabs where possible
#244: FILE: drivers/scsi/scsi_lib.c:2161:
+struct Scsi_Host *shost = hctx->driver_data;$

WARNING: please, no spaces at the start of a line
#244: FILE: drivers/scsi/scsi_lib.c:2161:
+struct Scsi_Host *shost = hctx->driver_data;$

ERROR: code indent should use tabs where possible
#245: FILE: drivers/scsi/scsi_lib.c:2162:
+struct request *req;$

WARNING: please, no spaces at the start of a line
#245: FILE: drivers/scsi/scsi_lib.c:2162:
+struct request *req;$

ERROR: code indent should use tabs where possible
#246: FILE: drivers/scsi/scsi_lib.c:2163:
+struct scsi_cmnd *cmd;$

WARNING: please, no spaces at the start of a line
#246: FILE: drivers/scsi/scsi_lib.c:2163:
+struct scsi_cmnd *cmd;$

ERROR: code indent should use tabs where possible
#248: FILE: drivers/scsi/scsi_lib.c:2165:
+req = blk_mq_tag_to_rq(shost->tag_set.tags[hctx->queue_num], tag);$

WARNING: please, no spaces at the start of a line
#248: FILE: drivers/scsi/scsi_lib.c:2165:
+req = blk_mq_tag_to_rq(shost->tag_set.tags[hctx->queue_num], tag);$

ERROR: code indent should use tabs where possible
#249: FILE: drivers/scsi/scsi_lib.c:2166:
+if (!req)$

WARNING: please, no spaces at the start of a line
#249: FILE: drivers/scsi/scsi_lib.c:2166:
+if (!req)$

ERROR: code indent should use tabs where possible
#250: FILE: drivers/scsi/scsi_lib.c:2167:
+return 0;$

WARNING: please, no spaces at the start of a line
#250: FILE: drivers/scsi/scsi_lib.c:2167:
+return 0;$

ERROR: code indent should use tabs where possible
#252: FILE: drivers/scsi/scsi_lib.c:2169:
+cmd = blk_mq_rq_to_pdu(req);$

WARNING: please, no spaces at the start of a line
#252: FILE: drivers/scsi/scsi_lib.c:2169:
+cmd = blk_mq_rq_to_pdu(req);$

ERROR: return is not a function, parentheses are not required
#257: FILE: drivers/scsi/scsi_lib.c:2174:
+   return(shost->hostt->poll_queue(shost, cmd));

ERROR: trailing statements should be on next line
#258: FILE: drivers/scsi/scsi_lib.c:2175:
+   else return 0;

WARNING: line over 80 characters
#262: FILE: drivers/scsi/scsi_lib.c:2179:
+__scsi_init_hctx(struct blk_mq_hw_ctx *hctx, struct Scsi_Host *shost, unsigned 
int hctx_idx)

WARNING: Unnecessary space before function pointer name
#306: FILE: include/scsi/scsi_host.h:139:
+   int (* poll_queue)(struct Scsi_Host *, struct scsi_cmnd *);

ERROR: space prohibited after that '*' (ctx:BxW)
#306: FILE: include/scsi/scsi_host.h:139:
+   int (* poll_queue)(struct Scsi_Host *, struct scsi_cmnd *);
     ^
Thanks,

Bart.

Re: [RFC 1/2] scsi: Provide mechanism for SCSI layer to poll for LLDD.

2017-03-16 Thread Madhani, Himanshu

> On Mar 16, 2017, at 3:27 PM, Bart Van Assche  
> wrote:
> 
> On Thu, 2017-03-16 at 14:40 -0700, Himanshu Madhani wrote:
>> +static int
>> +scsi_poll(struct blk_mq_hw_ctx *hctx, unsigned int tag)
>> +
>> +{
>> +struct Scsi_Host *shost = hctx->driver_data;
>> +struct request *req;
>> +struct scsi_cmnd *cmd;
>> +
>> +req = blk_mq_tag_to_rq(shost->tag_set.tags[hctx->queue_num], tag);
>> +if (!req)
>> +return 0;
>> +
>> +cmd = blk_mq_rq_to_pdu(req);
>> +if (!cmd)
>> +return 0;
>> +
>> +if (shost->hostt->poll_queue)
>> +return(shost->hostt->poll_queue(shost, cmd));
>> +else return 0;
>> +}
> 
> What will happen if the request with tag 'tag' is completed after the block
> layer decided to call this function and before this function had the chance
> to call blk_mq_tag_to_rq()? Will this trigger a kernel crash? Also, please
> format patches properly. This is what checkpatch reports for this patch:
> 
> ERROR: code indent should use tabs where possible
> #244: FILE: drivers/scsi/scsi_lib.c:2161:
> +struct Scsi_Host *shost = hctx->driver_data;$
> 
> WARNING: please, no spaces at the start of a line
> #244: FILE: drivers/scsi/scsi_lib.c:2161:
> +struct Scsi_Host *shost = hctx->driver_data;$
> 
> ERROR: code indent should use tabs where possible
> #245: FILE: drivers/scsi/scsi_lib.c:2162:
> +struct request *req;$
> 
> WARNING: please, no spaces at the start of a line
> #245: FILE: drivers/scsi/scsi_lib.c:2162:
> +struct request *req;$
> 
> ERROR: code indent should use tabs where possible
> #246: FILE: drivers/scsi/scsi_lib.c:2163:
> +struct scsi_cmnd *cmd;$
> 
> WARNING: please, no spaces at the start of a line
> #246: FILE: drivers/scsi/scsi_lib.c:2163:
> +struct scsi_cmnd *cmd;$
> 
> ERROR: code indent should use tabs where possible
> #248: FILE: drivers/scsi/scsi_lib.c:2165:
> +req = blk_mq_tag_to_rq(shost->tag_set.tags[hctx->queue_num], tag);$
> 
> WARNING: please, no spaces at the start of a line
> #248: FILE: drivers/scsi/scsi_lib.c:2165:
> +req = blk_mq_tag_to_rq(shost->tag_set.tags[hctx->queue_num], tag);$
> 
> ERROR: code indent should use tabs where possible
> #249: FILE: drivers/scsi/scsi_lib.c:2166:
> +if (!req)$
> 
> WARNING: please, no spaces at the start of a line
> #249: FILE: drivers/scsi/scsi_lib.c:2166:
> +if (!req)$
> 
> ERROR: code indent should use tabs where possible
> #250: FILE: drivers/scsi/scsi_lib.c:2167:
> +return 0;$
> 
> WARNING: please, no spaces at the start of a line
> #250: FILE: drivers/scsi/scsi_lib.c:2167:
> +return 0;$
> 
> ERROR: code indent should use tabs where possible
> #252: FILE: drivers/scsi/scsi_lib.c:2169:
> +cmd = blk_mq_rq_to_pdu(req);$
> 
> WARNING: please, no spaces at the start of a line
> #252: FILE: drivers/scsi/scsi_lib.c:2169:
> +cmd = blk_mq_rq_to_pdu(req);$
> 
> ERROR: return is not a function, parentheses are not required
> #257: FILE: drivers/scsi/scsi_lib.c:2174:
> + return(shost->hostt->poll_queue(shost, cmd));
> 
> ERROR: trailing statements should be on next line
> #258: FILE: drivers/scsi/scsi_lib.c:2175:
> + else return 0;
> 
> WARNING: line over 80 characters
> #262: FILE: drivers/scsi/scsi_lib.c:2179:
> +__scsi_init_hctx(struct blk_mq_hw_ctx *hctx, struct Scsi_Host *shost, 
> unsigned int hctx_idx)
> 
> WARNING: Unnecessary space before function pointer name
> #306: FILE: include/scsi/scsi_host.h:139:
> + int (* poll_queue)(struct Scsi_Host *, struct scsi_cmnd *);
> 
> ERROR: space prohibited after that '*' (ctx:BxW)
> #306: FILE: include/scsi/scsi_host.h:139:
> + int (* poll_queue)(struct Scsi_Host *, struct scsi_cmnd *);
>^
> Thanks,
> 
> Bart.

We’ll take care of Check patch issue in updated RFC. 

Thanks,
- Himanshu



[RFC 1/2] scsi: Provide mechanism for SCSI layer to poll for LLDD.

2017-03-16 Thread Himanshu Madhani
From: Darren Trapp 

Signed-off-by: Darren Trapp 
Signed-off-by: Himanshu Madhani 
---
 drivers/scsi/scsi_lib.c  | 39 +++
 include/scsi/scsi_host.h | 12 
 2 files changed, 51 insertions(+)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 19125d72f322..eb01be039677 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2154,6 +2154,43 @@ struct request_queue *scsi_alloc_queue(struct 
scsi_device *sdev)
return q;
 }
 
+static int
+scsi_poll(struct blk_mq_hw_ctx *hctx, unsigned int tag)
+
+{
+struct Scsi_Host *shost = hctx->driver_data;
+struct request *req;
+struct scsi_cmnd *cmd;
+
+req = blk_mq_tag_to_rq(shost->tag_set.tags[hctx->queue_num], tag);
+if (!req)
+return 0;
+
+cmd = blk_mq_rq_to_pdu(req);
+   if (!cmd)
+   return 0;
+
+   if (shost->hostt->poll_queue)
+   return(shost->hostt->poll_queue(shost, cmd));
+   else return 0;
+}
+
+static inline void
+__scsi_init_hctx(struct blk_mq_hw_ctx *hctx, struct Scsi_Host *shost, unsigned 
int hctx_idx)
+{
+   hctx->driver_data = shost;
+}
+
+static int
+scsi_init_hctx(struct blk_mq_hw_ctx *hctx, void *data, unsigned int hctx_idx)
+{
+   struct Scsi_Host *shost = data;
+
+   __scsi_init_hctx(hctx, shost, hctx_idx);
+
+   return 0;
+}
+
 static struct blk_mq_ops scsi_mq_ops = {
.queue_rq   = scsi_queue_rq,
.complete   = scsi_softirq_done,
@@ -2161,6 +2198,8 @@ static struct blk_mq_ops scsi_mq_ops = {
.init_request   = scsi_init_request,
.exit_request   = scsi_exit_request,
.map_queues = scsi_map_queues,
+   .poll   = scsi_poll,
+   .init_hctx  = scsi_init_hctx,
 };
 
 struct request_queue *scsi_mq_alloc_queue(struct scsi_device *sdev)
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index 3cd8c3bec638..e87e8a69a0df 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -127,6 +127,18 @@ struct scsi_host_template {
int (* queuecommand)(struct Scsi_Host *, struct scsi_cmnd *);
 
/*
+* The poll_queue function allows the scsi layer to poll a
+* LLDD for the completion of a specific scsi_cmnd (upon request
+* from blk_mq).
+*
+* The LLDD returns 1 to indicate it completed the request or 0
+* otherwise.
+*
+* STATUS: OPTIONAL
+*/
+   int (* poll_queue)(struct Scsi_Host *, struct scsi_cmnd *);
+
+   /*
 * This is an error handling strategy routine.  You don't need to
 * define one of these if you don't want to - there is a default
 * routine that is present that should work in most cases.  For those
-- 
2.12.0



[RFC 2/2] qla2xxx: Driver changes to use new poll_queue() callback

2017-03-16 Thread Himanshu Madhani
From: Darren Trapp 

The blk_mq layer allows polling a transport for a specific
completion if the HIPRI request flag is set. This can be
accomplished by using "libengine=pvsync2 –hipri" option
in FIO tool. This flag then allows polling into LLDD to
service specific compeletion queue for IOs. It also removes
the interrupt latency and significantly improved performance
in NVMeF that utilizes it. There is expense of higher CPU
utlization due to active polling.

Here's sample results for comparision

FIO command line with synchronous IO’s results in latency avg=32.73us
reported by FIO

./fio/fio  --time_based --ioengine=sync --direct=1 --runtime=30 
--readwrite=read \
  --iodepth=1 --blocksize=4k --name=job0 --group_reporting --filename=/dev/sdd

FIO command line with active polling IO’s results in latency avg=26.04 reported 
by FIO

./fio/fio  --time_based --ioengine=pvsync2 --hipri --direct=1 --runtime=30 \
  --readwrite=read --iodepth=1 --blocksize=4k --name=job0 --group_reporting \
  --filename=/dev/sdd

FIO version 2.17

Signed-off-by: Darren Trapp 
Signed-off-by: Himanshu Madhani 
---
 drivers/scsi/qla2xxx/qla_os.c | 36 
 1 file changed, 36 insertions(+)

diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
index 41d5b09f7326..1f3a113954fd 100644
--- a/drivers/scsi/qla2xxx/qla_os.c
+++ b/drivers/scsi/qla2xxx/qla_os.c
@@ -253,6 +253,7 @@ static int qla2xxx_scan_finished(struct Scsi_Host *, 
unsigned long time);
 static void qla2xxx_scan_start(struct Scsi_Host *);
 static void qla2xxx_slave_destroy(struct scsi_device *);
 static int qla2xxx_queuecommand(struct Scsi_Host *h, struct scsi_cmnd *cmd);
+static int qla2xxx_poll_queue(struct Scsi_Host *, struct scsi_cmnd *);
 static int qla2xxx_eh_abort(struct scsi_cmnd *);
 static int qla2xxx_eh_device_reset(struct scsi_cmnd *);
 static int qla2xxx_eh_target_reset(struct scsi_cmnd *);
@@ -268,6 +269,7 @@ struct scsi_host_template qla2xxx_driver_template = {
.module = THIS_MODULE,
.name   = QLA2XXX_DRIVER_NAME,
.queuecommand   = qla2xxx_queuecommand,
+   .poll_queue = qla2xxx_poll_queue,
 
.eh_timed_out   = fc_eh_timed_out,
.eh_abort_handler   = qla2xxx_eh_abort,
@@ -973,6 +975,40 @@ qla2xxx_mqueuecommand(struct Scsi_Host *host, struct 
scsi_cmnd *cmd,
 }
 
 /*
+ * Poll for command to be completed. Initiated by SCSI layer.
+ */
+static int
+qla2xxx_poll_queue(struct Scsi_Host *host, struct scsi_cmnd *cmd)
+{
+   uint32_t tag;
+   uint16_t hwq;
+   struct qla_qpair *qpair;
+   unsigned long flags;
+   scsi_qla_host_t *vha = shost_priv(host);
+   struct qla_hw_data *ha = vha->hw;
+   int ret = 0;
+
+   tag = blk_mq_unique_tag(cmd->request);
+   hwq = blk_mq_unique_tag_to_hwq(tag);
+   qpair = ha->queue_pair_map[hwq];
+
+   /* Acquire  ring specific lock */
+   spin_lock_irqsave(>qp_lock, flags);
+   if (!CMD_SP(cmd)) {
+   spin_unlock_irqrestore(>qp_lock, flags);
+   return 1;
+   }
+
+   qla24xx_process_response_queue(vha, qpair->rsp);
+
+   if (!CMD_SP(cmd))
+   ret = 1;
+
+   spin_unlock_irqrestore(>qp_lock, flags);
+   return ret;
+}
+
+/*
  * qla2x00_eh_wait_on_command
  *Waits for the command to be returned by the Firmware for some
  *max time.
-- 
2.12.0



[RFC 0/2] scsi: Add poll_queue mechanism

2017-03-16 Thread Himanshu Madhani
Hi James/Martin,

This is RFC to get comment about propose poll_queue interface in scsi layer.

The blk_mq layer allows polling a transport for a specific completion if
the HIPRI request flag is set. This can be accomplished by using
"libengine=pvsync2 –hipri" option in FIO tool. This flag then allows
polling into LLDD to service specific compeletion queue for IOs. It also
removes the interrupt latency and significantly improved performance in NVMeF 
that utilizes it. There is expense of higher CPU utlization due to active 
polling.

Here's sample results for comparision

FiO command line with synchronous IO’s results in latency avg=32.73us reported 
by FIO

./fio/fio  --time_based --ioengine=sync --direct=1 --runtime=30 
--readwrite=read \
--iodepth=1 --blocksize=4k --name=job0 --group_reporting --filename=/dev/sdd

FIO command line with active polling IO’s results in latency avg=26.04 reported 
by FIO

./fio/fio  --time_based --ioengine=pvsync2 --hipri --direct=1 --runtime=30 \
--readwrite=read --iodepth=1 --blocksize=4k --name=job0 --group_reporting \
--filename=/dev/sdd

FIO version 2.17

Thanks,
Himanshu

Darren Trapp (2):
  scsi: Provide mechanism for SCSI layer to poll for LLDD.
  qla2xxx: Driver changes to use new poll_queue() callback

 drivers/scsi/qla2xxx/qla_os.c | 36 
 drivers/scsi/scsi_lib.c   | 39 +++
 include/scsi/scsi_host.h  | 12 
 3 files changed, 87 insertions(+)

-- 
2.12.0



[PATCH 0/3] Unblock SCSI devices even if the LLD is being unloaded

2017-03-16 Thread Bart Van Assche
scsi_target_unblock() must unblock SCSI devices even if this function
is called after unloading of the LLD that created these devices has
started. This is necessary to prevent that __scsi_remove_device() hangs
on the SYNCHRONIZE CACHE command issued by the sd driver during shutdown.

Bart Van Assche (3):
  __scsi_iterate_devices(): Make the get and put functions arguments
  Introduce starget_for_all_devices() and shost_for_all_devices()
  Ensure that scsi_target_unblock() examines all devices

 drivers/scsi/scsi.c| 60 +-
 drivers/scsi/scsi_lib.c| 22 ++---
 include/scsi/scsi_device.h | 29 ++
 3 files changed, 97 insertions(+), 14 deletions(-)

-- 
2.12.0



[PATCH 2/3] Introduce starget_for_all_devices() and shost_for_all_devices()

2017-03-16 Thread Bart Van Assche
This patch does not change any functionality.

Signed-off-by: Bart Van Assche 
Tested-by: Israel Rukshin 
Cc: Max Gurtovoy 
Cc: Hannes Reinecke 
Cc: 
---
 drivers/scsi/scsi.c| 52 +++---
 include/scsi/scsi_device.h | 22 ++--
 2 files changed, 69 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 5ac16fecbdab..ea408da8de29 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -640,9 +640,11 @@ EXPORT_SYMBOL(__scsi_iterate_devices);
  * @data:  Opaque passed to each function call.
  * @fn:Function to call on each device
  *
- * This traverses over each device of @starget.  The devices have
- * a reference that must be released by scsi_host_put when breaking
- * out of the loop.
+ * This traverses over each device of @starget except the devices that are in
+ * state SDEV_DEL or SDEV_CANCEL. The devices have a reference that must be
+ * released by scsi_device_put() when breaking out of the loop. If the LLD
+ * associated with the devices is being unloaded, @fn is not called for any
+ * device.
  */
 void starget_for_each_device(struct scsi_target *starget, void *data,
 void (*fn)(struct scsi_device *, void *))
@@ -659,6 +661,50 @@ void starget_for_each_device(struct scsi_target *starget, 
void *data,
 EXPORT_SYMBOL(starget_for_each_device);
 
 /**
+ * scsi_device_get_any() - get a reference to @sdev even if it is being deleted
+ *
+ * See also scsi_device_get().
+ */
+static int scsi_device_get_any(struct scsi_device *sdev)
+{
+   return get_device(>sdev_gendev) ? 0 : -ENXIO;
+}
+
+/**
+ * scsi_device_put_any() - drop a reference obtained by scsi_device_get_any()
+ *
+ * See also scsi_device_put().
+ */
+static void scsi_device_put_any(struct scsi_device *sdev)
+{
+   put_device(>sdev_gendev);
+}
+
+/**
+ * starget_for_all_devices - helper to walk all devices of a target
+ * @starget:   target whose devices we want to iterate over.
+ * @data:  Pointer passed to each function call.
+ * @fn:Function to call on each device
+ *
+ * This traverses over each device of @starget, including the devices in state
+ * SDEV_DEL or SDEV_ANY. The devices have a reference that must be released by
+ * scsi_device_put_any() when breaking out of the loop.
+ */
+void starget_for_all_devices(struct scsi_target *starget, void *data,
+void (*fn)(struct scsi_device *, void *))
+{
+   struct Scsi_Host *shost = dev_to_shost(starget->dev.parent);
+   struct scsi_device *sdev;
+
+   shost_for_all_devices(sdev, shost, scsi_device_get_any,
+ scsi_device_put_any)
+   if (sdev->channel == starget->channel &&
+   sdev->id == starget->id)
+   fn(sdev, data);
+}
+EXPORT_SYMBOL(starget_for_all_devices);
+
+/**
  * __starget_for_each_device - helper to walk all devices of a target 
(UNLOCKED)
  * @starget:   target whose devices we want to iterate over.
  * @data:  parameter for callback @fn()
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index 434b617c9f76..cd6a5383e9b7 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -328,6 +328,8 @@ extern struct scsi_device 
*__scsi_device_lookup_by_target(struct scsi_target *,
  u64);
 extern void starget_for_each_device(struct scsi_target *, void *,
 void (*fn)(struct scsi_device *, void *));
+extern void starget_for_all_devices(struct scsi_target *, void *,
+   void (*fn)(struct scsi_device *, void *));
 extern void __starget_for_each_device(struct scsi_target *, void *,
  void (*fn)(struct scsi_device *,
 void *));
@@ -339,6 +341,22 @@ extern struct scsi_device *__scsi_iterate_devices(struct 
Scsi_Host *,
void (*put)(struct scsi_device *));
 
 /**
+ * shost_for_all_devices - iterate over all devices of a host
+ * @sdev: the  scsi_device to use as a cursor
+ * @shost: the  scsi_host to iterate over
+ * @get: function that obtains a reference to a device and returns 0 upon
+ *   success
+ * @put: function that drops a device reference.
+ *
+ * Iterator that returns each device attached to @shost.  This loop
+ * takes a reference on each device and releases it at the end.  If
+ * you break out of the loop, you must call @put(sdev).
+ */
+#define shost_for_all_devices(sdev, shost, get, put)   \
+   for ((sdev) = NULL; ((sdev) = __scsi_iterate_devices((shost), (sdev), \
+(get), (put))) != NULL; )
+
+/**
  * shost_for_each_device - iterate over all devices of a host
  * @sdev: the  scsi_device to 

[PATCH 3/3] Ensure that scsi_target_unblock() examines all devices

2017-03-16 Thread Bart Van Assche
Since scsi_target_unblock() uses starget_for_each_device(), since
starget_for_each_device() uses scsi_device_get(), since
scsi_device_get() fails after unloading of the LLD kernel module
has been started scsi_target_unblock() may skip devices that were
affected by scsi_target_block(). Ensure that scsi_target_block()
examines all SCSI devices. This patch avoids that unloading the
ib_srp kernel module can trigger the following hang:

Call Trace:
 schedule+0x35/0x80
 schedule_timeout+0x237/0x2d0
 io_schedule_timeout+0xa6/0x110
 wait_for_completion_io+0xa3/0x110
 blk_execute_rq+0xdf/0x120
 scsi_execute+0xce/0x150 [scsi_mod]
 scsi_execute_req_flags+0x8f/0xf0 [scsi_mod]
 sd_sync_cache+0xa9/0x190 [sd_mod]
 sd_shutdown+0x6a/0x100 [sd_mod]
 sd_remove+0x64/0xc0 [sd_mod]
 __device_release_driver+0x8d/0x120
 device_release_driver+0x1e/0x30
 bus_remove_device+0xf9/0x170
 device_del+0x127/0x240
 __scsi_remove_device+0xc1/0xd0 [scsi_mod]
 scsi_forget_host+0x57/0x60 [scsi_mod]
 scsi_remove_host+0x72/0x110 [scsi_mod]
 srp_remove_work+0x8b/0x200 [ib_srp]

Reported-by: Israel Rukshin 
Signed-off-by: Bart Van Assche 
Tested-by: Israel Rukshin 
Cc: Max Gurtovoy 
Cc: Hannes Reinecke 
Cc: 
---
 drivers/scsi/scsi_lib.c | 22 +++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 19125d72f322..f03a7867c04f 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -3071,21 +3071,37 @@ device_unblock(struct scsi_device *sdev, void *data)
scsi_internal_device_unblock(sdev, *(enum scsi_device_state *)data);
 }
 
+/**
+ * target_unblock() - unblock all devices associated with a SCSI target
+ *
+ * Notes:
+ * - Do not use scsi_device_get() nor any of the macros that use this
+ *   function from inside scsi_target_block() because otherwise this function
+ *   won't have any effect when called while the SCSI LLD is being unloaded.
+ * - Do not hold the host lock around the device_unblock() calls because at
+ *   least for blk-sq the block layer queue lock is the outer lock and the
+ *   SCSI host lock is the inner lock.
+ */
 static int
 target_unblock(struct device *dev, void *data)
 {
if (scsi_is_target_device(dev))
-   starget_for_each_device(to_scsi_target(dev), data,
+   starget_for_all_devices(to_scsi_target(dev), data,
device_unblock);
return 0;
 }
 
+/**
+ * scsi_target_unblock() - unblock all devices associated with a SCSI target
+ * @dev: Either a pointer to the dev member of struct scsi_target or a pointer
+ * to the shost_gendev member of struct Scsi_Host.
+ * @new_state: New SCSI device state.
+ */
 void
 scsi_target_unblock(struct device *dev, enum scsi_device_state new_state)
 {
if (scsi_is_target_device(dev))
-   starget_for_each_device(to_scsi_target(dev), _state,
-   device_unblock);
+   target_unblock(dev, _state);
else
device_for_each_child(dev, _state, target_unblock);
 }
-- 
2.12.0



[PATCH 1/3] __scsi_iterate_devices(): Make the get and put functions arguments

2017-03-16 Thread Bart Van Assche
This patch does not change any functionality.

Signed-off-by: Bart Van Assche 
Tested-by: Israel Rukshin 
Cc: Max Gurtovoy 
Cc: Hannes Reinecke 
Cc: 
---
 drivers/scsi/scsi.c|  8 +---
 include/scsi/scsi_device.h | 11 ++-
 2 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 7bfbcfa7af40..5ac16fecbdab 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -609,7 +609,9 @@ EXPORT_SYMBOL(scsi_device_put);
 
 /* helper for shost_for_each_device, see that for documentation */
 struct scsi_device *__scsi_iterate_devices(struct Scsi_Host *shost,
-  struct scsi_device *prev)
+  struct scsi_device *prev,
+  int (*get)(struct scsi_device *),
+  void (*put)(struct scsi_device *))
 {
struct list_head *list = (prev ? >siblings : >__devices);
struct scsi_device *next = NULL;
@@ -619,7 +621,7 @@ struct scsi_device *__scsi_iterate_devices(struct Scsi_Host 
*shost,
while (list->next != >__devices) {
next = list_entry(list->next, struct scsi_device, siblings);
/* skip devices that we can't get a reference to */
-   if (!scsi_device_get(next))
+   if (!get(next))
break;
next = NULL;
list = list->next;
@@ -627,7 +629,7 @@ struct scsi_device *__scsi_iterate_devices(struct Scsi_Host 
*shost,
spin_unlock_irqrestore(shost->host_lock, flags);
 
if (prev)
-   scsi_device_put(prev);
+   put(prev);
return next;
 }
 EXPORT_SYMBOL(__scsi_iterate_devices);
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index 080c7ce9bae8..434b617c9f76 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -334,7 +334,9 @@ extern void __starget_for_each_device(struct scsi_target *, 
void *,
 
 /* only exposed to implement shost_for_each_device */
 extern struct scsi_device *__scsi_iterate_devices(struct Scsi_Host *,
- struct scsi_device *);
+   struct scsi_device *,
+   int (*get)(struct scsi_device *),
+   void (*put)(struct scsi_device *));
 
 /**
  * shost_for_each_device - iterate over all devices of a host
@@ -345,10 +347,9 @@ extern struct scsi_device *__scsi_iterate_devices(struct 
Scsi_Host *,
  * takes a reference on each device and releases it at the end.  If
  * you break out of the loop, you must call scsi_device_put(sdev).
  */
-#define shost_for_each_device(sdev, shost) \
-   for ((sdev) = __scsi_iterate_devices((shost), NULL); \
-(sdev); \
-(sdev) = __scsi_iterate_devices((shost), (sdev)))
+#define shost_for_each_device(sdev, shost) \
+   for ((sdev) = NULL; ((sdev) = __scsi_iterate_devices((shost), (sdev), \
+scsi_device_get, scsi_device_put)) != NULL; )
 
 /**
  * __shost_for_each_device - iterate over all devices of a host (UNLOCKED)
-- 
2.12.0



Re: [PATCH] enclosure: fix sysfs symlinks creation when using multipath

2017-03-16 Thread James Bottomley
On Wed, 2017-03-15 at 19:39 -0400, Martin K. Petersen wrote:
> Maurizio Lombardi  writes:
> 
> > With multipath, it may happen that the same device is passed to
> > enclosure_add_device() multiple times and that the
> > enclosure_add_links() function fails to create the symlinks because
> > the device's sysfs directory entry is still NULL.  In this case,
> > the
> > links will never be created because all the subsequent calls to
> > enclosure_add_device() will immediately fail with EEXIST.
> 
> James?

Well I don't think the patch is the correct way to do this.  The
problem is that if we encounter an error creating the links, we
shouldn't add the device to the enclosure.  There's no need of a
links_created variable (see below).

However, more interesting is why the link creation failed in the first
place.  The device clearly seems to exist because it was added to sysfs
at time index 19.2 and the enclosure didn't try to use it until 60.0. 
 Can you debug this a bit more, please?  I can't see anything specific
to multipath in the trace, so whatever this is looks like it could
happen in the single path case as well.

James

---

diff --git a/drivers/misc/enclosure.c b/drivers/misc/enclosure.c
index 65fed71..ae89082 100644
--- a/drivers/misc/enclosure.c
+++ b/drivers/misc/enclosure.c
@@ -375,6 +375,7 @@ int enclosure_add_device(struct enclosure_device *edev, int 
component,
 struct device *dev)
 {
struct enclosure_component *cdev;
+   int err;
 
if (!edev || component >= edev->components)
return -EINVAL;
@@ -384,12 +385,15 @@ int enclosure_add_device(struct enclosure_device *edev, 
int component,
if (cdev->dev == dev)
return -EEXIST;
 
-   if (cdev->dev)
+   if (cdev->dev) {
enclosure_remove_links(cdev);
-
-   put_device(cdev->dev);
-   cdev->dev = get_device(dev);
-   return enclosure_add_links(cdev);
+   put_device(cdev->dev);
+   cdev->dev = NULL;
+   }
+   err = enclosure_add_links(cdev);
+   if (!err)
+   cdev->dev = get_device(dev);
+   return err;
 }
 EXPORT_SYMBOL_GPL(enclosure_add_device);
 



Re: 4.10+ qla2xxx driver wont load for qla2xxx (ISP2532-based 8Gb) with BAR 3 error, work fine on 4.9

2017-03-16 Thread Laurence Oberman


- Original Message -
> From: "Laurence Oberman" 
> To: "Himanshu Madhani" 
> Cc: "Chad Dupuis" , "Linux SCSI List" 
> 
> Sent: Tuesday, March 14, 2017 11:49:51 PM
> Subject: Re: 4.10+ qla2xxx  driver wont load for qla2xxx (ISP2532-based 8Gb) 
> with BAR 3 error, work fine on 4.9
> 
> 
> 
> - Original Message -
> > From: "Himanshu Madhani" 
> > To: "Laurence Oberman" 
> > Cc: "Chad Dupuis" , "Linux SCSI List"
> > 
> > Sent: Tuesday, March 14, 2017 8:32:25 PM
> > Subject: Re: 4.10+ qla2xxx  driver wont load for qla2xxx (ISP2532-based
> > 8Gb) with BAR 3 error, work fine on 4.9
> > 
> > 
> > - Original Message -
> > > From: "Laurence Oberman" 
> > > To: "Himanshu Madhani" 
> > > Cc: "Chad Dupuis" , "Linux SCSI List"
> > > 
> > > Sent: Tuesday, March 14, 2017 8:02:32 PM
> > > Subject: Re: 4.10+ qla2xxx  driver wont load for qla2xxx
> > > (ISP2532-based
> > > 8Gb) with BAR 3 error, work fine on 4.9
> > > 
> > > 
> > > 
> > > - Original Message -
> > > > From: "Himanshu Madhani" 
> > > > To: "Laurence Oberman" 
> > > > Cc: "Chad Dupuis" , "Linux SCSI List"
> > > > 
> > > > Sent: Tuesday, March 14, 2017 5:11:13 PM
> > > > Subject: Re: 4.10+ qla2xxx  driver wont load for qla2xxx
> > > > (ISP2532-based
> > > > 8Gb) with BAR 3 error, work fine on 4.9
> > > > 
> > > > 
> > > > - Original Message -
> > > > > From: "Laurence Oberman" 
> > > > > To: "Himanshu Madhani" 
> > > > > Cc: "Chad Dupuis" , "Linux SCSI List"
> > > > > 
> > > > > Sent: Monday, March 13, 2017 9:06:38 PM
> > > > > Subject: Re: 4.10+ qla2xxx  driver wont load for qla2xxx
> > > > > (ISP2532-based
> > > > > 8Gb) with BAR 3 error, work fine on 4.9
> > > > > 
> > > > > 
> > > > > 
> > > > > - Original Message -
> > > > > > From: "Laurence Oberman" 
> > > > > > To: "Himanshu Madhani" 
> > > > > > Cc: "Chad Dupuis" , "Linux SCSI
> > > > > > List"
> > > > > > 
> > > > > > Sent: Monday, March 13, 2017 12:54:12 PM
> > > > > > Subject: Re: 4.10+ qla2xxx  driver wont load for qla2xxx
> > > > > > (ISP2532-based
> > > > > > 8Gb) with BAR 3 error, work fine on 4.9
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > - Original Message -
> > > > > > > From: "Himanshu Madhani" 
> > > > > > > To: "Laurence Oberman" , "Chad
> > > > > > > Dupuis"
> > > > > > > 
> > > > > > > Cc: "Linux SCSI List" 
> > > > > > > Sent: Monday, March 13, 2017 12:39:03 PM
> > > > > > > Subject: RE: 4.10+ qla2xxx  driver wont load for qla2xxx
> > > > > > > (ISP2532-based
> > > > > > > 8Gb) with BAR 3 error, work fine on 4.9
> > > > > > > 
> > > > > > > Hi Laurence,
> > > > > > > 
> > > > > > > > -Original Message-
> > > > > > > > From: Laurence Oberman [mailto:lober...@redhat.com]
> > > > > > > > Sent: Sunday, March 12, 2017 11:31 AM
> > > > > > > > To: Dupuis, Chad ; Madhani,
> > > > > > > > Himanshu
> > > > > > > > 
> > > > > > > > Cc: Linux SCSI List 
> > > > > > > > Subject: Re: 4.10+ qla2xxx driver wont load for qla2xxx
> > > > > > > > (ISP2532-based
> > > > > > > > 8Gb)
> > > > > > > > with BAR 3 error, work fine on 4.9
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > - Original Message -
> > > > > > > > > From: "Laurence Oberman" 
> > > > > > > > > To: "Chad Dupuis" , "Himanshu
> > > > > > > > > Madhani"
> > > > > > > > > 
> > > > > > > > > Cc: "Linux SCSI List" 
> > > > > > > > > Sent: Sunday, March 12, 2017 7:39:23 AM
> > > > > > > > > Subject: 4.10+ qla2xxx  driver wont load for qla2xxx
> > > > > > > > > (ISP2532-based
> > > > > > > > 

Re: [patch] check length passed to SG_NEXT_CMD_LEN

2017-03-16 Thread Douglas Gilbert

On 2017-03-15 07:38 PM, Martin K. Petersen wrote:

Peter Chang  writes:


now that i think i've got gmail not marking everything as spam...


Doug?


The extra sanity check can't hurt.

Acked-by: Douglas Gilbert 




From 93409c62db49d15105390315a685e54083029bee Mon Sep 17 00:00:00 2001

From: peter chang 
Date: Wed, 15 Feb 2017 14:11:54 -0800
Subject: [PATCH] [sg] check length passed to SG_NEXT_CMD_LEN

the user can control the size of the next command passed along, but
the value passed to the ioctl isn't checked against the usable
max command size.

Change-Id: I9ac2ae07c35cf5fda62d7afad32c8d9ab6a9ea1d
Tested: sanity checked w/ calling the ioctl w/ a bogus size
---
 drivers/scsi/sg.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 9c5c5f2b3962..b47a369cb71c 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -976,6 +976,8 @@ sg_ioctl(struct file *filp, unsigned int cmd_in, unsigned 
long arg)
result = get_user(val, ip);
if (result)
return result;
+   if (val > SG_MAX_CDB_SIZE)
+   return -ENOMEM;
sfp->next_cmd_len = (val > 0) ? val : 0;
return 0;
case SG_GET_VERSION_NUM:





Re: [PATCHv2 3/5] target/user: Fix possible overwrite of t_data_sg's last iov[]

2017-03-16 Thread Bryant G. Ly



On 3/8/17 2:45 AM, lixi...@cmss.chinamobile.com wrote:

From: Xiubo Li 

If there has BIDI data, its first iov[] will overwrite the last
iov[] for se_cmd->t_data_sg.

To fix this, we can just increase the iov pointer, but this may
introuduce a new memory leakage bug: If the se_cmd->data_length
and se_cmd->t_bidi_data_sg->length are all not aligned up to the
DATA_BLOCK_SIZE, the actual length needed maybe larger than just
sum of them.

So, this could be avoided by rounding all the data lengthes up
to DATA_BLOCK_SIZE.

Signed-off-by: Xiubo Li 
---
  drivers/target/target_core_user.c | 32 +++-
  1 file changed, 19 insertions(+), 13 deletions(-)


I have seen this in my environment:
(gdb) print *((tcmulib_cmd->iovec)+0)
$7 = {iov_base = 0x3fff7c3d, iov_len = 8192}
(gdb) print *((tcmulib_cmd->iovec)+1)
$3 = {iov_base = 0x3fff7c3da000, iov_len = 4096}
(gdb) print *((tcmulib_cmd->iovec)+2)
$4 = {iov_base = 0x3fff7c3dc000, iov_len = 16384}
(gdb) print *((tcmulib_cmd->iovec)+3)
$5 = {iov_base = 0x3fff7c3f7000, iov_len = 12288}
(gdb) print *((tcmulib_cmd->iovec)+4)
$6 = {iov_base = 0x1306e853c0028, iov_len = 128}  <--- bad pointer and length

So this fix would be great!

Signed-off-by: Bryant G. Ly 




RE: [PATCH 08/29] drivers, md: convert mddev.active from atomic_t to refcount_t

2017-03-16 Thread Reshetova, Elena
> On Tue, 2017-03-14 at 12:29 +, Reshetova, Elena wrote:
> > > Elena Reshetova  writes:
> > >
> > > > refcount_t type and corresponding API should be
> > > > used instead of atomic_t when the variable is used as
> > > > a reference counter. This allows to avoid accidental
> > > > refcounter overflows that might lead to use-after-free
> > > > situations.
> > > >
> > > > Signed-off-by: Elena Reshetova 
> > > > Signed-off-by: Hans Liljestrand 
> > > > Signed-off-by: Kees Cook 
> > > > Signed-off-by: David Windsor 
> > > > ---
> > > >  drivers/md/md.c | 6 +++---
> > > >  drivers/md/md.h | 3 ++-
> > > >  2 files changed, 5 insertions(+), 4 deletions(-)
> > >
> > > When booting linux-next (specifically 5be4921c9958ec) I'm seeing
> > > the
> > > backtrace below. I suspect this patch is just exposing an existing
> > > issue?
> >
> > Yes, we have actually been following this issue in the another
> > thread.
> > It looks like the object is re-used somehow, but I can't quite
> > understand how just by reading the code.
> > This was what I put into the previous thread:
> >
> > "The log below indicates that you are using your refcounter in a bit
> > weird way in mddev_find().
> > However, I can't find the place (just by reading the code) where you
> > would increment refcounter from zero (vs. setting it to one).
> > It looks like you either iterate over existing nodes (and increment
> > their counters, which should be >= 1 at the time of increment) or
> > create a new node, but then mddev_init() sets the counter to 1. "
> >
> > If you can help to understand what is going on with the object
> > creation/destruction, would be appreciated!
> >
> > Also Shaohua Li stopped this patch coming from his tree since the
> > issue was caught at that time, so we are not going to merge this
> > until we figure it out.
> 
> Asking on the correct list (dm-devel) would have got you the easy
> answer:  The refcount behind mddev->active is a genuine atomic.  It has
> refcount properties but only if the array fails to initialise (in that
> case, final put kills it).  Once it's added to the system as a gendisk,
> it cannot be freed until md_free().  Thus its ->active count can go to
> zero (when it becomes inactive; usually because of an unmount). On a
> simple allocation regardless of outcome, the last executed statement in
> md_alloc is mddev_put(): that destroys the device if we didn't manage
> to create it or returns 0 and adds an inactive device to the system
> which the user can get with mddev_find().

Thank you James for explaining this! I guess in this case, the conversion 
doesn't make sense. 
And sorry about not asking in a correct place: we are handling many similar 
patches now and while I try to reach the right audience using get_maintainer 
script, it doesn't always succeeds. 

Best Regards,
Elena.

> 
> James
> 



[PATCH] sd: Ignore sync cache failures when not supported

2017-03-16 Thread Thierry Escande
From: Derek Basehore 

Some external hard drives don't support the sync command even though the
hard drive has write cache enabled. In this case, upon suspend request,
sync cache failures are ignored if the error code in the sense header is
ILLEGAL_REQUEST. There's not much we can do for these drives, so we
shouldn't fail to suspend for this error case. The drive may stay
powered if that's the setup for the port it's plugged into.

Signed-off-by: Derek Basehore 
Signed-off-by: Thierry Escande 
---
 drivers/scsi/sd.c | 22 ++
 1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index b193304..c0388ff 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1492,7 +1492,7 @@ static unsigned int sd_check_events(struct gendisk *disk, 
unsigned int clearing)
return retval;
 }
 
-static int sd_sync_cache(struct scsi_disk *sdkp)
+static int sd_sync_cache(struct scsi_disk *sdkp, int *sense_key)
 {
int retries, res;
struct scsi_device *sdp = sdkp->device;
@@ -1521,8 +1521,11 @@ static int sd_sync_cache(struct scsi_disk *sdkp)
if (res) {
sd_print_result(sdkp, "Synchronize Cache(10) failed", res);
 
-   if (driver_byte(res) & DRIVER_SENSE)
+   if (driver_byte(res) & DRIVER_SENSE) {
sd_print_sense_hdr(sdkp, );
+   if (sense_key)
+   *sense_key = sshdr.sense_key;
+   }
/* we need to evaluate the error return  */
if (scsi_sense_valid() &&
(sshdr.asc == 0x3a ||   /* medium not present */
@@ -3304,7 +3307,7 @@ static void sd_shutdown(struct device *dev)
 
if (sdkp->WCE && sdkp->media_present) {
sd_printk(KERN_NOTICE, sdkp, "Synchronizing SCSI cache\n");
-   sd_sync_cache(sdkp);
+   sd_sync_cache(sdkp, NULL);
}
 
if (system_state != SYSTEM_RESTART && sdkp->device->manage_start_stop) {
@@ -3316,6 +3319,7 @@ static void sd_shutdown(struct device *dev)
 static int sd_suspend_common(struct device *dev, bool ignore_stop_errors)
 {
struct scsi_disk *sdkp = dev_get_drvdata(dev);
+   int sense_key = NO_SENSE;
int ret = 0;
 
if (!sdkp)  /* E.g.: runtime suspend following sd_remove() */
@@ -3323,8 +3327,17 @@ static int sd_suspend_common(struct device *dev, bool 
ignore_stop_errors)
 
if (sdkp->WCE && sdkp->media_present) {
sd_printk(KERN_NOTICE, sdkp, "Synchronizing SCSI cache\n");
-   ret = sd_sync_cache(sdkp);
+   ret = sd_sync_cache(sdkp, _key);
if (ret) {
+   /*
+* If this drive doesn't support sync, there's not much
+* to do and suspend shouldn't fail.
+*/
+   if (sense_key == ILLEGAL_REQUEST) {
+   ret = 0;
+   goto start_stop;
+   }
+
/* ignore OFFLINE device */
if (ret == -ENODEV)
ret = 0;
@@ -3332,6 +3345,7 @@ static int sd_suspend_common(struct device *dev, bool 
ignore_stop_errors)
}
}
 
+start_stop:
if (sdkp->device->manage_start_stop) {
sd_printk(KERN_NOTICE, sdkp, "Stopping disk\n");
/* an error is not worth aborting a system sleep */
-- 
2.7.4



[Bug 194837] VM with virtio-scsi drive often crashes during boot with kernel 4.11rc1

2017-03-16 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=194837

--- Comment #14 from Thorsten Leemhuis (regressi...@leemhuis.info) ---
Status update: Had a hunch and reverted the vhost/virtio changes
(https://git.kernel.org/torvalds/c/54d7989f476ca57fc3c5cc71524c480ccb74c481 )
on top of master and now it's running fine. Will look closer now which of the
changes is the culprit.

-- 
You are receiving this mail because:
You are the assignee for the bug.


Re: [PATCH v2] scsi_sysfs: fix hang when removing scsi device

2017-03-16 Thread Israel Rukshin

On 3/16/2017 5:42 PM, Bart Van Assche wrote:

On Thu, 2017-03-16 at 11:02 +0200, Israel Rukshin wrote:

I tested your patches and the hang disappeared when fast_io_fail_tmo
expired. The warning from patch 1 still exist, so we need an additional
fix for that.

Hello Israel,

Thanks for the testing! I will leave out patch 1 for now since it is not
needed to fix the hang.

I assume that your e-mail counts as a Tested-by?

PS: please do not top-post when replying. This is considered annoying in the
Linux kernel community.

Bart.


Hello Bart,

Yes, you can add me as Tested-by.

Israel.


Re: [PATCH v2] scsi_sysfs: fix hang when removing scsi device

2017-03-16 Thread Bart Van Assche
On Thu, 2017-03-16 at 11:02 +0200, Israel Rukshin wrote:
> I tested your patches and the hang disappeared when fast_io_fail_tmo 
> expired. The warning from patch 1 still exist, so we need an additional
> fix for that.

Hello Israel,

Thanks for the testing! I will leave out patch 1 for now since it is not
needed to fix the hang.

I assume that your e-mail counts as a Tested-by?

PS: please do not top-post when replying. This is considered annoying in the
Linux kernel community.

Bart.

[PATCH] scsi: libsas: fix ata xfer length

2017-03-16 Thread John Garry
The total ata xfer length may not be calculated properly,
in that we do not use the proper method to get an sg element
dma length.

According to the code comment, sg_dma_len() should be used
after dma_map_sg() is called.

This issue was found by turning on the SMMUv3 in front of
the hisi_sas controller in hip07. Multiple sg elements
were being combined into a single element, but the original
first element length was being use as the total xfer length.

Signed-off-by: John Garry 

diff --git a/drivers/scsi/libsas/sas_ata.c b/drivers/scsi/libsas/sas_ata.c
index 763f012..87f5e694 100644
--- a/drivers/scsi/libsas/sas_ata.c
+++ b/drivers/scsi/libsas/sas_ata.c
@@ -221,7 +221,7 @@ static unsigned int sas_ata_qc_issue(struct ata_queued_cmd 
*qc)
task->num_scatter = qc->n_elem;
} else {
for_each_sg(qc->sg, sg, qc->n_elem, si)
-   xfer += sg->length;
+   xfer += sg_dma_len(sg);
 
task->total_xfer_len = xfer;
task->num_scatter = si;
-- 
1.9.1



Re: [PATCHv3 6/6] scsi: make asynchronous aborts mandatory

2017-03-16 Thread Hannes Reinecke
On 03/15/2017 06:55 PM, Benjamin Block wrote:
> On Wed, Mar 15, 2017 at 02:54:09PM +0100, Hannes Reinecke wrote:
>> On 03/14/2017 06:33 PM, Benjamin Block wrote:
>>> Hello Hannes,
>>>
>>> On Wed, Mar 01, 2017 at 10:15:20AM +0100, Hannes Reinecke wrote:
 There hasn't been any reports for HBAs where asynchronous abort
 would not work, so we should make it mandatory and remove
 the fallback.

 Signed-off-by: Hannes Reinecke 
 Reviewed-by: Johannes Thumshirn 
 Reviewed-by: Bart Van Assche 
 ---
  Documentation/scsi/scsi_eh.txt | 18 --
  drivers/scsi/scsi_error.c  | 81 
 --
  drivers/scsi/scsi_lib.c|  2 +-
  drivers/scsi/scsi_priv.h   |  3 +-
  include/scsi/scsi_host.h   |  5 ---
  5 files changed, 15 insertions(+), 94 deletions(-)

 diff --git a/Documentation/scsi/scsi_eh.txt 
 b/Documentation/scsi/scsi_eh.txt
 index 4edb9c1c..11e447b 100644
 --- a/Documentation/scsi/scsi_eh.txt
 +++ b/Documentation/scsi/scsi_eh.txt
 @@ -70,7 +70,7 @@ with the command.
scmd is requeued to blk queue.

   - otherwise
 -  scsi_eh_scmd_add(scmd, 0) is invoked for the command.  See
 +  scsi_eh_scmd_add(scmd) is invoked for the command.  See
[1-3] for details of this function.


 @@ -103,9 +103,7 @@ function
  eh_timed_out() callback did not handle the command.
Step #2 is taken.

 - 2. If the host supports asynchronous completion (as indicated by the
 -no_async_abort setting in the host template) scsi_abort_command()
 -is invoked to schedule an asynchrous abort.
 + 2. scsi_abort_command() is invoked to schedule an asynchrous abort.
  Asynchronous abort are not invoked for commands which the
  SCSI_EH_ABORT_SCHEDULED flag is set (this indicates that the command
  already had been aborted once, and this is a retry which failed),
 @@ -127,16 +125,13 @@ function

   scmds enter EH via scsi_eh_scmd_add(), which does the following.

 - 1. Turns on scmd->eh_eflags as requested.  It's 0 for error
 -completions and SCSI_EH_CANCEL_CMD for timeouts.
 + 1. Links scmd->eh_entry to shost->eh_cmd_q

 - 2. Links scmd->eh_entry to shost->eh_cmd_q
 + 2. Sets SHOST_RECOVERY bit in shost->shost_state

 - 3. Sets SHOST_RECOVERY bit in shost->shost_state
 + 3. Increments shost->host_failed

 - 4. Increments shost->host_failed
 -
 - 5. Wakes up SCSI EH thread if shost->host_busy == shost->host_failed
 + 4. Wakes up SCSI EH thread if shost->host_busy == shost->host_failed

   As can be seen above, once any scmd is added to shost->eh_cmd_q,
  SHOST_RECOVERY shost_state bit is turned on.  This prevents any new
 @@ -252,7 +247,6 @@ scmd->allowed.

   1. Error completion / time out
  ACTION: scsi_eh_scmd_add() is invoked for scmd
 -  - set scmd->eh_eflags
- add scmd to shost->eh_cmd_q
- set SHOST_RECOVERY
- shost->host_failed++
 diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
 index 802a081..7b70ee9 100644
 --- a/drivers/scsi/scsi_error.c
 +++ b/drivers/scsi/scsi_error.c
 @@ -163,7 +163,7 @@ static int scsi_host_eh_past_deadline(struct Scsi_Host 
 *shost)
}
}

 -  scsi_eh_scmd_add(scmd, 0);
 +  scsi_eh_scmd_add(scmd);
  }

  /**
 @@ -217,9 +217,8 @@ static int scsi_host_eh_past_deadline(struct Scsi_Host 
 *shost)
  /**
   * scsi_eh_scmd_add - add scsi cmd to error handling.
   * @scmd: scmd to run eh on.
 - * @eh_flag:  optional SCSI_EH flag.
   */
 -void scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag)
 +void scsi_eh_scmd_add(struct scsi_cmnd *scmd)
  {
struct Scsi_Host *shost = scmd->device->host;
unsigned long flags;
 @@ -235,9 +234,6 @@ void scsi_eh_scmd_add(struct scsi_cmnd *scmd, int 
 eh_flag)
if (shost->eh_deadline != -1 && !shost->last_reset)
shost->last_reset = jiffies;

 -  if (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED)
 -  eh_flag &= ~SCSI_EH_CANCEL_CMD;
 -  scmd->eh_eflags |= eh_flag;
scsi_eh_reset(scmd);
list_add_tail(>eh_entry, >eh_cmd_q);
shost->host_failed++;
 @@ -271,10 +267,9 @@ enum blk_eh_timer_return scsi_times_out(struct 
 request *req)
rtn = host->hostt->eh_timed_out(scmd);

if (rtn == BLK_EH_NOT_HANDLED) {
 -  if (host->hostt->no_async_abort ||
 -  scsi_abort_command(scmd) != SUCCESS) {
 +  if (scsi_abort_command(scmd) != SUCCESS) {
set_host_byte(scmd, DID_TIME_OUT);
 -  scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD);
 + 

[Bug 194837] VM with virtio-scsi drive often crashes during boot with kernel 4.11rc1

2017-03-16 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=194837

--- Comment #13 from Thorsten Leemhuis (li...@leemhuis.info) ---
(In reply to Omar Sandoval from comment #11)
>
> If you are still seeing crashes on -rc2, could you please report it to
> linux-bl...@vger.kernel.org?

FWIW, these are the crashes I see:
https://plus.google.com/+ThorstenLeemhuis/posts/FjyyGjNtrrG But those happen
without virtio-scsi as well.

-- 
You are receiving this mail because:
You are the assignee for the bug.


Re: [PATCHv3 4/6] scsi_error: do not escalate failed EH command

2017-03-16 Thread Hannes Reinecke
On 03/16/2017 12:01 PM, Benjamin Block wrote:
> On Wed, Mar 15, 2017 at 02:54:16PM +0100, Hannes Reinecke wrote:
>> On 03/14/2017 06:56 PM, Benjamin Block wrote:
>>> Hello Hannes,
>>>
>>> On Wed, Mar 01, 2017 at 10:15:18AM +0100, Hannes Reinecke wrote:
 When a command is sent as part of the error handling there
 is not point whatsoever to start EH escalation when that
 command fails; we are _already_ in the error handler,
 and the escalation is about to commence anyway.
 So just call 'scsi_try_to_abort_cmd()' to abort outstanding
 commands and let the main EH routine handle the rest.

 Signed-off-by: Hannes Reinecke 
 Reviewed-by: Johannes Thumshirn 
 Reviewed-by: Bart Van Assche 
 ---
  drivers/scsi/scsi_error.c | 11 +--
  1 file changed, 1 insertion(+), 10 deletions(-)

 diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
 index e1ca3b8..4613aa1 100644
 --- a/drivers/scsi/scsi_error.c
 +++ b/drivers/scsi/scsi_error.c
 @@ -889,15 +889,6 @@ static int scsi_try_to_abort_cmd(struct 
 scsi_host_template *hostt,
return hostt->eh_abort_handler(scmd);
  }

 -static void scsi_abort_eh_cmnd(struct scsi_cmnd *scmd)
 -{
 -  if (scsi_try_to_abort_cmd(scmd->device->host->hostt, scmd) != SUCCESS)
 -  if (scsi_try_bus_device_reset(scmd) != SUCCESS)
 -  if (scsi_try_target_reset(scmd) != SUCCESS)
 -  if (scsi_try_bus_reset(scmd) != SUCCESS)
 -  scsi_try_host_reset(scmd);
 -}
 -
  /**
   * scsi_eh_prep_cmnd  - Save a scsi command info as part of error recovery
   * @scmd:   SCSI command structure to hijack
 @@ -1082,7 +1073,7 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, 
 unsigned char *cmnd,
break;
}
} else if (rtn != FAILED) {
 -  scsi_abort_eh_cmnd(scmd);
 +  scsi_try_to_abort_cmd(shost->hostt, scmd);
rtn = FAILED;
}
>>>
>>> The idea is sound, but this implementation would cause "use-after-free"s.
>>>
>>> I only know our own LLD well enough to judge, but with zFCP there will
>>> always be a chance that an abort fails - be it memory pressure,
>>> hardware/firmware behavior or internal EH in zFCP.
>>>
>>> Calling queuecommand() will mean for us in the LLD, that we allocate a
>>> unique internal request struct for the scsi_cmnd (struct
>>> zfcp_fsf_request) and add that to our internal hash-table with
>>> outstanding commands. We assume this scsi_cmnd-pointer is ours till we
>>> complete it via scsi_done are yield it via successful EH-actions.
>>>
>>> In case the abort fails, you fail to take back the ownership over the
>>> scsi command. Which in turn means possible "use-after-free"s when we
>>> still thinks the scsi command is ours, but EH has already overwritten
>>> the scsi-command with the original one. When we still get an answer or
>>> otherwise use the scsi_cmnd-pointer we would access an invalid one.
>>>
>> That is actually not try.
>> As soon as we're calling 'scsi_try_to_abort_command()' ownership is
>> assumed to reside in the SCSI midlayer;
> 
> That can not be true. First of all, look at the function itself (v4.10):
> 
>   static int scsi_try_to_abort_cmd...
>   {
>   if (!hostt->eh_abort_handler)
>   return FAILED;
> 
>   return hostt->eh_abort_handler(scmd);
>   }
> 
> If what you say is true, then this whole API of LLDs providing or
> choosing not to provide implementations for these function would be
> fundamentally broken.
> The function itself returns FAILED when there is no such function.. how
> is a LLD that does not implement it ever to know that you took ownership
> by calling scsi_try_to_abort_cmd()?
> 
Well. Ok.

_Actually_, the whole concept of 'ownership' in SCSI EH is a bit flaky.

There are two ways of entering the error handler:
- SCSI command timeout
- Failing to evaluate the SCSI command status

For the latter case ownership already _is_ with the SCSI midlayer, as
the LLDD called 'scsi_done' and with that moved ownership to the midlayer.

The interesting part is command timeout.
Once a command timeout triggers the block layer is calling
'blk_mark_rq_complete' to avoid any concurrent completions.
IE any calls to scsi_done() will be short-circuited with that,
effectively transferring ownership to SCSI midlayer.

Now the SCSI midlayer has to inform the LLDD that it has taken
ownership; for that it calls the various eh_XXX callbacks into the LLDD.
While it's quite clear that SUCCESS signals a transfer of ownership to
SCSI ML, it's less clear what happens in the case of FAILED.
Problem here is that the eh_XXX callbacks actually serve a dual purpose;
one it to signal the transfer of ownership to SCSI ML and the other is

Re: [PATCHv3 4/6] scsi_error: do not escalate failed EH command

2017-03-16 Thread Benjamin Block
On Wed, Mar 15, 2017 at 02:54:16PM +0100, Hannes Reinecke wrote:
> On 03/14/2017 06:56 PM, Benjamin Block wrote:
> > Hello Hannes,
> >
> > On Wed, Mar 01, 2017 at 10:15:18AM +0100, Hannes Reinecke wrote:
> >> When a command is sent as part of the error handling there
> >> is not point whatsoever to start EH escalation when that
> >> command fails; we are _already_ in the error handler,
> >> and the escalation is about to commence anyway.
> >> So just call 'scsi_try_to_abort_cmd()' to abort outstanding
> >> commands and let the main EH routine handle the rest.
> >>
> >> Signed-off-by: Hannes Reinecke 
> >> Reviewed-by: Johannes Thumshirn 
> >> Reviewed-by: Bart Van Assche 
> >> ---
> >>  drivers/scsi/scsi_error.c | 11 +--
> >>  1 file changed, 1 insertion(+), 10 deletions(-)
> >>
> >> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> >> index e1ca3b8..4613aa1 100644
> >> --- a/drivers/scsi/scsi_error.c
> >> +++ b/drivers/scsi/scsi_error.c
> >> @@ -889,15 +889,6 @@ static int scsi_try_to_abort_cmd(struct 
> >> scsi_host_template *hostt,
> >>return hostt->eh_abort_handler(scmd);
> >>  }
> >>
> >> -static void scsi_abort_eh_cmnd(struct scsi_cmnd *scmd)
> >> -{
> >> -  if (scsi_try_to_abort_cmd(scmd->device->host->hostt, scmd) != SUCCESS)
> >> -  if (scsi_try_bus_device_reset(scmd) != SUCCESS)
> >> -  if (scsi_try_target_reset(scmd) != SUCCESS)
> >> -  if (scsi_try_bus_reset(scmd) != SUCCESS)
> >> -  scsi_try_host_reset(scmd);
> >> -}
> >> -
> >>  /**
> >>   * scsi_eh_prep_cmnd  - Save a scsi command info as part of error recovery
> >>   * @scmd:   SCSI command structure to hijack
> >> @@ -1082,7 +1073,7 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, 
> >> unsigned char *cmnd,
> >>break;
> >>}
> >>} else if (rtn != FAILED) {
> >> -  scsi_abort_eh_cmnd(scmd);
> >> +  scsi_try_to_abort_cmd(shost->hostt, scmd);
> >>rtn = FAILED;
> >>}
> >
> > The idea is sound, but this implementation would cause "use-after-free"s.
> >
> > I only know our own LLD well enough to judge, but with zFCP there will
> > always be a chance that an abort fails - be it memory pressure,
> > hardware/firmware behavior or internal EH in zFCP.
> >
> > Calling queuecommand() will mean for us in the LLD, that we allocate a
> > unique internal request struct for the scsi_cmnd (struct
> > zfcp_fsf_request) and add that to our internal hash-table with
> > outstanding commands. We assume this scsi_cmnd-pointer is ours till we
> > complete it via scsi_done are yield it via successful EH-actions.
> >
> > In case the abort fails, you fail to take back the ownership over the
> > scsi command. Which in turn means possible "use-after-free"s when we
> > still thinks the scsi command is ours, but EH has already overwritten
> > the scsi-command with the original one. When we still get an answer or
> > otherwise use the scsi_cmnd-pointer we would access an invalid one.
> >
> That is actually not try.
> As soon as we're calling 'scsi_try_to_abort_command()' ownership is
> assumed to reside in the SCSI midlayer;

That can not be true. First of all, look at the function itself (v4.10):

static int scsi_try_to_abort_cmd...
{
if (!hostt->eh_abort_handler)
return FAILED;

return hostt->eh_abort_handler(scmd);
}

If what you say is true, then this whole API of LLDs providing or
choosing not to provide implementations for these function would be
fundamentally broken.
The function itself returns FAILED when there is no such function.. how
is a LLD that does not implement it ever to know that you took ownership
by calling scsi_try_to_abort_cmd()?

Then look at the function-comment:

/**
 * scsi_try_to_abort_cmd - ...
 * ...
 * Notes:
 *SUCCESS does not necessarily indicate that the command
 *has been aborted; it only indicates that the LLDDs
 *has cleared all references to that command.
 *LLDDs should return FAILED only if an abort was required
 *but could not be executed. LLDDs should return FAST_IO_FAIL
 *if the device is temporarily unavailable (eg due to a
 *link down on FibreChannel)
 */

While not written directly, it says that SUCCESS means the references are
cleared, ownership transferred.

Then look at the scsi_eh.txt:

3. If !list_empty(_work_q), invoke scsi_eh_abort_cmds().

<>

This action is taken for each timed out command when
no_async_abort is enabled in the host template.
hostt->eh_abort_handler() is invoked for each scmd.  The
handler returns SUCCESS if it has succeeded to make LLDD and
all related hardware forget about the 

Dear Friend,

2017-03-16 Thread Mr.Ghazi Ahmed


I have a business proposal in the tune of $10.2m USD for you to handle with me. 
I have opportunity to transfer this abandon fund to your bank account in your 
country which belongs to our client.

I am inviting you in this transaction where this money can be shared between us 
at ratio of 60/40% and help the needy around us don’t be afraid of anything I 
am with you I will instruct you what you will do to maintain this fund.

Please kindly contact me with your information's if you are interested in this 
tranasction for more details.

1. Your Full Name.
2. Your Address..
3. Your Country of Origin.
4. What do you do for living ...
5. Your Age..
6. Gender.
7. Your ID card copy and telephone number for easy communication...

Mr.Ghazi Ahmed


Re: [PATCH v2] scsi_sysfs: fix hang when removing scsi device

2017-03-16 Thread Israel Rukshin

Hi Bart,

I tested your patches and the hang disappeared when fast_io_fail_tmo 
expired.

The warning from patch 1 still exist, so we need an additional fix for that.

Regards,
Israel

On 3/16/2017 1:27 AM, Bart Van Assche wrote:

On Tue, 2017-03-14 at 16:23 +0200, Israel Rukshin wrote:

Patch number 5 doesn't handle the case when device_for_each_child() is
called. device_for_each_child() calls to target_unblock() that uses also
starget_for_each_device(). After applying also the following change the
hang disappeared but it didn't fix the warning. Those fixes are not enough
because if fast_io_fail_tmo is set to infinity then the hang will remain,
because only __rport_fail_io_fast() calls to scsi_target_unblock() and
terminate_rport_io() that free the sync cache command.

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index e5d4b50..09f9566 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -3068,9 +3068,15 @@ void scsi_device_resume(struct scsi_device *sdev)
   static int
   target_unblock(struct device *dev, void *data)
   {
-   if (scsi_is_target_device(dev))
-   starget_for_each_device(to_scsi_target(dev), data,
-   device_unblock);
+   if (scsi_is_target_device(dev)) {
+   struct scsi_target *starget = to_scsi_target(dev);
+   struct Scsi_Host *shost = dev_to_shost(dev->parent);
+   unsigned long flags;
+
+   spin_lock_irqsave(shost->host_lock, flags);
+   __starget_for_each_device(starget, data, device_unblock);
+   spin_unlock_irqrestore(shost->host_lock, flags);
+   }
  return 0;
   }

Hello Israel,

Regarding setting fast_io_fail_tmo to infinity: that does not prevent
kernel module unloading because srp_timed_out() stops resetting the
timer as soon as the SCSI device is unblocked. The above patch should
realize that but suffers from the same issue as a patch attached to
my previous e-mail, namely lock inversion. For at least the following
call chain the block layer queue lock is the outer lock and the SCSI
host lock is the inner lock:
ata_qc_schedule_eh()
-> blk_abort_request()
   -> blk_mq_rq_timed_out()
 -> scsi_timeout()
   -> scsi_times_out()
 -> scsi_eh_scmd_add()

So I think we should avoid introducing code with the SCSI host lock
as outer lock and the block layer queue lock as inner lock. How about
the attached four patches?

Thanks,

Bart.




[PATCH] scsi: storvsc: Prefer kcalloc over kzalloc with multiply

2017-03-16 Thread Miguel Bernal Marin
Use kcalloc for allocating an array instead of kzalloc with multiply,
kcalloc is the preferred API.

Found with checkpatch.

Signed-off-by: Miguel Bernal Marin 
---
 drivers/scsi/storvsc_drv.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index 638e5f427c90..3d70d1cf49a3 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -866,7 +866,7 @@ static int storvsc_channel_init(struct hv_device *device, 
bool is_fc)
 * We will however populate all the slots to evenly distribute
 * the load.
 */
-   stor_device->stor_chns = kzalloc(sizeof(void *) * num_possible_cpus(),
+   stor_device->stor_chns = kcalloc(num_possible_cpus(), sizeof(void *),
 GFP_KERNEL);
if (stor_device->stor_chns == NULL)
return -ENOMEM;
-- 
2.12.0



[PATCH] scsi: storvsc: remove return at end of void function

2017-03-16 Thread Miguel Bernal Marin
storvsc_on_channel_callback is a void function and the return
statement at the end is not useful.

Found with checkpatch.

Signed-off-by: Miguel Bernal Marin 
---
 drivers/scsi/storvsc_drv.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index 3d70d1cf49a3..538f3e131275 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -1191,8 +1191,6 @@ static void storvsc_on_channel_callback(void *context)
break;
}
} while (1);
-
-   return;
 }
 
 static int storvsc_connect_to_vsp(struct hv_device *device, u32 ring_size,
-- 
2.12.0