Re: [PATCH 1/1] scsi: storvsc: Avoid allocating memory for temp cpumasks

2018-05-17 Thread Stephen Hemminger
On Thu, 17 May 2018 14:07:40 -0700
Michael Kelley <mhkelle...@gmail.com> wrote:

> Current code allocates 240 Kbytes (in typical configs) for
> each synthetic SCSI controller to use as temp cpumask variables.
> Recode to avoid needing the temp cpumask variables and remove the
> memory allocation.
> 
> Signed-off-by: Michael Kelley <mikel...@microsoft.com>

Acked-by: Stephen Hemminger <sthem...@microsoft.com>


Re: [Patch v2] Storvsc: Select channel based on available percentage of ring buffer to write

2018-04-20 Thread Stephen Hemminger
On Thu, 19 Apr 2018 14:54:24 -0700
Long Li <lon...@linuxonhyperv.com> wrote:

> From: Long Li <lon...@microsoft.com>
> 
> This is a best effort for estimating on how busy the ring buffer is for
> that channel, based on available buffer to write in percentage. It is still
> possible that at the time of actual ring buffer write, the space may not be
> available due to other processes may be writing at the time.
> 
> Selecting a channel based on how full it is can reduce the possibility that
> a ring buffer write will fail, and avoid the situation a channel is over
> busy.
> 
> Now it's possible that storvsc can use a smaller ring buffer size
> (e.g. 40k bytes) to take advantage of cache locality.
> 
> Changes.
> v2: Pre-allocate struct cpumask on the heap.
> Struct cpumask is a big structure (1k bytes) when CONFIG_NR_CPUS=8192 (default
> value when CONFIG_MAXSMP=y). Don't use kernel stack for it by pre-allocating
> them using kmalloc when channels are first initialized.
> 
> Signed-off-by: Long Li <lon...@microsoft.com>

Reviewed-by: Stephen Hemminger <sthem...@microsoft.com>


RE: [Resend Patch 1/3] Vmbus: Add function to report available ring buffer to write in total ring size percentage

2018-03-28 Thread Stephen Hemminger
Maybe best through greg's char-misc tree since it has generic hv code and 
sometime updates both network and scsi.

Alternatively,  put common code through one tree, and hold off the network 
device change till next release.

-Original Message-
From: Long Li 
Sent: Wednesday, March 28, 2018 3:43 PM
To: Martin K. Petersen <martin.peter...@oracle.com>; Long Li 
<lon...@linuxonhyperv.com>
Cc: KY Srinivasan <k...@microsoft.com>; Haiyang Zhang <haiya...@microsoft.com>; 
Stephen Hemminger <sthem...@microsoft.com>; James E . J . Bottomley 
<jbottom...@odin.com>; de...@linuxdriverproject.org; 
linux-scsi@vger.kernel.org; linux-ker...@vger.kernel.org; net...@vger.kernel.org
Subject: RE: [Resend Patch 1/3] Vmbus: Add function to report available ring 
buffer to write in total ring size percentage

> Subject: Re: [Resend Patch 1/3] Vmbus: Add function to report available ring
> buffer to write in total ring size percentage
> 
> 
> Long,
> 
> > Netvsc has a function to calculate how much ring buffer in percentage
> > is available to write. This function is also useful for storvsc and
> > other vmbus devices.
> 
> What is the submission strategy for this series? Do you expect it to go
> through net or scsi? If the latter, I'll need an ack from davem.

Martin,

I hope this patch set goes through SCSI, because it's purpose is to improve 
storvsc.

If this strategy is not possible, I can resubmit the 1st two patches to net, 
and the 3rd patch to scsi after the 1st two are merged.

Long

> 
> --
> Martin K. PetersenOracle Linux Engineering


Re: [PATCH] storvsc: Set up correct queue depth values for IDE devices

2018-03-15 Thread Stephen Hemminger
On Thu, 15 Mar 2018 16:52:23 -0700
Long Li <lon...@exchange.microsoft.com> wrote:

> From: Long Li <lon...@microsoft.com>
> 
> Unlike SCSI and FC, we don't use multiple channels for IDE. So set queue depth
> correctly for IDE.
> 
> Also set the correct cmd_per_lun for all devices.
> 
> Signed-off-by: Long Li <lon...@microsoft.com>

Looks correct.
Acked-by: Stephen Hemminger <sthem...@microsoft.com>


Re: [PATCH] scsi: storvsc: Fix scsi_cmd error assignments in storvsc_handle_error

2017-12-19 Thread Stephen Hemminger
On Tue, 19 Dec 2017 13:32:48 -0500
Cathy Avery <cav...@redhat.com> wrote:

> When an I/O is returned with an srb_status of SRB_STATUS_INVALID_LUN
> which has zero good_bytes it must be assigned an error. Otherwise
> the I/O will be continuously requeued and will cause a deadlock in the
> case where disks are being hot added and removed. sd_probe_async will
> wait forever for its I/O to complete while holding scsi_sd_probe_domain.
> 
> Also returning the default error of DID_TARGET_FAILURE causes
> multipath to not retry the I/O resulting in applications receiving I/O
> errors before a failover can occur.
> 
> Signed-off-by: Cathy Avery <cav...@redhat.com>
> Signed-off-by: Long Li <lon...@microsoft.com>

When working on the DVD probe issue I saw that error handling was
problematic. Thanks for fixing.

Reviewed-by: Stephen Hemminger <step...@networkplumber.org>


Re: [PATCH] storvsc: fix memory leak on ring buffer busy

2017-08-30 Thread Stephen Hemminger
On Tue, 29 Aug 2017 21:31:11 -0400
"Martin K. Petersen"  wrote:

> Long,
> 
> > When storvsc is sending I/O to Hyper-v, it may allocate a bigger
> > buffer descriptor for large data payload that can't fit into a
> > pre-allocated buffer descriptor. This bigger buffer is freed on return
> > path.
> >
> > If I/O request to Hyper-v fails due to ring buffer busy, the storvsc
> > allocated buffer descriptor should also be freed.  
> 
> Which kernel version is this patch aimed at?
> 

Looks like this an old issue. Probably should add

Fixes: be0cf6ca301c ("scsi: storvsc: Set the tablesize based on the information 
given by the host")


[PATCH 0/2] storvsc: changes for scsi next

2017-05-18 Thread Stephen Hemminger
These are refactoring changes to the Hyper-V scsi driver.

Stephen Hemminger (2):
  storvsc: use in place iterator function
  storvsc: remove unnecessary channel inbound lock

 drivers/hv/channel_mgmt.c  |  1 -
 drivers/scsi/storvsc_drv.c | 52 --
 include/linux/hyperv.h |  1 -
 3 files changed, 18 insertions(+), 36 deletions(-)

-- 
2.11.0



[PATCH 1/2] storvsc: use in place iterator function

2017-05-18 Thread Stephen Hemminger
In 4.12-rc1, new functions were added to support iterating
over elements in the vmbus event ring. This patch uses them to
simplify the ring buffer handling in virtual SCSI driver as well.

Signed-off-by: Stephen Hemminger <sthem...@microsoft.com>
---
 drivers/scsi/storvsc_drv.c | 44 +++-
 1 file changed, 15 insertions(+), 29 deletions(-)

diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index ae966dc3bbc5..f8a1649e4c63 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -1149,13 +1149,9 @@ static void storvsc_on_receive(struct storvsc_device 
*stor_device,
 static void storvsc_on_channel_callback(void *context)
 {
struct vmbus_channel *channel = (struct vmbus_channel *)context;
+   const struct vmpacket_descriptor *desc;
struct hv_device *device;
struct storvsc_device *stor_device;
-   u32 bytes_recvd;
-   u64 request_id;
-   unsigned char packet[ALIGN(sizeof(struct vstor_packet), 8)];
-   struct storvsc_cmd_request *request;
-   int ret;
 
if (channel->primary_channel != NULL)
device = channel->primary_channel->device_obj;
@@ -1166,32 +1162,22 @@ static void storvsc_on_channel_callback(void *context)
if (!stor_device)
return;
 
-   do {
-   ret = vmbus_recvpacket(channel, packet,
-  ALIGN((sizeof(struct vstor_packet) -
-vmscsi_size_delta), 8),
-  _recvd, _id);
-   if (ret == 0 && bytes_recvd > 0) {
-
-   request = (struct storvsc_cmd_request *)
-   (unsigned long)request_id;
-
-   if ((request == _device->init_request) ||
-   (request == _device->reset_request)) {
-
-   memcpy(>vstor_packet, packet,
-  (sizeof(struct vstor_packet) -
-   vmscsi_size_delta));
-   complete(>wait_event);
-   } else {
-   storvsc_on_receive(stor_device,
-   (struct vstor_packet *)packet,
-   request);
-   }
+   foreach_vmbus_pkt(desc, channel) {
+   void *packet = hv_pkt_data(desc);
+   struct storvsc_cmd_request *request;
+
+   request = (struct storvsc_cmd_request *)
+   ((unsigned long)desc->trans_id);
+
+   if (request == _device->init_request ||
+   request == _device->reset_request) {
+   memcpy(>vstor_packet, packet,
+  (sizeof(struct vstor_packet) - 
vmscsi_size_delta));
+   complete(>wait_event);
} else {
-   break;
+   storvsc_on_receive(stor_device, packet, request);
}
-   } while (1);
+   }
 }
 
 static int storvsc_connect_to_vsp(struct hv_device *device, u32 ring_size,
-- 
2.11.0



[PATCH 2/2] storvsc: remove unnecessary channel inbound lock

2017-05-18 Thread Stephen Hemminger
In storvsc driver, inbound messages do not go through inbound lock.
The only effect of this lock was is to provide a barrier for
connect and remove logic.

Signed-off-by: Stephen Hemminger <sthem...@microsoft.com>
---
 drivers/hv/channel_mgmt.c  | 1 -
 drivers/scsi/storvsc_drv.c | 8 +++-
 include/linux/hyperv.h | 1 -
 3 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index 735f9363f2e4..685572bae1f0 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -332,7 +332,6 @@ static struct vmbus_channel *alloc_channel(void)
if (!channel)
return NULL;
 
-   spin_lock_init(>inbound_lock);
spin_lock_init(>lock);
 
INIT_LIST_HEAD(>sc_list);
diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index f8a1649e4c63..8d955db6424f 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -1206,13 +1206,13 @@ static int storvsc_connect_to_vsp(struct hv_device 
*device, u32 ring_size,
 static int storvsc_dev_remove(struct hv_device *device)
 {
struct storvsc_device *stor_device;
-   unsigned long flags;
 
stor_device = hv_get_drvdata(device);
 
-   spin_lock_irqsave(>channel->inbound_lock, flags);
stor_device->destroy = true;
-   spin_unlock_irqrestore(>channel->inbound_lock, flags);
+
+   /* Make sure flag is set before waiting */
+   wmb();
 
/*
 * At this point, all outbound traffic should be disable. We
@@ -1229,9 +1229,7 @@ static int storvsc_dev_remove(struct hv_device *device)
 * we have drained - to drain the outgoing packets, we need to
 * allow incoming packets.
 */
-   spin_lock_irqsave(>channel->inbound_lock, flags);
hv_set_drvdata(device, NULL);
-   spin_unlock_irqrestore(>channel->inbound_lock, flags);
 
/* Close the channel */
vmbus_close(device->channel);
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index e09fc8290c2f..b7d7bbec74e0 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -744,7 +744,6 @@ struct vmbus_channel {
u32 ringbuffer_pagecount;
struct hv_ring_buffer_info outbound;/* send to parent */
struct hv_ring_buffer_info inbound; /* receive from parent */
-   spinlock_t inbound_lock;
 
struct vmbus_close_msg close_msg;
 
-- 
2.11.0



RE: [REGRESSION][Stable][v3.12.y][v4.4.y][v4.9.y][v4.10.y][v4.11-rc1] scsi: storvsc: properly set residual data length on errors

2017-03-27 Thread Stephen Hemminger
Are you sure the real problem is not the one fixed by this commit?

commit f1c635b439a5c01776fe3a25b1e2dc546ea82e6f
Author: Stephen Hemminger <step...@networkplumber.org>
Date:   Tue Mar 7 09:15:53 2017 -0800

scsi: storvsc: Workaround for virtual DVD SCSI version

Hyper-V host emulation of SCSI for virtual DVD device reports SCSI
version 0 (UNKNOWN) but is still capable of supporting REPORTLUN.

Without this patch, a GEN2 Linux guest on Hyper-V will not boot 4.11
successfully with virtual DVD ROM device. What happens is that the SCSI
scan process falls back to doing sequential probing by INQUIRY.  But the
storvsc driver has a previous workaround that masks/blocks all errors
reports from INQUIRY (or MODE_SENSE) commands.  This workaround causes
the scan to then populate a full set of bogus LUN's on the target and
then sends kernel spinning off into a death spiral doing block reads on
the non-existent LUNs.

By setting the correct blacklist flags, the target with the DVD device
is scanned with REPORTLUN and that works correctly.

Patch needs to go in current 4.11, it is safe but not necessary in older
kernels.

Signed-off-by: Stephen Hemminger <sthem...@microsoft.com>
Reviewed-by: K. Y. Srinivasan <k...@microsoft.com>
Reviewed-by: Christoph Hellwig <h...@lst.de>
Signed-off-by: Martin K. Petersen <martin.peter...@oracle.com>

-Original Message-
From: Joseph Salisbury [mailto:joseph.salisb...@canonical.com] 
Sent: Monday, March 27, 2017 1:22 PM
To: Long Li <lon...@microsoft.com>
Cc: KY Srinivasan <k...@microsoft.com>; Martin K. Petersen 
<martin.peter...@oracle.com>; Haiyang Zhang <haiya...@microsoft.com>; Stephen 
Hemminger <sthem...@microsoft.com>; j...@linux.vnet.ibm.com; 
de...@linuxdriverproject.org; linux-scsi <linux-scsi@vger.kernel.org>; LKML 
<linux-ker...@vger.kernel.org>; sta...@vger.kernel.org; Greg KH 
<gre...@linuxfoundation.org>
Subject: [REGRESSION][Stable][v3.12.y][v4.4.y][v4.9.y][v4.10.y][v4.11-rc1] 
scsi: storvsc: properly set residual data length on errors

Hi Long Li,

A kernel bug report was opened against Ubuntu [0].  After a kernel
bisect, it was found that reverting the following commit resolved this bug:

commit 40630f462824ee24bc00d692865c86c3828094e0
Author: Long Li <lon...@microsoft.com>
Date:   Wed Dec 14 18:46:03 2016 -0800

scsi: storvsc: properly set residual data length on errors


The regression was introduced in mainline as of v4.11-rc1.  It was also
cc'd to stable and has landed in v3.12.y, v4.4.y, v4.9.y and v4.10.y.


This regression seems pretty severe since it's preventing virtual
machines from booting.  It's affecting a couple of users so far.  I was
hoping to get your feedback, since you are the patch author.  Do you
think gathering any additional data will help diagnose this issue, or
would it be best to submit a revert request?


Thanks,

Joe


[0] http://pad.lv/1674635




Re: [PATCH] scsi: storvsc: Add support for FC rport.

2017-03-14 Thread Stephen Hemminger
On Tue, 14 Mar 2017 12:01:03 -0400
Cathy Avery  wrote:

>  #if IS_ENABLED(CONFIG_SCSI_FC_ATTRS)
>   if (host->transportt == fc_transport_template) {
> + struct fc_rport_identifiers ids;
> +
> + ids.node_name = 0;
> + ids.port_name = 0;
> + ids.port_id = 0;
> + ids.roles |= FC_PORT_ROLE_FCP_TARGET;

Since the variable ids is on the stack, it is uninitialized data.
Doing a OR with uninitialized data is not correct.

Better off to use C99 style iniatializer and skip the zero fields.

struct fc_rport_identifiers ids = {
.roles = FC_PORT_ROLE_FCP_TARGET,
};


[PATCH] storvsc: workaround for virtual DVD SCSI version

2017-03-07 Thread Stephen Hemminger
Hyper-V host emulation of SCSI for virtual DVD device reports SCSI
version 0 (UNKNOWN) but is still capable of supporting REPORTLUN. 

Without this patch, a GEN2 Linux guest on Hyper-V will not boot 4.11
successfully with virtual DVD ROM device. What happens is that the
SCSI scan process falls back to doing sequential probing by INQUIRY.
But the storvsc driver has a previous workaround that masks/blocks all
errors reports from INQUIRY (or MODE_SENSE) commands.  This workaround
causes the scan to then populate a full set of bogus LUN's on the
target and then sends kernel spinning off into a death spiral doing
block reads on the non-existent LUNs.

By setting the correct blacklist flags, the target with the
DVD device is scanned with REPORTLUN and that works correctly.

Patch needs to go in current 4.11, it is safe but not necessary
in older kernels.

Signed-off-by: Stephen Hemminger <sthem...@microsoft.com>
---
 drivers/scsi/storvsc_drv.c | 27 +--
 1 file changed, 17 insertions(+), 10 deletions(-)

PS: The error handling does need to be fixed (have patches pending)
but that is interrelated with hotplug and can wait.

diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index 638e5f427c90..19973e874830 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -400,8 +400,6 @@ MODULE_PARM_DESC(storvsc_vcpus_per_sub_channel, "Ratio of 
VCPUs to subchannels")
  */
 static int storvsc_timeout = 180;
 
-static int msft_blist_flags = BLIST_TRY_VPD_PAGES;
-
 #if IS_ENABLED(CONFIG_SCSI_FC_ATTRS)
 static struct scsi_transport_template *fc_transport_template;
 #endif
@@ -1383,6 +1381,22 @@ static int storvsc_do_io(struct hv_device *device,
return ret;
 }
 
+static int storvsc_device_alloc(struct scsi_device *sdevice)
+{
+   /*
+* Set blist flag to permit the reading of the VPD pages even when
+* the target may claim SPC-2 compliance. MSFT targets currently
+* claim SPC-2 compliance while they implement post SPC-2 features.
+* With this flag we can correctly handle WRITE_SAME_16 issues.
+*
+* Hypervisor reports SCSI_UNKNOWN type for DVD ROM device but
+* still supports REPORT LUN.
+*/
+   sdevice->sdev_bflags = BLIST_REPORTLUN2 | BLIST_TRY_VPD_PAGES;
+
+   return 0;
+}
+
 static int storvsc_device_configure(struct scsi_device *sdevice)
 {
 
@@ -1396,14 +1410,6 @@ static int storvsc_device_configure(struct scsi_device 
*sdevice)
sdevice->no_write_same = 1;
 
/*
-* Add blist flags to permit the reading of the VPD pages even when
-* the target may claim SPC-2 compliance. MSFT targets currently
-* claim SPC-2 compliance while they implement post SPC-2 features.
-* With this patch we can correctly handle WRITE_SAME_16 issues.
-*/
-   sdevice->sdev_bflags |= msft_blist_flags;
-
-   /*
 * If the host is WIN8 or WIN8 R2, claim conformance to SPC-3
 * if the device is a MSFT virtual device.  If the host is
 * WIN10 or newer, allow write_same.
@@ -1661,6 +1667,7 @@ static struct scsi_host_template scsi_driver = {
.eh_host_reset_handler =storvsc_host_reset_handler,
.proc_name ="storvsc_host",
.eh_timed_out = storvsc_eh_timed_out,
+   .slave_alloc =  storvsc_device_alloc,
.slave_configure =  storvsc_device_configure,
.cmd_per_lun =  255,
.this_id =  -1,
-- 
2.11.0



Re: [RFC] hv_storvsc: error handling.

2017-03-06 Thread Stephen Hemminger

> > I will try it, but it can't work for two reasons.
> > First, the INVALID_LUN error is masked off on INQUIRY in current code.
> > Second, the scsi_device is instantiated already as part of scan probe 
> > process
> > before it gets here.  
> 
> Was the invalid LUN in the LUN0 position. Inquiry of LUN0 support (when LUN0 
> is not populated)
> was added only recently to address host side issue.

When probing the code probes with LUN 1, ...

There is a cause where kernel does INQUIRY on LUN0, it looks kernel is asking 
for
page code 80 which is optional "Unit Serial Number".  And then WS2016 is 
returning
an error and invalid sense data.  The old masking of errors caused kernel to
interpret sense data as Unit Serial Number which is also not good but looks 
harmless.

 
> > The best solution so far is:
> > - remove old INQUIRY/SENSE error masking
> > + add new workaround for INQUIRY of device id on LUN 0
> >   which appears to be the reason for old masking
> > + return errors on missing LUN
> > + provide better transport services for hot remove (rather
> >   than detecting by failed I/O).  
> 
> This the mechanism used by the host for notifying LUN removal - invalid LUN 
> error code.

This has a couple of problems. First, it means that hotplug doesn't occur until
an I/O is done. Second the current code was not being truthful to block layer.
If it has to handle hotplug in this manner, it should have still failed the I/O.
If application was using direct I/O it would want to know that write failed.

Perhaps the existing channel mechanism can be used as notification path.



Re: [RFC] hv_storvsc: error handling.

2017-03-06 Thread Stephen Hemminger
On Sat, 4 Mar 2017 21:03:41 +
KY Srinivasan <k...@microsoft.com> wrote:

> > -Original Message-
> > From: Stephen Hemminger [mailto:step...@networkplumber.org]
> > Sent: Friday, March 3, 2017 4:50 PM
> > To: James Bottomley <james.bottom...@hansenpartnership.com>
> > Cc: Hannes Reinecke <h...@suse.de>; Christoph Hellwig <h...@lst.de>;
> > James Bottomley <j...@linux.vnet.ibm.com>; Jens Axboe
> > <ax...@kernel.dk>; Linus Torvalds <torva...@linux-foundation.org>;
> > Martin K. Petersen <martin.peter...@oracle.com>; KY Srinivasan
> > <k...@microsoft.com>; Dexuan Cui <de...@microsoft.com>; Long Li
> > <lon...@microsoft.com>; Josh Poulson <jopou...@microsoft.com>; Adrian
> > Suhov (Cloudbase Solutions SRL) <v-ads...@microsoft.com>; linux-
> > s...@vger.kernel.org; Haiyang Zhang <haiya...@microsoft.com>
> > Subject: [RFC] hv_storvsc: error handling.
> > 
> > Needs more testing but this does fix the observed problem.
> > 
> > From: Stephen Hemminger <sthem...@microsoft.com>
> > 
> > Subject: [PATCH] hv_storvsc: fix error handling
> > 
> > The Hyper-V storvsc SCSI driver was hiding all errors in INQUIRY and
> > MODE_SENSE commands. This caused the scan process to incorrectly think
> > devices were present and online.  Also invalid LUN errors were not
> > being handled correctly.
> > 
> > This fixes problems booting a GEN2 VM on Hyper-V. It effectively
> > reverts commit 4ed51a21c0f69 ("Staging: hv: storvsc: Fixup
> > srb and scsi status for INQUIRY and MODE_SENSE")
> > 
> > Signed-off-by: Stephen Hemminger <sthem...@microsoft.com>
> > ---
> >  drivers/scsi/storvsc_drv.c | 48 
> > --
> >  1 file changed, 4 insertions(+), 44 deletions(-)
> > 
> > diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
> > index 638e5f427c90..8cc241fc54b8 100644
> > --- a/drivers/scsi/storvsc_drv.c
> > +++ b/drivers/scsi/storvsc_drv.c
> > @@ -543,28 +543,6 @@ static void storvsc_host_scan(struct work_struct
> > *work)
> > kfree(wrk);
> >  }
> > 
> > -static void storvsc_remove_lun(struct work_struct *work)
> > -{
> > -   struct storvsc_scan_work *wrk;
> > -   struct scsi_device *sdev;
> > -
> > -   wrk = container_of(work, struct storvsc_scan_work, work);
> > -   if (!scsi_host_get(wrk->host))
> > -   goto done;
> > -
> > -   sdev = scsi_device_lookup(wrk->host, 0, wrk->tgt_id, wrk->lun);
> > -
> > -   if (sdev) {
> > -   scsi_remove_device(sdev);
> > -   scsi_device_put(sdev);
> > -   }
> > -   scsi_host_put(wrk->host);
> > -
> > -done:
> > -   kfree(wrk);
> > -}
> > -
> > -
> >  /*
> >   * We can get incoming messages from the host that are not in response to
> >   * messages that we have sent out. An example of this would be messages
> > @@ -955,8 +933,7 @@ static void storvsc_handle_error(struct
> > vmscsi_request *vm_srb,
> > }
> > break;
> > case SRB_STATUS_INVALID_LUN:
> > -   do_work = true;
> > -   process_err_fn = storvsc_remove_lun;
> > +   set_host_byte(scmnd, DID_NO_CONNECT);
> > break;
> > case SRB_STATUS_ABORTED:
> > if (vm_srb->srb_status & SRB_STATUS_AUTOSENSE_VALID
> > &&
> > @@ -1050,32 +1027,15 @@ static void storvsc_on_io_completion(struct
> > storvsc_device *stor_device,
> > 
> > stor_pkt = >vstor_packet;
> > 
> > -   /*
> > -* The current SCSI handling on the host side does
> > -* not correctly handle:
> > -* INQUIRY command with page code parameter set to 0x80
> > -* MODE_SENSE command with cmd[2] == 0x1c
> > -*
> > -* Setup srb and scsi status so this won't be fatal.
> > -* We do this so we can distinguish truly fatal failues
> > -* (srb status == 0x4) and off-line the device in that case.
> > -*/
> > -
> > -   if ((stor_pkt->vm_srb.cdb[0] == INQUIRY) ||
> > -  (stor_pkt->vm_srb.cdb[0] == MODE_SENSE)) {
> > -   vstor_packet->vm_srb.scsi_status = 0;
> > -   vstor_packet->vm_srb.srb_status = SRB_STATUS_SUCCESS;
> > -   }
> > -
> > -
> > /* Copy over the status...etc */
> > stor_pkt->vm_srb.scsi_status = vstor_packet->vm_srb.scsi_status;
> > stor_pkt->vm_srb.srb_status = vsto

[RFC] hv_storvsc: error handling.

2017-03-03 Thread Stephen Hemminger
Needs more testing but this does fix the observed problem.

From: Stephen Hemminger <sthem...@microsoft.com>

Subject: [PATCH] hv_storvsc: fix error handling

The Hyper-V storvsc SCSI driver was hiding all errors in INQUIRY and
MODE_SENSE commands. This caused the scan process to incorrectly think
devices were present and online.  Also invalid LUN errors were not
being handled correctly.

This fixes problems booting a GEN2 VM on Hyper-V. It effectively
reverts commit 4ed51a21c0f69 ("Staging: hv: storvsc: Fixup
srb and scsi status for INQUIRY and MODE_SENSE")

Signed-off-by: Stephen Hemminger <sthem...@microsoft.com>
---
 drivers/scsi/storvsc_drv.c | 48 --
 1 file changed, 4 insertions(+), 44 deletions(-)

diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index 638e5f427c90..8cc241fc54b8 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -543,28 +543,6 @@ static void storvsc_host_scan(struct work_struct *work)
kfree(wrk);
 }
 
-static void storvsc_remove_lun(struct work_struct *work)
-{
-   struct storvsc_scan_work *wrk;
-   struct scsi_device *sdev;
-
-   wrk = container_of(work, struct storvsc_scan_work, work);
-   if (!scsi_host_get(wrk->host))
-   goto done;
-
-   sdev = scsi_device_lookup(wrk->host, 0, wrk->tgt_id, wrk->lun);
-
-   if (sdev) {
-   scsi_remove_device(sdev);
-   scsi_device_put(sdev);
-   }
-   scsi_host_put(wrk->host);
-
-done:
-   kfree(wrk);
-}
-
-
 /*
  * We can get incoming messages from the host that are not in response to
  * messages that we have sent out. An example of this would be messages
@@ -955,8 +933,7 @@ static void storvsc_handle_error(struct vmscsi_request 
*vm_srb,
}
break;
case SRB_STATUS_INVALID_LUN:
-   do_work = true;
-   process_err_fn = storvsc_remove_lun;
+   set_host_byte(scmnd, DID_NO_CONNECT);
break;
case SRB_STATUS_ABORTED:
if (vm_srb->srb_status & SRB_STATUS_AUTOSENSE_VALID &&
@@ -1050,32 +1027,15 @@ static void storvsc_on_io_completion(struct 
storvsc_device *stor_device,
 
stor_pkt = >vstor_packet;
 
-   /*
-* The current SCSI handling on the host side does
-* not correctly handle:
-* INQUIRY command with page code parameter set to 0x80
-* MODE_SENSE command with cmd[2] == 0x1c
-*
-* Setup srb and scsi status so this won't be fatal.
-* We do this so we can distinguish truly fatal failues
-* (srb status == 0x4) and off-line the device in that case.
-*/
-
-   if ((stor_pkt->vm_srb.cdb[0] == INQUIRY) ||
-  (stor_pkt->vm_srb.cdb[0] == MODE_SENSE)) {
-   vstor_packet->vm_srb.scsi_status = 0;
-   vstor_packet->vm_srb.srb_status = SRB_STATUS_SUCCESS;
-   }
-
-
/* Copy over the status...etc */
stor_pkt->vm_srb.scsi_status = vstor_packet->vm_srb.scsi_status;
stor_pkt->vm_srb.srb_status = vstor_packet->vm_srb.srb_status;
stor_pkt->vm_srb.sense_info_length =
vstor_packet->vm_srb.sense_info_length;
 
-   if (vstor_packet->vm_srb.scsi_status != 0 ||
-   vstor_packet->vm_srb.srb_status != SRB_STATUS_SUCCESS)
+   if (stor_pkt->vm_srb.cdb[0] != INQUIRY &&
+   (vstor_packet->vm_srb.scsi_status != 0 ||
+vstor_packet->vm_srb.srb_status != SRB_STATUS_SUCCESS))
storvsc_log(device, STORVSC_LOGGING_WARN,
"cmd 0x%x scsi status 0x%x srb status 0x%x\n",
stor_pkt->vm_srb.cdb[0],
-- 
2.11.0





Re: SCSI regression in 4.11

2017-03-03 Thread Stephen Hemminger
On Thu, 02 Mar 2017 11:18:23 -0800
James Bottomley <james.bottom...@hansenpartnership.com> wrote:

> On March 2, 2017 11:05:05 AM PST, Stephen Hemminger 
> <step...@networkplumber.org> wrote:
> >On Thu, 02 Mar 2017 10:36:17 -0800
> >James Bottomley <james.bottom...@hansenpartnership.com> wrote:
> >  
> >> On March 2, 2017 10:23:24 AM PST, Stephen Hemminger  
> ><step...@networkplumber.org> wrote:  
> >> >On Thu, 2 Mar 2017 14:25:14 +0100
> >> >Hannes Reinecke <h...@suse.de> wrote:
> >> >
> >> >> On 03/02/2017 02:40 AM, Stephen Hemminger wrote:
> >> >> > On Thu, 2 Mar 2017 01:56:15 +0100
> >> >> > Christoph Hellwig <h...@lst.de> wrote:
> >> >> >  
> >> >> >> On Thu, Mar 02, 2017 at 01:01:35AM +0100, Christoph Hellwig  
> >wrote:
> >> > 
> >> >> >>> On Wed, Mar 01, 2017 at 07:54:12AM -0800, Stephen Hemminger
> >> >wrote:  
> >> >> >>>>>
> >>
> >>
> >> http://git.infradead.org/users/hch/block.git/commitdiff/148cff67b401e2229c076c0ea418712654be77e4
> >>   
> >> > 
> >> >> >>>>
> >> >> >>>> It appears that is already in the code I am testing in
> >> >linux-next...  
> >> >> >>>
> >> >> >>> It's in -next now, but it wasn't at the time you reported the   
> >  
> >> >bug.
> >> >> >>>
> >> >> >>> And it would sortof explain the bug if the INQUIRY data is
> >> >correct
> >> >> >>> in the scatterlist, but we ignore it, given that  
> >scsi_probe_lun  
> >> >> >>> ignores the result based on sense data.
> >> >> >>>
> >> >> >>> Can you check what happens with the horrible hack below:  
> >> >> >>
> >> >> >> Strike that - we're checking result later, so this can't be the  
> >   
> >> >case.
> >> >> >>
> >> >> >> Now the other interesting thing is the memset in  
> >__scsi_exectute,  
> >> >> >> which looks very suspicious.  Try the following please:
> >> >> >>
> >> >> >> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> >> >> >> index 3e32dc954c3c..22f4fb550561 100644
> >> >> >> --- a/drivers/scsi/scsi_lib.c
> >> >> >> +++ b/drivers/scsi/scsi_lib.c
> >> >> >> @@ -253,7 +253,8 @@ static int __scsi_execute(struct  
> >scsi_device
> >> >*sdev, const unsigned char *cmd,
> >> >> >>   * and prevent security leaks by zeroing out the excess data.
> >> >> >>   */
> >> >> >>  if (unlikely(rq->resid_len > 0 && rq->resid_len <= bufflen))
> >> >> >> -memset(buffer + (bufflen - rq->resid_len), 0,  
> >rq->resid_len);  
> >> >> >> +//  memset(buffer + (bufflen - rq->resid_len), 0,  
> >rq->resid_len);  
> >> >> >> +printk_ratelimited("%s: got resid %d\n", __func__,
> >> >rq->resid_len);
> >> >> >>
> >> >> >>  if (resid)
> >> >> >>  *resid = rq->resid_len;  
> >> >> >
> >> >> >
> >> >> > Still fails but does print resid on some of the later INQUIRY
> >> >commands (not the initial one).
> >> >> >  
> >> >> Can you test what happens if you blank out the storvsc_drv
> >> >workaround:
> >> >> 
> >> >> diff --git a/drivers/scsi/storvsc_drv.c  
> >b/drivers/scsi/storvsc_drv.c  
> >> >> index 585e54f..c36f42d 100644
> >> >> --- a/drivers/scsi/storvsc_drv.c
> >> >> +++ b/drivers/scsi/storvsc_drv.c
> >> >> @@ -1060,13 +1060,13 @@ static void  
> >storvsc_on_io_completion(struct   
> >> >> storvsc_device *stor_device,
> >> >>   * We do this so we can distinguish truly fatal failues
> >> >>   * (srb status == 0x4) and off-line the device in that  
> >case.  
> >> >>   */
> >> >> -
> >> >> +

Re: SCSI regression in 4.11

2017-03-02 Thread Stephen Hemminger
On Thu, 02 Mar 2017 10:36:17 -0800
James Bottomley <james.bottom...@hansenpartnership.com> wrote:

> On March 2, 2017 10:23:24 AM PST, Stephen Hemminger 
> <step...@networkplumber.org> wrote:
> >On Thu, 2 Mar 2017 14:25:14 +0100
> >Hannes Reinecke <h...@suse.de> wrote:
> >  
> >> On 03/02/2017 02:40 AM, Stephen Hemminger wrote:  
> >> > On Thu, 2 Mar 2017 01:56:15 +0100
> >> > Christoph Hellwig <h...@lst.de> wrote:
> >> >
> >> >> On Thu, Mar 02, 2017 at 01:01:35AM +0100, Christoph Hellwig wrote:  
> >   
> >> >>> On Wed, Mar 01, 2017 at 07:54:12AM -0800, Stephen Hemminger  
> >wrote:
> >> >>>>>  
> > 
> > http://git.infradead.org/users/hch/block.git/commitdiff/148cff67b401e2229c076c0ea418712654be77e4
> >   
> >> >>>>
> >> >>>> It appears that is already in the code I am testing in  
> >linux-next...
> >> >>>
> >> >>> It's in -next now, but it wasn't at the time you reported the  
> >bug.  
> >> >>>
> >> >>> And it would sortof explain the bug if the INQUIRY data is  
> >correct  
> >> >>> in the scatterlist, but we ignore it, given that scsi_probe_lun
> >> >>> ignores the result based on sense data.
> >> >>>
> >> >>> Can you check what happens with the horrible hack below:
> >> >>
> >> >> Strike that - we're checking result later, so this can't be the  
> >case.  
> >> >>
> >> >> Now the other interesting thing is the memset in __scsi_exectute,
> >> >> which looks very suspicious.  Try the following please:
> >> >>
> >> >> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> >> >> index 3e32dc954c3c..22f4fb550561 100644
> >> >> --- a/drivers/scsi/scsi_lib.c
> >> >> +++ b/drivers/scsi/scsi_lib.c
> >> >> @@ -253,7 +253,8 @@ static int __scsi_execute(struct scsi_device  
> >*sdev, const unsigned char *cmd,  
> >> >>  * and prevent security leaks by zeroing out the excess data.
> >> >>  */
> >> >> if (unlikely(rq->resid_len > 0 && rq->resid_len <= bufflen))
> >> >> -   memset(buffer + (bufflen - rq->resid_len), 0, 
> >> >> rq->resid_len);
> >> >> +// memset(buffer + (bufflen - rq->resid_len), 0, 
> >> >> rq->resid_len);
> >> >> +   printk_ratelimited("%s: got resid %d\n", __func__,  
> >rq->resid_len);  
> >> >>
> >> >> if (resid)
> >> >> *resid = rq->resid_len;
> >> >
> >> >
> >> > Still fails but does print resid on some of the later INQUIRY  
> >commands (not the initial one).  
> >> >
> >> Can you test what happens if you blank out the storvsc_drv  
> >workaround:  
> >> 
> >> diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
> >> index 585e54f..c36f42d 100644
> >> --- a/drivers/scsi/storvsc_drv.c
> >> +++ b/drivers/scsi/storvsc_drv.c
> >> @@ -1060,13 +1060,13 @@ static void storvsc_on_io_completion(struct 
> >> storvsc_device *stor_device,
> >>   * We do this so we can distinguish truly fatal failues
> >>   * (srb status == 0x4) and off-line the device in that case.
> >>   */
> >> -
> >> +#if 0
> >>  if ((stor_pkt->vm_srb.cdb[0] == INQUIRY) ||
> >> (stor_pkt->vm_srb.cdb[0] == MODE_SENSE)) {
> >>  vstor_packet->vm_srb.scsi_status = 0;
> >>  vstor_packet->vm_srb.srb_status =  
> >SRB_STATUS_SUCCESS;  
> >>  }
> >> -
> >> +#endif
> >> 
> >>  /* Copy over the status...etc */
> >>  stor_pkt->vm_srb.scsi_status =  
> >vstor_packet->vm_srb.scsi_status;  
> >> 
> >> It might thappen that we're fail to interpret the 'Device not  
> >present'   
> >> status correctly (which will happen for non-connected DVDs) causing  
> >the   
> >> SCSI stack to make incorrect decisions later on.
> >> 
> >> Cheers,
> >> 
> >> Hannes  
> >
> >There are several oddities about the host SCSI interface that I see:
> > 1. The host bus seems to report up to 6 devices even though only 2 are
> > present (Disk and CDROM).
> >2. The CDROM emulation doesn't report the same status as a real device.
> > 3. The host emulation of SCSI doesn't support all the page codes which
> > is why there is the hack.
> >
> >But as James said, these don't appear to be related to the failure
> >because
> >the code worked before and only in post 4.11 merege is there a problem.  
> 
> Your wait for the hang trace is the most suggestive.   It says we're waiting 
> for a partition read to the spurious device.  Previously this would have 
> failed or timed out, so this seems to be the root cause.
> 
> James
> 
> 

Where is the number of valid LUN's determined during the scan process?


Re: SCSI regression in 4.11

2017-03-02 Thread Stephen Hemminger
On Thu, 2 Mar 2017 14:25:14 +0100
Hannes Reinecke <h...@suse.de> wrote:

> On 03/02/2017 02:40 AM, Stephen Hemminger wrote:
> > On Thu, 2 Mar 2017 01:56:15 +0100
> > Christoph Hellwig <h...@lst.de> wrote:
> >  
> >> On Thu, Mar 02, 2017 at 01:01:35AM +0100, Christoph Hellwig wrote:  
> >>> On Wed, Mar 01, 2017 at 07:54:12AM -0800, Stephen Hemminger wrote:  
> >>>>> 
> >>>>> http://git.infradead.org/users/hch/block.git/commitdiff/148cff67b401e2229c076c0ea418712654be77e4
> >>>>>   
> >>>>
> >>>> It appears that is already in the code I am testing in linux-next...  
> >>>
> >>> It's in -next now, but it wasn't at the time you reported the bug.
> >>>
> >>> And it would sortof explain the bug if the INQUIRY data is correct
> >>> in the scatterlist, but we ignore it, given that scsi_probe_lun
> >>> ignores the result based on sense data.
> >>>
> >>> Can you check what happens with the horrible hack below:  
> >>
> >> Strike that - we're checking result later, so this can't be the case.
> >>
> >> Now the other interesting thing is the memset in __scsi_exectute,
> >> which looks very suspicious.  Try the following please:
> >>
> >> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> >> index 3e32dc954c3c..22f4fb550561 100644
> >> --- a/drivers/scsi/scsi_lib.c
> >> +++ b/drivers/scsi/scsi_lib.c
> >> @@ -253,7 +253,8 @@ static int __scsi_execute(struct scsi_device *sdev, 
> >> const unsigned char *cmd,
> >> * and prevent security leaks by zeroing out the excess data.
> >> */
> >>if (unlikely(rq->resid_len > 0 && rq->resid_len <= bufflen))
> >> -  memset(buffer + (bufflen - rq->resid_len), 0, rq->resid_len);
> >> +//memset(buffer + (bufflen - rq->resid_len), 0, 
> >> rq->resid_len);
> >> +  printk_ratelimited("%s: got resid %d\n", __func__, 
> >> rq->resid_len);
> >>
> >>if (resid)
> >>*resid = rq->resid_len;  
> >
> >
> > Still fails but does print resid on some of the later INQUIRY commands (not 
> > the initial one).
> >  
> Can you test what happens if you blank out the storvsc_drv workaround:
> 
> diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
> index 585e54f..c36f42d 100644
> --- a/drivers/scsi/storvsc_drv.c
> +++ b/drivers/scsi/storvsc_drv.c
> @@ -1060,13 +1060,13 @@ static void storvsc_on_io_completion(struct 
> storvsc_device *stor_device,
>   * We do this so we can distinguish truly fatal failues
>   * (srb status == 0x4) and off-line the device in that case.
>   */
> -
> +#if 0
>  if ((stor_pkt->vm_srb.cdb[0] == INQUIRY) ||
> (stor_pkt->vm_srb.cdb[0] == MODE_SENSE)) {
>  vstor_packet->vm_srb.scsi_status = 0;
>  vstor_packet->vm_srb.srb_status = SRB_STATUS_SUCCESS;
>  }
> -
> +#endif
> 
>  /* Copy over the status...etc */
>  stor_pkt->vm_srb.scsi_status = vstor_packet->vm_srb.scsi_status;
> 
> It might thappen that we're fail to interpret the 'Device not present' 
> status correctly (which will happen for non-connected DVDs) causing the 
> SCSI stack to make incorrect decisions later on.
> 
> Cheers,
> 
> Hannes

There are several oddities about the host SCSI interface that I see:
  1. The host bus seems to report up to 6 devices even though only 2 are
 present (Disk and CDROM).
  2. The CDROM emulation doesn't report the same status as a real device.
  3. The host emulation of SCSI doesn't support all the page codes which
 is why there is the hack.

But as James said, these don't appear to be related to the failure because
the code worked before and only in post 4.11 merege is there a problem.


Re: SCSI regression in 4.11

2017-03-02 Thread Stephen Hemminger
On Thu, 2 Mar 2017 14:25:14 +0100
Hannes Reinecke <h...@suse.de> wrote:

> On 03/02/2017 02:40 AM, Stephen Hemminger wrote:
> > On Thu, 2 Mar 2017 01:56:15 +0100
> > Christoph Hellwig <h...@lst.de> wrote:
> >  
> >> On Thu, Mar 02, 2017 at 01:01:35AM +0100, Christoph Hellwig wrote:  
> >>> On Wed, Mar 01, 2017 at 07:54:12AM -0800, Stephen Hemminger wrote:  
> >>>>> 
> >>>>> http://git.infradead.org/users/hch/block.git/commitdiff/148cff67b401e2229c076c0ea418712654be77e4
> >>>>>   
> >>>>
> >>>> It appears that is already in the code I am testing in linux-next...  
> >>>
> >>> It's in -next now, but it wasn't at the time you reported the bug.
> >>>
> >>> And it would sortof explain the bug if the INQUIRY data is correct
> >>> in the scatterlist, but we ignore it, given that scsi_probe_lun
> >>> ignores the result based on sense data.
> >>>
> >>> Can you check what happens with the horrible hack below:  
> >>
> >> Strike that - we're checking result later, so this can't be the case.
> >>
> >> Now the other interesting thing is the memset in __scsi_exectute,
> >> which looks very suspicious.  Try the following please:
> >>
> >> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> >> index 3e32dc954c3c..22f4fb550561 100644
> >> --- a/drivers/scsi/scsi_lib.c
> >> +++ b/drivers/scsi/scsi_lib.c
> >> @@ -253,7 +253,8 @@ static int __scsi_execute(struct scsi_device *sdev, 
> >> const unsigned char *cmd,
> >> * and prevent security leaks by zeroing out the excess data.
> >> */
> >>if (unlikely(rq->resid_len > 0 && rq->resid_len <= bufflen))
> >> -  memset(buffer + (bufflen - rq->resid_len), 0, rq->resid_len);
> >> +//memset(buffer + (bufflen - rq->resid_len), 0, 
> >> rq->resid_len);
> >> +  printk_ratelimited("%s: got resid %d\n", __func__, 
> >> rq->resid_len);
> >>
> >>if (resid)
> >>*resid = rq->resid_len;  
> >
> >
> > Still fails but does print resid on some of the later INQUIRY commands (not 
> > the initial one).
> >  
> Can you test what happens if you blank out the storvsc_drv workaround:
> 
> diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
> index 585e54f..c36f42d 100644
> --- a/drivers/scsi/storvsc_drv.c
> +++ b/drivers/scsi/storvsc_drv.c
> @@ -1060,13 +1060,13 @@ static void storvsc_on_io_completion(struct 
> storvsc_device *stor_device,
>   * We do this so we can distinguish truly fatal failues
>   * (srb status == 0x4) and off-line the device in that case.
>   */
> -
> +#if 0
>  if ((stor_pkt->vm_srb.cdb[0] == INQUIRY) ||
> (stor_pkt->vm_srb.cdb[0] == MODE_SENSE)) {
>  vstor_packet->vm_srb.scsi_status = 0;
>  vstor_packet->vm_srb.srb_status = SRB_STATUS_SUCCESS;
>  }
> -
> +#endif
> 
>  /* Copy over the status...etc */
>  stor_pkt->vm_srb.scsi_status = vstor_packet->vm_srb.scsi_status;
> 
> It might thappen that we're fail to interpret the 'Device not present' 
> status correctly (which will happen for non-connected DVDs) causing the 
> SCSI stack to make incorrect decisions later on.
> 
> Cheers,
> 
> Hannes

Makes no difference, that was one of the first things I tried (and just tried 
again).


Re: SCSI regression in 4.11

2017-03-01 Thread Stephen Hemminger
On Thu, 2 Mar 2017 01:56:15 +0100
Christoph Hellwig <h...@lst.de> wrote:

> On Thu, Mar 02, 2017 at 01:01:35AM +0100, Christoph Hellwig wrote:
> > On Wed, Mar 01, 2017 at 07:54:12AM -0800, Stephen Hemminger wrote:  
> > > > 
> > > > http://git.infradead.org/users/hch/block.git/commitdiff/148cff67b401e2229c076c0ea418712654be77e4
> > > >   
> > > 
> > > It appears that is already in the code I am testing in linux-next...  
> > 
> > It's in -next now, but it wasn't at the time you reported the bug.
> > 
> > And it would sortof explain the bug if the INQUIRY data is correct
> > in the scatterlist, but we ignore it, given that scsi_probe_lun
> > ignores the result based on sense data.
> > 
> > Can you check what happens with the horrible hack below:  
> 
> Strike that - we're checking result later, so this can't be the case.
> 
> Now the other interesting thing is the memset in __scsi_exectute,
> which looks very suspicious.  Try the following please:
> 
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 3e32dc954c3c..22f4fb550561 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -253,7 +253,8 @@ static int __scsi_execute(struct scsi_device *sdev, const 
> unsigned char *cmd,
>* and prevent security leaks by zeroing out the excess data.
>*/
>   if (unlikely(rq->resid_len > 0 && rq->resid_len <= bufflen))
> - memset(buffer + (bufflen - rq->resid_len), 0, rq->resid_len);
> +//   memset(buffer + (bufflen - rq->resid_len), 0, rq->resid_len);
> + printk_ratelimited("%s: got resid %d\n", __func__, 
> rq->resid_len);
>  
>   if (resid)
>   *resid = rq->resid_len;


Still fails but does print resid on some of the later INQUIRY commands (not the 
initial one).

[1.222728] scsi host0: storvsc_host_t
[1.230120] hv_storvsc: INQUIRY cmd 0x12 0x0 0x0 scsi status 0x0 srb status 
0x1 length 36
[1.235358] scsi host1: storvsc_host_t
[1.238629] hv_storvsc:  payload size 288 count 1 offset 3840 len 36 pfn 
0x1f1ef2
[1.241127] hv_storvsc:  sg 0: phys 0x1f1ef2000 virt 905db1ef2000 offset 
0xf00 length 36
[1.242422] hv_storvsc: INQUIRY cmd 0x12 0x0 0x0 scsi status 0x0 srb status 
0x1 length 36
[1.242423] hv_storvsc:  payload size 288 count 1 offset 3328 len 36 pfn 
0x1f112b
[1.242425] hv_storvsc:  sg 0: phys 0x1f112b000 virt 905db112b000 offset 
0xd00 length 36
[1.242427] data: : 05 80 00 02 1f 00 00 00 4d 73 66 74 20 20 20 20  
Msft
[1.242427] data: 0010: 56 69 72 74 75 61 6c 20 44 56 44 2d 52 4f 4d 20  
Virtual DVD-ROM 
[1.242428] data: 0020: 31 2e 30 20  
1.0 
[1.242456] scsi 1:0:0:0: CD-ROMMsft Virtual DVD-ROM  1.0  
PQ: 0 ANSI: 0
[1.242705] hv_storvsc: INQUIRY cmd 0x12 0x0 0x0 scsi status 0x0 srb status 
0x20 length 36
[1.242706] hv_storvsc:  payload size 288 count 1 offset 2816 len 36 pfn 
0x1f112b
[1.242707] hv_storvsc:  sg 0: phys 0x1f112b000 virt 905db112b000 offset 
0xb00 length 36
[1.242708] data: : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

[1.242709] data: 0010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

[1.242709] data: 0020: 00 00 00 00  

[1.242730] scsi host1: scsi scan: INQUIRY result too short (5), using 36
[1.242732] scsi 1:0:0:1: Direct-Access
PQ: 0 ANSI: 0
[1.242982] hv_storvsc: INQUIRY cmd 0x12 0x0 0x0 scsi status 0x0 srb status 
0x20 length 36
[1.242983] hv_storvsc:  payload size 288 count 1 offset 768 len 36 pfn 
0x1f1129
[1.242984] hv_storvsc:  sg 0: phys 0x1f1129000 virt 905db1129000 offset 
0x300 length 36
[1.242985] data: : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

[1.242986] data: 0010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

[1.242986] data: 0020: 00 00 00 00  

[1.243003] scsi 1:0:0:2: Direct-Access
PQ: 0 ANSI: 0
[1.244180] hv_storvsc: INQUIRY cmd 0x12 0x0 0x0 scsi status 0x0 srb status 
0x20 length 36
[1.244181] hv_storvsc:  payload size 288 count 1 offset 2048 len 36 pfn 
0x1f1128
[1.244181] hv_storvsc:  sg 0: phys 0x1f1128000 virt 905db1128000 offset 
0x800 length 36
[1.244182] data: : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

[1.244183] data: 0010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

[1.244183] data: 0020: 00 00 00 00  

[1.24

Re: SCSI regression in 4.11

2017-03-01 Thread Stephen Hemminger
On Thu, 2 Mar 2017 01:01:35 +0100
Christoph Hellwig <h...@lst.de> wrote:

> On Wed, Mar 01, 2017 at 07:54:12AM -0800, Stephen Hemminger wrote:
> > >   
> > > http://git.infradead.org/users/hch/block.git/commitdiff/148cff67b401e2229c076c0ea418712654be77e4
> > >   
> > 
> > It appears that is already in the code I am testing in linux-next...  
> 
> It's in -next now, but it wasn't at the time you reported the bug.
> 
> And it would sortof explain the bug if the INQUIRY data is correct
> in the scatterlist, but we ignore it, given that scsi_probe_lun
> ignores the result based on sense data.
> 
> Can you check what happens with the horrible hack below:
> 
> diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
> index 6f7128f49c30..2f384ddb001a 100644
> --- a/drivers/scsi/scsi_scan.c
> +++ b/drivers/scsi/scsi_scan.c
> @@ -603,7 +603,7 @@ static int scsi_probe_lun(struct scsi_device *sdev, 
> unsigned char *inq_result,
>   SCSI_LOG_SCAN_BUS(3, sdev_printk(KERN_INFO, sdev,
>   "scsi scan: INQUIRY %s with code 0x%x\n",
>   result ? "failed" : "successful", result));
> -
> +#if 0
>   if (result) {
>   /*
>* not-ready to ready transition [asc/ascq=0x28/0x0]
> @@ -628,6 +628,12 @@ static int scsi_probe_lun(struct scsi_device *sdev, 
> unsigned char *inq_result,
>   if (resid == try_inquiry_len)
>   continue;
>   }
> +#else
> + if (result) {
> + printk_ratelimited("got result %d:\n", result);
> + result = 0;
> + }
> +#endif
>   break;
>   }
>  

I tried that hack and the printk never hit. Could be related to error being 
masked
by storvsc driver workaround/hack.

Also let the system continue after going off in the weeds during SCSI scan.
After a long time we get stuck threads then EOM.


[  182.444084] hv_storvsc: INQUIRY cmd 0x12 0x1 0x83 scsi status 0x0 srb status 
0x1 length 52
[  182.448080] hv_storvsc:  payload size 288 count 1 offset 0 len 254 pfn 
0x1e3683
[  182.448080] hv_storvsc:  sg 0: phys 0x1e3683000 virt 885b63683000 offset 
0x0 length 254
[  182.448080] data: : 00 83 00 30 01 01 00 18 4d 53 46 54 20 20 20 20  
...0MSFT
[  182.448080] data: 0010: 43 77 cc 85 5f 19 c2 46 ac 48 c7 33 b9 dd 2d 2a  
Cw.._..F.H.3..-*
[  182.448080] data: 0020: 01 03 00 10 60 02 24 80 43 77 cc 85 5f 19 c7 33  
`.$.Cw.._..3
[  182.448080] data: 0030: b9 dd 2d 2a 00 00 00 00 00 00 00 00 00 00 00 00  
..-*
[  182.448080] data: 0040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

[  182.448080] data: 0050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

[  182.448080] data: 0060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

[  182.448080] data: 0070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

[  182.448080] data: 0080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

[  182.448080] data: 0090: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

[  182.448080] data: 00a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

[  182.448080] data: 00b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

[  182.448080] data: 00c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

[  182.448080] data: 00d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

[  182.448080] data: 00e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

[  182.448080] data: 00f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00
..
[  242.652032] INFO: task kworker/u480:0:5 blocked for more than 120 seconds.
[  242.655042]   Not tainted 4.10.0-next-20170228-next+ #4
[  242.656647] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[  242.659539] kworker/u480:0  D0 5  2 0x
[  242.661100] Workqueue: events_unbound async_run_entry_fn
[  242.663105] Call Trace:
[  242.664209]  __schedule+0x27a/0x890
[  242.665179]  schedule+0x36/0x80
[  242.666520]  io_schedule+0x16/0x40
[  242.667933]  do_read_cache_page+0x400/0x550
[  242.669153]  ? blkdev_writepages+0x40/0x40
[  242.670775]  ? page_cache_tree_insert+0x120/0x120
[  242.672717]  read_cache_page+0x12/0x20
[  242.674592]  read_dev_sector+0x7d/0xe0
[  242.676457]  ? get_page_from_freelist+0x891/0xad0
[  242.679027]  read_lba+0x193/0x270
[  242.680421]  ? kmem_cache_alloc_trace+0x181/0x190
[  242.682573]  efi_partition+0xf6/0x840
[  242.684111]  ? string+0x59/0x80
[  242.68

Re: SCSI regression in 4.11

2017-03-01 Thread Stephen Hemminger
On Wed, 01 Mar 2017 15:09:44 -0800
James Bottomley <james.bottom...@hansenpartnership.com> wrote:

> On Wed, 2017-03-01 at 13:27 -0800, Stephen Hemminger wrote:
> > Ok here is much better data, wasn't accounting for the offset in the
> > payload  
> 
> But now both responses are the same:
> 
> > Working 4.10  
> [...]
> > [1.048920] hv_storvsc: INQUIRY cmd 0x12 0x0 0x0 scsi status 0x0
> > srb status 0x20 length 36
> > [1.048920] hv_storvsc:  payload size 288 count 1 offset 1024 len
> > 36 pfn 0x1f15d7
> > [1.048921] hv_storvsc:  sg 0: phys 0x1f15d7000 virt
> > 9270b15d7000 offset 0x400 length 36
> > [1.048922] data: : 00 00 00 00 00 00 00 00 00 00 00 00 00
> > 00 00 00  
> > [1.048923] data: 0010: 00 00 00 00 00 00 00 00 00 00 00 00 00
> > 00 00 00  
> > [1.048923] data: 0020: 00 00 00 00   
> >
> > [1.048942] scsi host1: scsi scan: INQUIRY result too short (5),
> > using 36
> > 
> > 
> > Broken 4.11  
> [...]
> > [1.812926] hv_storvsc: INQUIRY cmd 0x12 0x0 0x0 scsi status 0x0
> > srb status 0x20 length 36
> > [1.812927] hv_storvsc:  payload size 288 count 1 offset 2816 len
> > 36 pfn 0x1f198b
> > [1.812928] hv_storvsc:  sg 0: phys 0x1f198b000 virt
> > 9a42f198b000 offset 0xb00 length 36
> > [1.812953] data: : 00 00 00 00 00 00 00 00 00 00 00 00 00
> > 00 00 00  
> > [1.812953] data: 0010: 00 00 00 00 00 00 00 00 00 00 00 00 00
> > 00 00 00  
> > [1.812954] data: 0020: 00 00 00 00   
> >
> > [1.813258] scsi host1: scsi scan: INQUIRY result too short (5),
> > using 36  
> 
> Hyper-v is returning a buffer of zeros for INQUIRY to the offline
> device in both cases ... that means the problem isn't where I thought
> it was.
> 
> James

Yes. Earlier output wasn't accounting for offsets, new dump code is.


Re: SCSI regression in 4.11

2017-03-01 Thread Stephen Hemminger
Ok here is much better data, wasn't accounting for the offset in the payload


Working 4.10
[1.020041] scsi host0: storvsc_host_t
[1.024998] hv_storvsc: INQUIRY cmd 0x12 0x0 0x0 scsi status 0x0 srb status 
0x1 length 36
[1.027452] hv_storvsc:  payload size 288 count 1 offset 3072 len 36 pfn 
0x1f15d0
[1.029385] hv_storvsc:  sg 0: phys 0x1f15d virt 9270b15d offset 
0xc00 length 36
[1.031720] data: : 00 00 05 02 1f 00 00 02 4d 73 66 74 20 20 20 20  
Msft
[1.033846] data: 0010: 56 69 72 74 75 61 6c 20 44 69 73 6b 20 20 20 20  
Virtual Disk
[1.036080] data: 0020: 31 2e 30 20  
1.0 
[1.038249] scsi 0:0:0:0: Direct-Access Msft Virtual Disk 1.0  
PQ: 0 ANSI: 5
[1.040394] hv_storvsc: INQUIRY cmd 0x12 0x1 0x0 scsi status 0x0 srb status 
0x1 length 12
[1.041494] scsi host1: storvsc_host_t
[1.044683] hv_storvsc:  payload size 288 count 1 offset 0 len 255 pfn 
0x1f0faf
[1.046940] hv_storvsc:  sg 0: phys 0x1f0faf000 virt 9270b0faf000 offset 
0x0 length 255
[1.048555] hv_storvsc: INQUIRY cmd 0x12 0x0 0x0 scsi status 0x0 srb status 
0x1 length 36
[1.048556] hv_storvsc:  payload size 288 count 1 offset 2816 len 36 pfn 
0x1f15d6
[1.048557] hv_storvsc:  sg 0: phys 0x1f15d6000 virt 9270b15d6000 offset 
0xb00 length 36
[1.048559] data: : 05 80 00 02 1f 00 00 00 4d 73 66 74 20 20 20 20  
Msft
[1.048560] data: 0010: 56 69 72 74 75 61 6c 20 44 56 44 2d 52 4f 4d 20  
Virtual DVD-ROM 
[1.048560] data: 0020: 31 2e 30 20  
1.0 
[1.048590] scsi 1:0:0:0: CD-ROMMsft Virtual DVD-ROM  1.0  
PQ: 0 ANSI: 0
[1.048920] hv_storvsc: INQUIRY cmd 0x12 0x0 0x0 scsi status 0x0 srb status 
0x20 length 36
[1.048920] hv_storvsc:  payload size 288 count 1 offset 1024 len 36 pfn 
0x1f15d7
[1.048921] hv_storvsc:  sg 0: phys 0x1f15d7000 virt 9270b15d7000 offset 
0x400 length 36
[1.048922] data: : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

[1.048923] data: 0010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

[1.048923] data: 0020: 00 00 00 00  

[1.048942] scsi host1: scsi scan: INQUIRY result too short (5), using 36


Broken 4.11

[1.487930] scsi host0: storvsc_host_t
[1.541254] hv_storvsc: INQUIRY cmd 0x12 0x0 0x0 scsi status 0x0 srb status 
0x1 length 36
[1.545242] hv_storvsc:  payload size 288 count 1 offset 768 len 36 pfn 
0x1f1196
[1.545242] hv_storvsc:  sg 0: phys 0x1f1196000 virt 9a42f1196000 offset 
0x300 length 36
[1.545242] data: : 00 00 05 02 1f 00 00 02 4d 73 66 74 20 20 20 20  
Msft
[1.545242] data: 0010: 56 69 72 74 75 61 6c 20 44 69 73 6b 20 20 20 20  
Virtual Disk
[1.545242] data: 0020: 31 2e 30 20  
1.0 
[1.570136] scsi 0:0:0:0: Direct-Access Msft Virtual Disk 1.0  
PQ: 0 ANSI: 5
[1.571001] scsi host1: storvsc_host_t
[1.608793] hv_storvsc: INQUIRY cmd 0x12 0x1 0x0 scsi status 0x0 srb status 
0x1 length 12
[1.610316] hv_storvsc:  payload size 288 count 1 offset 0 len 255 pfn 
0x1f0577
[1.611749] hv_storvsc:  sg 0: phys 0x1f0577000 virt 9a42f0577000 offset 
0x0 length 255
[1.612788] data: : 00 00 00 08 00 83 8f b0 b1 b2 ce cf 72 65 61 74  
reat
[1.612788] data: 0010: 65 5f 66 69 6c 65 00 5f 5f 63 72 63 5f 68 69 64  
e_file.__crc_hid
[1.612788] data: 0020: 5f 72 65 70 6f 72 74 5f 72 61 77 5f 65 76 65 6e  
_report_raw_even
[1.612788] data: 0030: 74 00 76 66 72 65 65 00 75 6e 72 65 67 69 73 74  
t.vfree.unregist
[1.612788] data: 0040: 65 72 5f 63 68 72 64 65 76 5f 72 65 67 69 6f 6e  
er_chrdev_region
[1.612788] data: 0050: 00 6d 75 74 65 78 5f 75 6e 6c 6f 63 6b 00 68 69  
.mutex_unlock.hi
[1.612788] data: 0060: 64 5f 64 65 62 75 67 5f 65 78 69 74 00 6b 6d 65  
d_debug_exit.kme
[1.612788] data: 0070: 6d 64 75 70 00 68 69 64 72 61 77 5f 65 78 69 74  
mdup.hidraw_exit
[1.612788] data: 0080: 00 5f 5f 67 65 74 5f 75 73 65 72 5f 34 00 70 6f  
.__get_user_4.po
[1.612788] data: 0090: 77 65 72 5f 73 75 70 70 6c 79 5f 70 6f 77 65 72  
wer_supply_power
[1.612788] data: 00a0: 73 00 64 65 76 69 63 65 5f 64 65 73 74 72 6f 79  
s.device_destroy
[1.612788] data: 00b0: 00 5f 5f 63 72 63 5f 68 69 64 5f 73 6e 74 6f 33  
.__crc_hid_snto3
[1.612788] data: 00c0: 32 00 63 61 6e 63 65 6c 5f 77 6f 72 6b 5f 73 79  
2.cancel_work_sy
[1.612788] data: 00d0: 6e 63 00 73 65 71 5f 70 72 69 6e 74 66 00 64 6f  
nc.seq_printf.do
[1.612788] data: 00e0: 77 6e 5f 69 6e 74 65 72 72 75 70 74 69 62 6c 65  
wn_interruptible
[1.612788] data: 00f0: 00 5f 5f 63 72 63 5f 68 69 64 5f 70 61 72 73 
.__crc_hid_pars
[1.649097] 

Re: SCSI regression in 4.11

2017-03-01 Thread Stephen Hemminger
On Wed, 01 Mar 2017 11:20:22 -0800
James Bottomley <james.bottom...@hansenpartnership.com> wrote:

> On Wed, 2017-03-01 at 10:57 -0800, James Bottomley wrote:
> > On Wed, 2017-03-01 at 10:48 -0800, Stephen Hemminger wrote:  
> > > On Tue, 28 Feb 2017 22:20:58 -0800
> > > James Bottomley <j...@linux.vnet.ibm.com> wrote:
> > >   
> > > > On Tue, 2017-02-28 at 17:25 -0800, Stephen Hemminger wrote:  
> > > > > [1.346023] hv_storvsc:  IO cmd 0x12 0x0 0x0 scsi status 0x0
> > > > > srb
> > > > > status 0x20 length 36
> > > > > [1.352913] inquiry data: : 00 aa be f1 5c 98 ff ff
> > > > > f0
> > > > > 64
> > > > > 02 89 ff ff ff ff
> > > > > [1.356543] inquiry data: 0010: 00 00 00 00 00 00 00 00
> > > > > 00
> > > > > 00
> > > > > 00 00 00 00 00 00
> > > > > [1.359996] inquiry data: 0020: 00 00 00 00
> > > > > [1.361835] scsi host1: scsi scan: INQUIRY result too short
> > > > > (5),
> > > > > using 36
> > > > > [1.361888] hv_storvsc:  IO cmd 0xa0 0x0 0x0 scsi status 0x0
> > > > > srb
> > > > > status 0x1 length 16
> > > > > [1.362307] hv_storvsc:  IO cmd 0x12 0x0 0x0 scsi status 0x0
> > > > > srb
> > > > > status 0x1 length 36
> > > > > [1.362308] inquiry data: : 00 23 34 f1 5c 98 ff ff
> > > > > f0
> > > > > 64
> > > > > 02 89 ff ff ff ff
> > > > > [1.362309] inquiry data: 0010: 00 00 00 00 00 00 00 00
> > > > > 00
> > > > > 00
> > > > > 00 00 00 00 00 00
> > > > > [1.362309] inquiry data: 0020: 00 00 00 00
> > > > > [1.377423] scsi 1:0:0:1: Direct-Access 
> > > > >   
> > > > > 
> > > > >  PQ: 0 ANSI: 0
> > > > 
> > > > Well, this pinpoints the fault to the block uncopy, I think.  The
> > > > Inquiry data is clearly correct in the page frame, so it's not
> > > > getting
> > > > copied to the scsi_execute() buffer for some reason.
> > > > 
> > > > James
> > > >   
> > > 
> > > Why do I see different sense data on good (4.10) versus bad (4.11)
> > > 
> > > Good 4.10 initial INQUIRY buffer
> > > [1.012413] data: : 00 2e 64 71 db 97 ff ff f0 94 62 96
> > > ff
> > > ff ff ff
> > > [1.012413] data: 0010: 00 00 00 00 00 00 00 00 00 00 00 00
> > > 00
> > > 00 00 00
> > > [1.012414] data: 0020: 00 00 00 00
> > > 
> > > Bad 4.11 initial INQUIRY buffer  
> > > [1.218159] data: : 00 00 05 02 1f 00 00 02 4d 73 66 74
> > > 20
> > > 20 20 20
> > > [1.225654] data: 0010: 56 69 72 74 75 61 6c 20 44 69 73 6b
> > > 20
> > > 20 20 20
> > > [1.242930] data: 0020: 31 2e 30 20
> > > 
> > > Is the kmap_atomic looking at the right place?  
> > 
> > Actually, the 4.11 data looks good.  You can tell from the string at
> > byte 8.  It's rubbish in the 4.10 one and 'Msft ' in the 4.11 one (I
> > assume you just reversed the cut and paste).
> > 
> > These should be the page physical addresses you sent down to the
> > hypervisor, so kmap should work.  Perhaps print out the physical page
> > address so we see what we're getting.  
> 
> Another possible explanation is non zero sg->offset, in which case you
> might not see the INQUIRY data because you start at the beginning of
> the page.  This shouldn't happen because you specify no alignment
> override in the driver, so we should start the INQUIRY buffer on a new
> page and copy the data back to the real buffer, but perhaps that's what
> changed.

Dumping more data from 4.9, 4.10 and 4.11



Re: SCSI regression in 4.11

2017-03-01 Thread Stephen Hemminger
On Tue, 28 Feb 2017 22:20:58 -0800
James Bottomley <j...@linux.vnet.ibm.com> wrote:

> On Tue, 2017-02-28 at 17:25 -0800, Stephen Hemminger wrote:
> > [1.346023] hv_storvsc:  IO cmd 0x12 0x0 0x0 scsi status 0x0 srb
> > status 0x20 length 36
> > [1.352913] inquiry data: : 00 aa be f1 5c 98 ff ff f0 64
> > 02 89 ff ff ff ff
> > [1.356543] inquiry data: 0010: 00 00 00 00 00 00 00 00 00 00
> > 00 00 00 00 00 00
> > [1.359996] inquiry data: 0020: 00 00 00 00
> > [1.361835] scsi host1: scsi scan: INQUIRY result too short (5),
> > using 36
> > [1.361888] hv_storvsc:  IO cmd 0xa0 0x0 0x0 scsi status 0x0 srb
> > status 0x1 length 16
> > [1.362307] hv_storvsc:  IO cmd 0x12 0x0 0x0 scsi status 0x0 srb
> > status 0x1 length 36
> > [1.362308] inquiry data: : 00 23 34 f1 5c 98 ff ff f0 64
> > 02 89 ff ff ff ff
> > [1.362309] inquiry data: 0010: 00 00 00 00 00 00 00 00 00 00
> > 00 00 00 00 00 00
> > [1.362309] inquiry data: 0020: 00 00 00 00
> > [1.377423] scsi 1:0:0:1: Direct-Access   
> >  PQ: 0 ANSI: 0  
> 
> Well, this pinpoints the fault to the block uncopy, I think.  The
> Inquiry data is clearly correct in the page frame, so it's not getting
> copied to the scsi_execute() buffer for some reason.
> 
> James
> 

Why do I see different sense data on good (4.10) versus bad (4.11)

Good 4.10 initial INQUIRY buffer
[1.012413] data: : 00 2e 64 71 db 97 ff ff f0 94 62 96 ff ff ff ff
[1.012413] data: 0010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[1.012414] data: 0020: 00 00 00 00

Bad 4.11 initial INQUIRY buffer  
[1.218159] data: : 00 00 05 02 1f 00 00 02 4d 73 66 74 20 20 20 20
[1.225654] data: 0010: 56 69 72 74 75 61 6c 20 44 69 73 6b 20 20 20 20
[1.242930] data: 0020: 31 2e 30 20

Is the kmap_atomic looking at the right place?


Re: SCSI regression in 4.11

2017-03-01 Thread Stephen Hemminger
Dexuan has reproduced the same problem and discovered that is related to 
whether virtual DVD is
attached to the VM.  My VM had empty virtual DVD (offline) from the 
installation of the ISO.
If the DVD device is removed then the VM boots.

This makes the problem less of a catastrophic but we still need to get to root 
cause.


Re: SCSI regression in 4.11

2017-03-01 Thread Stephen Hemminger
On Wed, 1 Mar 2017 16:50:57 +0100
Christoph Hellwig <h...@lst.de> wrote:

> On Tue, Feb 28, 2017 at 10:48:45PM -0800, Stephen Hemminger wrote:
> > Let me know, I can run another test and dump more data.  
> 
> Could it be that we keep the old sense buffer values around because
> my commit change the way how sense buffers are handled.  A while ago
> I suggested this patch to fix it, so maybe you can try it:
> 
>   
> http://git.infradead.org/users/hch/block.git/commitdiff/148cff67b401e2229c076c0ea418712654be77e4

It appears that is already in the code I am testing in linux-next...

bool scsi_normalize_sense(const u8 *sense_buffer, int sb_len,
  struct scsi_sense_hdr *sshdr)
{
memset(sshdr, 0, sizeof(struct scsi_sense_hdr));

if (!sense_buffer || !sb_len)
return false;

sshdr->response_code = (sense_buffer[0] & 0x7f);

if (!scsi_sense_valid(sshdr))


Re: SCSI regression in 4.11

2017-02-28 Thread Stephen Hemminger
On Tue, 28 Feb 2017 22:20:58 -0800
James Bottomley <j...@linux.vnet.ibm.com> wrote:

> On Tue, 2017-02-28 at 17:25 -0800, Stephen Hemminger wrote:
> > [1.346023] hv_storvsc:  IO cmd 0x12 0x0 0x0 scsi status 0x0 srb
> > status 0x20 length 36
> > [1.352913] inquiry data: : 00 aa be f1 5c 98 ff ff f0 64
> > 02 89 ff ff ff ff
> > [1.356543] inquiry data: 0010: 00 00 00 00 00 00 00 00 00 00
> > 00 00 00 00 00 00
> > [1.359996] inquiry data: 0020: 00 00 00 00
> > [1.361835] scsi host1: scsi scan: INQUIRY result too short (5),
> > using 36
> > [1.361888] hv_storvsc:  IO cmd 0xa0 0x0 0x0 scsi status 0x0 srb
> > status 0x1 length 16
> > [1.362307] hv_storvsc:  IO cmd 0x12 0x0 0x0 scsi status 0x0 srb
> > status 0x1 length 36
> > [1.362308] inquiry data: : 00 23 34 f1 5c 98 ff ff f0 64
> > 02 89 ff ff ff ff
> > [1.362309] inquiry data: 0010: 00 00 00 00 00 00 00 00 00 00
> > 00 00 00 00 00 00
> > [1.362309] inquiry data: 0020: 00 00 00 00
> > [1.377423] scsi 1:0:0:1: Direct-Access   
> >  PQ: 0 ANSI: 0  
> 
> Well, this pinpoints the fault to the block uncopy, I think.  The
> Inquiry data is clearly correct in the page frame, so it's not getting
> copied to the scsi_execute() buffer for some reason.
> 
> James
> 

Let me know, I can run another test and dump more data.


Re: SCSI regression in 4.11

2017-02-28 Thread Stephen Hemminger

> Let's concentrate on INQUIRY since that's the first command in the
> probe sequence.  I think it's completing successfully because your
> hyperv layer says it has 36 bytes of transfer and that's the size of a
> successful initial INQUIRY, so the fact that the code above would break
> stuff if the INQUIRY failed is orthogonal to the the current problem.

Right, that bodge only breaks some minor things likes scsiinfo commands.

> can you print out some of the DMA buffer in storvsc_on_io_completion()?
> 
> I think just the stor_pkt->vm_srb.cdb[0] (to identify the command
> completing) and byte 5 of the buffer will tell us what we need to know.
>  It's going to be complex to get byte 5, you'll need to do a
> kmap_atomic_pfn on request->payload->range.pfn_array[0] and then look
> at byte 5.  If that's zero it means there's some problem with hyperv
> writing to the pfn if it's 0x24 (expected value for an initial inquiry)
> we've got a problem somewhere in bio completion not copying the value
> back.

Here is another boot, this time went dumpster diving as you suggested
to get the request data.



diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index 2af63a80c7fa..a51d8eba6e04 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -1059,6 +1059,21 @@ static void storvsc_on_io_completion(struct 
storvsc_device *stor_device,
vstor_packet->vm_srb.srb_status,
vstor_packet->vm_srb.data_transfer_length);
 
+   if (stor_pkt->vm_srb.cdb[0] == INQUIRY) {
+   struct scsi_cmnd *cmd = request->cmd;
+   struct scatterlist *sg = scsi_sglist(cmd);
+   struct page *page = sg_page(sg);
+   void *vaddr = kmap_atomic(page);
+   
+   print_hex_dump(KERN_INFO,
+  "inquiry data: ", DUMP_PREFIX_OFFSET,
+  16, 1,
+  vaddr, vstor_packet->vm_srb.data_transfer_length,
+  false);
+
+   kunmap_atomic(page);
+   }
+
/*
 * The current SCSI handling on the host side does
 * not correctly handle:

Which results in:
...

[1.225501] scsi host0: storvsc_host_t
[1.234707] hv_storvsc:  IO cmd 0x12 0x0 0x0 scsi status 0x0 srb status 0x1 
length 36
[1.235517] scsi host1: storvsc_host_t
[1.238037] inquiry data: : 00 23 34 f1 5c 98 ff ff f0 64 02 89 ff 
ff ff ff
[1.256800] inquiry data: 0010: 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00
[1.259430] inquiry data: 0020: 00 00 00 00
[1.261431] scsi 0:0:0:0: Direct-Access Msft Virtual Disk 1.0  
PQ: 0 ANSI: 5
[1.264080] hv_storvsc:  IO cmd 0x12 0x1 0x0 scsi status 0x0 srb status 0x1 
length 12
[1.267420] inquiry data: : 00 00 00 08 00 83 8f b0 b1 b2 ce cf
[1.270759] hv_storvsc:  IO cmd 0x12 0x1 0x83 scsi status 0x0 srb status 0x1 
length 52
[1.275007] inquiry data: : 00 83 00 30 01 01 00 18 4d 53 46 54 20 
20 20 20
[1.277988] inquiry data: 0010: 43 77 cc 85 5f 19 c2 46 ac 48 c7 33 b9 
dd 2d 2a
[1.281096] inquiry data: 0020: 01 03 00 10 60 02 24 80 43 77 cc 85 5f 
19 c7 33
[1.284246] inquiry data: 0030: b9 dd 2d 2a
[1.306538] hv_storvsc:  IO cmd 0x12 0x0 0x0 scsi status 0x0 srb status 0x1 
length 36
[1.310621] inquiry data: : 00 92 be f1 5c 98 ff ff f0 64 02 89 ff 
ff ff ff
[1.316244] inquiry data: 0010: 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00
[1.328151] random: fast init done
[1.340184] inquiry data: 0020: 00 00 00 00
[1.342710] scsi 1:0:0:0: CD-ROMMsft Virtual DVD-ROM  1.0  
PQ: 0 ANSI: 0
[1.346023] hv_storvsc:  IO cmd 0x12 0x0 0x0 scsi status 0x0 srb status 0x20 
length 36
[1.352913] inquiry data: : 00 aa be f1 5c 98 ff ff f0 64 02 89 ff 
ff ff ff
[1.356543] inquiry data: 0010: 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00
[1.359996] inquiry data: 0020: 00 00 00 00
[1.361835] scsi host1: scsi scan: INQUIRY result too short (5), using 36
[1.361888] hv_storvsc:  IO cmd 0xa0 0x0 0x0 scsi status 0x0 srb status 0x1 
length 16
[1.362307] hv_storvsc:  IO cmd 0x12 0x0 0x0 scsi status 0x0 srb status 0x1 
length 36
[1.362308] inquiry data: : 00 23 34 f1 5c 98 ff ff f0 64 02 89 ff 
ff ff ff
[1.362309] inquiry data: 0010: 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00
[1.362309] inquiry data: 0020: 00 00 00 00
[1.377423] scsi 1:0:0:1: Direct-Access
PQ: 0 ANSI: 0
[1.399208] hv_storvsc:  IO cmd 0x12 0x0 0x0 scsi status 0x0 srb status 0x20 
length 36
[1.402836] inquiry data: : 00 a3 be f1 5c 98 ff ff 00 00 00 00 00 
00 00 00
[1.406466] inquiry data: 0010: 00 00 00 00 00 00 00 00 c0 30 66 f9 5c 
98 ff ff
[1.409766] inquiry data: 0020: 00 00 00 00
[1.412366] scsi 1:0:0:2: Direct-Access

Re: SCSI regression in 4.11

2017-02-28 Thread Stephen Hemminger
On Tue, 28 Feb 2017 09:06:13 -0800
James Bottomley <j...@linux.vnet.ibm.com> wrote:

> On Tue, 2017-02-28 at 08:32 -0700, Jens Axboe wrote:
> > On 02/28/2017 07:08 AM, Christoph Hellwig wrote:  
> > > On Mon, Feb 27, 2017 at 05:19:31PM -0800, Stephen Hemminger wrote:  
> > > > Fixes: ee5242360424 ("scsi: zero per-cmd driver data before each
> > > > I/O")
> > > > 
> > > > but that is already in linux-next.
> > > > 
> > > > Noticed another place where memset(of the data was being done not
> > > > the extra bits.
> > > > Tried this, but didn't fix it either...  
> > > 
> > > Are you using blk-mq or the legacy request code?  
> > 
> > Stephen doesn't have MQ set in the config he posted, I'm assuming he 
> > didn't boot with scsi_mod.use_blk_mq=true. In a previous email, I 
> > asked if turning on MQ makes a difference.  
> 
> OK, since we're not making much progress, Stephen, could you insert
> some debugging into the storvsc driver?  The trace clearly shows we're
> getting zeros back in the buffer when we should have data from the
> initial scan.  Firstly, does the vmbus think it's transferring any data
> for the INQUIRY and READ_CAPACITY commands (looks like
> storvsc_command_completion() data_transfer_length)?  If it does,
> there's probably an issue initialising the sg list.  If it doesn't,
> we're probably sending bogus commands.
> 
> James

I tried booting with scsi_mod.use_blk_mq=true and that did nothing.
Rebuilding now with 
CONFIG_SCSI_MQ_DEFAULT=y

Sure, can instrument. after that test.


Re: SCSI regression in 4.11

2017-02-28 Thread Stephen Hemminger
On Tue, 28 Feb 2017 09:06:13 -0800
James Bottomley <j...@linux.vnet.ibm.com> wrote:

> On Tue, 2017-02-28 at 08:32 -0700, Jens Axboe wrote:
> > On 02/28/2017 07:08 AM, Christoph Hellwig wrote:  
> > > On Mon, Feb 27, 2017 at 05:19:31PM -0800, Stephen Hemminger wrote:  
> > > > Fixes: ee5242360424 ("scsi: zero per-cmd driver data before each
> > > > I/O")
> > > > 
> > > > but that is already in linux-next.
> > > > 
> > > > Noticed another place where memset(of the data was being done not
> > > > the extra bits.
> > > > Tried this, but didn't fix it either...  
> > > 
> > > Are you using blk-mq or the legacy request code?  
> > 
> > Stephen doesn't have MQ set in the config he posted, I'm assuming he 
> > didn't boot with scsi_mod.use_blk_mq=true. In a previous email, I 
> > asked if turning on MQ makes a difference.  
> 
> OK, since we're not making much progress, Stephen, could you insert
> some debugging into the storvsc driver?  The trace clearly shows we're
> getting zeros back in the buffer when we should have data from the
> initial scan.  Firstly, does the vmbus think it's transferring any data
> for the INQUIRY and READ_CAPACITY commands (looks like
> storvsc_command_completion() data_transfer_length)?  If it does,
> there's probably an issue initialising the sg list.  If it doesn't,
> we're probably sending bogus commands.
> 
> James
> 

The following code in storvsc looks suspicious

static void storvsc_on_io_completion(struct storvsc_device *stor_device,
  struct vstor_packet *vstor_packet,
  struct storvsc_cmd_request *request)
{
struct vstor_packet *stor_pkt;
struct hv_device *device = stor_device->device;

stor_pkt = >vstor_packet;

/*
 * The current SCSI handling on the host side does
 * not correctly handle:
 * INQUIRY command with page code parameter set to 0x80
 * MODE_SENSE command with cmd[2] == 0x1c
 *
 * Setup srb and scsi status so this won't be fatal.
 * We do this so we can distinguish truly fatal failues
 * (srb status == 0x4) and off-line the device in that case.
 */

if ((stor_pkt->vm_srb.cdb[0] == INQUIRY) ||
   (stor_pkt->vm_srb.cdb[0] == MODE_SENSE)) {
vstor_packet->vm_srb.scsi_status = 0;
vstor_packet->vm_srb.srb_status = SRB_STATUS_SUCCESS;
}

If SCSI layer is sending inquiry about devices to do scanning then wouldn't this
workaround break things?  Maybe a better to fully test for the broken command.

Original commit was:

commit 4ed51a21c0f69e1379cf858fc21a9d9022bfe0e7
Author: K. Y. Srinivasan <k...@microsoft.com>
Date:   Sat Aug 27 11:31:26 2011 -0700

Staging: hv: storvsc: Fixup srb and scsi status for INQUIRY and MODE_SENSE

The current VHD handler on the Windows Host does not correctly handle
INQUIRY and MODE_SENSE commands with some options. Fixup srb_status
in these cases since the failure is not fatal.

Signed-off-by: K. Y. Srinivasan <k...@microsoft.com>
Signed-off-by: Haiyang Zhang <haiya...@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gre...@suse.de>




Re: SCSI regression in 4.11

2017-02-28 Thread Stephen Hemminger
On Tue, 28 Feb 2017 09:06:13 -0800
James Bottomley <j...@linux.vnet.ibm.com> wrote:

> On Tue, 2017-02-28 at 08:32 -0700, Jens Axboe wrote:
> > On 02/28/2017 07:08 AM, Christoph Hellwig wrote:  
> > > On Mon, Feb 27, 2017 at 05:19:31PM -0800, Stephen Hemminger wrote:  
> > > > Fixes: ee5242360424 ("scsi: zero per-cmd driver data before each
> > > > I/O")
> > > > 
> > > > but that is already in linux-next.
> > > > 
> > > > Noticed another place where memset(of the data was being done not
> > > > the extra bits.
> > > > Tried this, but didn't fix it either...  
> > > 
> > > Are you using blk-mq or the legacy request code?  
> > 
> > Stephen doesn't have MQ set in the config he posted, I'm assuming he 
> > didn't boot with scsi_mod.use_blk_mq=true. In a previous email, I 
> > asked if turning on MQ makes a difference.  
> 
> OK, since we're not making much progress, Stephen, could you insert
> some debugging into the storvsc driver?  The trace clearly shows we're
> getting zeros back in the buffer when we should have data from the
> initial scan.  Firstly, does the vmbus think it's transferring any data
> for the INQUIRY and READ_CAPACITY commands (looks like
> storvsc_command_completion() data_transfer_length)?  If it does,
> there's probably an issue initialising the sg list.  If it doesn't,
> we're probably sending bogus commands.
> 
> James
> 

This is log of failed boot of linux-next (with blk_mq enabled).
See attached if you want to see exactly what got added.

Sorry don't speak SCSI yet.


[1.496999] hv_utils: Registering HyperV Utility Driver
[1.505415] hv_vmbus: registering driver hv_util
[1.508074] input: AT Translated Set 2 keyboard as 
/devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0004:00/VMBUS:00/d34b2567-b9b6-42b9-8778-0a4ec0b955bf/serio0/input/input0
[1.524314] hv_vmbus: registering driver hid_hyperv
[1.543577] hv_storvsc f71e9b3b-aed9-4dd2-a1ff-c00218a39853: complete init 
request
[1.547120] hv_storvsc f71e9b3b-aed9-4dd2-a1ff-c00218a39853: complete init 
request
[1.549952] hv_storvsc f71e9b3b-aed9-4dd2-a1ff-c00218a39853: complete init 
request
[1.552876] hv_storvsc f71e9b3b-aed9-4dd2-a1ff-c00218a39853: complete init 
request
[1.555931] hv_storvsc f71e9b3b-aed9-4dd2-a1ff-c00218a39853: complete init 
request
[1.559151] scsi host0: storvsc_host_t
[1.590197] hv_storvsc:  IO cmd 0x12 0x0 0x0 scsi status 0x0 srb status 0x1 
length 36
[1.590765] hv_storvsc e7d171c1-a71d-400e-854e-7e0cc000ce3d: complete init 
request
[1.590796] hv_storvsc e7d171c1-a71d-400e-854e-7e0cc000ce3d: complete init 
request
[1.590820] hv_storvsc e7d171c1-a71d-400e-854e-7e0cc000ce3d: complete init 
request
[1.590839] hv_storvsc e7d171c1-a71d-400e-854e-7e0cc000ce3d: complete init 
request
[1.590942] hv_storvsc e7d171c1-a71d-400e-854e-7e0cc000ce3d: complete init 
request
[1.591506] scsi host1: storvsc_host_t
[1.600925] hv_utils: Heartbeat IC version 3.0
[1.619450] input: Microsoft Vmbus HID-compliant Mouse as 
/devices/0006:045E:0621.0001/input/input1
[1.620658] hid 0006:045E:0621.0001: input:  HID v0.01 Mouse 
[Microsoft Vmbus HID-compliant Mouse] on 
[1.627185] hv_utils: Shutdown IC version 3.0
[1.627280] hv_utils: cannot register PTP clock: 0
[1.628081] random: fast init done
[1.628890] hv_utils: TimeSync IC version 4.0
[1.629407] hv_utils: VSS IC version 5.0
[1.651620] scsi 0:0:0:0: Direct-Access Msft Virtual Disk 1.0  
PQ: 0 ANSI: 5
[1.651956] hv_storvsc:  IO cmd 0x12 0x0 0x0 scsi status 0x0 srb status 0x1 
length 36
[1.651977] scsi 1:0:0:0: CD-ROMMsft Virtual DVD-ROM  1.0  
PQ: 0 ANSI: 0
[1.652279] hv_storvsc:  IO cmd 0x12 0x0 0x0 scsi status 0x0 srb status 0x20 
length 36
[1.652297] scsi host1: scsi scan: INQUIRY result too short (5), using 36
[1.652299] scsi 1:0:0:1: Direct-Access
PQ: 0 ANSI: 0
[1.652595] hv_storvsc:  IO cmd 0x12 0x0 0x0 scsi status 0x0 srb status 0x20 
length 36
[1.652611] scsi 1:0:0:2: Direct-Access
PQ: 0 ANSI: 0
[1.652892] hv_storvsc:  IO cmd 0x12 0x0 0x0 scsi status 0x0 srb status 0x20 
length 36
[1.652910] scsi 1:0:0:3: Direct-Access
PQ: 0 ANSI: 0
[1.653178] hv_storvsc:  IO cmd 0x12 0x0 0x0 scsi status 0x0 srb status 0x20 
length 36
[1.653195] scsi 1:0:0:4: Direct-Access
PQ: 0 ANSI: 0
[1.653445] hv_storvsc:  IO cmd 0x12 0x0 0x0 scsi status 0x0 srb status 0x20 
length 36
[1.653464] scsi 1:0:0:5: Direct-Access
PQ: 0 ANSI: 0
[1.653729] hv_storvsc:  IO cmd 0x12 0x0 0x0 scsi status 0x0 srb status 0x20 
length 36
[1.653746] scs

Re: SCSI regression in 4.11

2017-02-28 Thread Stephen Hemminger
On Tue, 28 Feb 2017 15:08:12 +0100
Christoph Hellwig <h...@lst.de> wrote:

> On Mon, Feb 27, 2017 at 05:19:31PM -0800, Stephen Hemminger wrote:
> > Fixes: ee5242360424 ("scsi: zero per-cmd driver data before each I/O")
> > 
> > but that is already in linux-next.
> > 
> > Noticed another place where memset(of the data was being done not the extra 
> > bits.
> > Tried this, but didn't fix it either...  
> 
> Are you using blk-mq or the legacy request code?

I was using legacy, but even with CONFIG_SCSI_MQ_DEFAULT=y
the same failure occurs.


Re: SCSI regression in 4.11

2017-02-27 Thread Stephen Hemminger
On Mon, 27 Feb 2017 15:30:30 -0800
Stephen Hemminger <step...@networkplumber.org> wrote:

> Something in SCSI in 4.11 broke booting on Hyper-V Generation 2 VM with 8 
> VCPU and 4G of memory.
> Both Linus's current tree (4.11 pre-rc1) and linux-next fail in a similar 
> manner. It looks like some error
> in SCSI device detection because there is only a single device.
> 
> The offending commit causing the regression is:
> 
> $ git bisect bad
> e9c787e65c0c36529745be47d490d998b4b6e589 is the first bad commit
> commit e9c787e65c0c36529745be47d490d998b4b6e589
> Author: Christoph Hellwig <h...@lst.de>
> Date:   Mon Jan 2 21:55:26 2017 +0300
> 
> scsi: allocate scsi_cmnd structures as part of struct request
> 
> Rely on the new block layer functionality to allocate additional driver
> specific data behind struct request instead of implementing it in SCSI
> itѕelf.
> 
> Signed-off-by: Christoph Hellwig <h...@lst.de>
> Acked-by: Martin K. Petersen <martin.peter...@oracle.com>
> Reviewed-by: Hannes Reinecke <h...@suse.com>
> Signed-off-by: Jens Axboe <ax...@fb.com>
> 
> :04 04 6ff016fcdae227efeb19c1c301b17ccd7ea35da6 
> 70d79f99d9b79ecf4dccbe067fc697219f5c78da Mdrivers
> :04 04 a672ff52df8b2c211b3f98cae4a88d8a96ccde0b 
> 1aaaed7de0994f597c7f8290c722a0b4a7789429 Minclude
> 
> I checked and tree is current and up to date and includes 
> commit ee5242360424b9b967454e9183767323d10cf985
> Author: Christoph Hellwig <h...@lst.de>
> Date:   Tue Feb 21 10:04:55 2017 +0100
> 
> scsi: zero per-cmd driver data before each I/O
> 
> Kernel config is attached. It started with Ubuntu config, but then did 
> localmodconfig and pruned
> out unnecessary stuff.
> 

This problem I am seeing looks like the one addressed by:

Fixes: ee5242360424 ("scsi: zero per-cmd driver data before each I/O")

but that is already in linux-next.

Noticed another place where memset(of the data was being done not the extra 
bits.
Tried this, but didn't fix it either...


diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index ba2286652ff6..7e0463e78ff4 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1850,7 +1850,7 @@ static int scsi_mq_prep_fn(struct request *req)
 
/* zero out the cmd, except for the embedded scsi_request */
memset((char *)cmd + sizeof(cmd->req), 0,
-   sizeof(*cmd) - sizeof(cmd->req));
+   sizeof(*cmd) - sizeof(cmd->req) + sdev->host->hostt->cmd_size);
 
req->special = cmd;
 


Re: [PATCHv1] net-next: treewide use is_vlan_dev() helper function.

2017-02-04 Thread Stephen Hemminger
On Sat,  4 Feb 2017 11:00:49 -0600
Parav Pandit <pa...@mellanox.com> wrote:

> This patch makes use of is_vlan_dev() function instead of flag
> comparison which is exactly done by is_vlan_dev() helper function.
> 
> Signed-off-by: Parav Pandit <pa...@mellanox.com>
> Reviewed-by: Daniel Jurgens <dani...@mellanox.com>
> ---
>  drivers/infiniband/core/cma.c|  6 ++
>  drivers/infiniband/sw/rxe/rxe_net.c  |  2 +-
>  drivers/net/ethernet/broadcom/cnic.c |  2 +-
>  drivers/net/ethernet/chelsio/cxgb3/l2t.c |  2 +-
>  drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c  |  4 ++--
>  drivers/net/ethernet/chelsio/cxgb4/l2t.c |  2 +-
>  drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c |  8 
>  drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c |  4 ++--
>  drivers/net/hyperv/netvsc_drv.c  |  2 +-
>  drivers/scsi/bnx2fc/bnx2fc_fcoe.c|  6 +++---
>  drivers/scsi/cxgbi/libcxgbi.c|  6 +++---
>  drivers/scsi/fcoe/fcoe.c | 13 ++---
>  include/rdma/ib_addr.h   |  6 ++
>  net/hsr/hsr_slave.c  |  3 ++-
>  14 files changed, 31 insertions(+), 35 deletions(-)

Looks good thanks.

Acked-by: Stephen Hemminger <step...@networkplumber.org>


[PATCH 3/4] scsi: make pci error handlers const

2012-09-07 Thread Stephen Hemminger
Signed-off-by: Stephen Hemminger shemmin...@vyatta.com


---
Depends on first patch that changes PCI core.

 drivers/scsi/ipr.c   |2 +-
 drivers/scsi/lpfc/lpfc_init.c|2 +-
 drivers/scsi/mpt2sas/mpt2sas_scsih.c |2 +-
 drivers/scsi/qla2xxx/qla_os.c|2 +-
 drivers/scsi/qla4xxx/ql4_os.c|2 +-
 drivers/scsi/sym53c8xx_2/sym_glue.c  |2 +-
 6 files changed, 6 insertions(+), 6 deletions(-)

--- a/drivers/scsi/ipr.c2012-09-07 09:21:37.154557681 -0700
+++ b/drivers/scsi/ipr.c2012-09-07 09:27:25.495059297 -0700
@@ -9228,7 +9228,7 @@ static struct pci_device_id ipr_pci_tabl
 };
 MODULE_DEVICE_TABLE(pci, ipr_pci_table);
 
-static struct pci_error_handlers ipr_err_handler = {
+static const struct pci_error_handlers ipr_err_handler = {
.error_detected = ipr_pci_error_detected,
.slot_reset = ipr_pci_slot_reset,
 };
--- a/drivers/scsi/lpfc/lpfc_init.c 2012-09-07 09:21:37.154557681 -0700
+++ b/drivers/scsi/lpfc/lpfc_init.c 2012-09-07 09:27:25.499059257 -0700
@@ -10425,7 +10425,7 @@ static struct pci_device_id lpfc_id_tabl
 
 MODULE_DEVICE_TABLE(pci, lpfc_id_table);
 
-static struct pci_error_handlers lpfc_err_handler = {
+static const struct pci_error_handlers lpfc_err_handler = {
.error_detected = lpfc_io_error_detected,
.slot_reset = lpfc_io_slot_reset,
.resume = lpfc_io_resume,
--- a/drivers/scsi/mpt2sas/mpt2sas_scsih.c  2012-09-07 09:21:37.154557681 
-0700
+++ b/drivers/scsi/mpt2sas/mpt2sas_scsih.c  2012-09-07 09:27:25.499059257 
-0700
@@ -8306,7 +8306,7 @@ _scsih_pci_mmio_enabled(struct pci_dev *
return PCI_ERS_RESULT_NEED_RESET;
 }
 
-static struct pci_error_handlers _scsih_err_handler = {
+static const struct pci_error_handlers _scsih_err_handler = {
.error_detected = _scsih_pci_error_detected,
.mmio_enabled = _scsih_pci_mmio_enabled,
.slot_reset =   _scsih_pci_slot_reset,
--- a/drivers/scsi/qla2xxx/qla_os.c 2012-09-07 09:21:37.154557681 -0700
+++ b/drivers/scsi/qla2xxx/qla_os.c 2012-09-07 09:27:25.499059257 -0700
@@ -4471,7 +4471,7 @@ qla2xxx_pci_resume(struct pci_dev *pdev)
ha-flags.eeh_busy = 0;
 }
 
-static struct pci_error_handlers qla2xxx_err_handler = {
+static const struct pci_error_handlers qla2xxx_err_handler = {
.error_detected = qla2xxx_pci_error_detected,
.mmio_enabled = qla2xxx_pci_mmio_enabled,
.slot_reset = qla2xxx_pci_slot_reset,
--- a/drivers/scsi/qla4xxx/ql4_os.c 2012-09-07 09:21:37.154557681 -0700
+++ b/drivers/scsi/qla4xxx/ql4_os.c 2012-09-07 09:27:25.503059217 -0700
@@ -6148,7 +6148,7 @@ qla4xxx_pci_resume(struct pci_dev *pdev)
clear_bit(AF_EEH_BUSY, ha-flags);
 }
 
-static struct pci_error_handlers qla4xxx_err_handler = {
+static const struct pci_error_handlers qla4xxx_err_handler = {
.error_detected = qla4xxx_pci_error_detected,
.mmio_enabled = qla4xxx_pci_mmio_enabled,
.slot_reset = qla4xxx_pci_slot_reset,
--- a/drivers/scsi/sym53c8xx_2/sym_glue.c   2012-09-07 09:21:37.154557681 
-0700
+++ b/drivers/scsi/sym53c8xx_2/sym_glue.c   2012-09-07 09:27:25.503059217 
-0700
@@ -2117,7 +2117,7 @@ static struct pci_device_id sym2_id_tabl
 
 MODULE_DEVICE_TABLE(pci, sym2_id_table);
 
-static struct pci_error_handlers sym2_err_handler = {
+static const struct pci_error_handlers sym2_err_handler = {
.error_detected = sym2_io_error_detected,
.mmio_enabled   = sym2_io_slot_dump,
.slot_reset = sym2_io_slot_reset,


--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [V2 PATCH 1/9] csiostor: Chelsio FCoE offload driver submission (sources part 1).

2012-09-05 Thread Stephen Hemminger
On Wed,  5 Sep 2012 18:03:54 +0530
Naresh Kumar Inna nar...@chelsio.com wrote:

 +
 +/* FCoE Adapter types  its description */
 +static struct csio_adap_desc csio_fcoe_adapters[] = {

Tables like this should be const.
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [V2 PATCH 2/9] csiostor: Chelsio FCoE offload driver submission (sources part 2).

2012-09-05 Thread Stephen Hemminger
On Wed,  5 Sep 2012 18:03:55 +0530
Naresh Kumar Inna nar...@chelsio.com wrote:

 This patch contains code for driver initialization, driver resource
 allocation and the Work Request module functionality. Driver initialization
 includes module entry/exit points, registration with PCI, FC transport and
 SCSI mid layer subsystems. The Work Request module provides services for
 allocation of DMA queues, posting Work Requests on them and processing
 completions.
 
 Signed-off-by: Naresh Kumar Inna nar...@chelsio.com

Although the comments say you are using proc fs, there is no
code here related to that.

Any use of debugfs must be conditional the DEBUG_FS kernel configuration
parameter.  Your code probably will break if DEBUG_FS is not
enabled.  For a possible alternative see how a sub-config parameter
was added in sky2 driver.
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [V2 PATCH 6/9] csiostor: Chelsio FCoE offload driver submission (headers part 1).

2012-09-05 Thread Stephen Hemminger
On Wed,  5 Sep 2012 18:03:59 +0530
Naresh Kumar Inna nar...@chelsio.com wrote:

 +#define CSIO_ROUNDUP(__v, __r)   (((__v) + (__r) - 1) / (__r))

This is similar to existing round_up() in kernel.h could you use that?
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [V2 PATCH 6/9] csiostor: Chelsio FCoE offload driver submission (headers part 1).

2012-09-05 Thread Stephen Hemminger
On Wed,  5 Sep 2012 18:03:59 +0530
Naresh Kumar Inna nar...@chelsio.com wrote:

 +
 +#define  CSIO_ASSERT(cond)   
 \
 +do { \
 + if (unlikely(!((cond\
 + BUG();  \
 +} while (0)
 +

Why is this not just BUG_ON(!(cond)) ?
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [V2 PATCH 2/9] csiostor: Chelsio FCoE offload driver submission (sources part 2).

2012-09-05 Thread Stephen Hemminger
On Wed, 5 Sep 2012 23:13:46 +0530
Naresh Kumar Inna nar...@chelsio.com wrote:

 On 9/5/2012 9:59 PM, Stephen Hemminger wrote:
  On Wed,  5 Sep 2012 18:03:55 +0530
  Naresh Kumar Inna nar...@chelsio.com wrote:
  
  This patch contains code for driver initialization, driver resource
  allocation and the Work Request module functionality. Driver initialization
  includes module entry/exit points, registration with PCI, FC transport and
  SCSI mid layer subsystems. The Work Request module provides services for
  allocation of DMA queues, posting Work Requests on them and processing
  completions.
 
  Signed-off-by: Naresh Kumar Inna nar...@chelsio.com
  
  Although the comments say you are using proc fs, there is no
  code here related to that.
 
 I will remove that comment.
 
  
  Any use of debugfs must be conditional the DEBUG_FS kernel configuration
  parameter.  Your code probably will break if DEBUG_FS is not
  enabled.  For a possible alternative see how a sub-config parameter
  was added in sky2 driver.
  
 
 It appears that debugfs_create_dir() returns an error if DEBUG_FS is not
 enabled. Considering the driver handles this error and continues
 initialization, do you still think I should guard this code within DEBUG_FS?
 
 Thanks.

That works, just make sure and test it.
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/5] fixing errors handling during pci_driver resume stage [net]

2007-01-09 Thread Stephen Hemminger
On Tue, 09 Jan 2007 12:01:14 +0300
Dmitriy Monakhov [EMAIL PROTECTED] wrote:

 network pci drivers have to return correct error code during resume stage in
 case of errors.
 Signed-off-by: Dmitriy Monakhov [EMAIL PROTECTED]
 -

Please don't introduce one dev_err() call into a device driver if all the other
error printout's use a different interface.

-- 
Stephen Hemminger [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Something wrong with latest adaptec aic driver

2005-03-09 Thread Stephen Hemminger
(Found out not to put aic7xxx in subject line because of spam filter...)

Running with latest 2.6.11 bk tree, one of my machines takes for ever to boot.
The driver is builtin (not a module).  It worked fine until today, and works
fine with 2.6.11 or earlier.  Looks like it is going into some probing fallback
unwind code mess... This is the current driver not the old driver.

The scsi controller is:

:03:01.0 SCSI storage controller: Adaptec AIC-7899P U160/m (rev 01)
Subsystem: IBM: Unknown device 00cf
Flags: bus master, 66Mhz, medium devsel, latency 32, IRQ 20
BIST result: 00
I/O ports at 9000 [disabled]
Memory at f2006000 (64-bit, non-prefetchable) [size=4K]
Capabilities: [dc] Power Management version 2

:03:01.1 SCSI storage controller: Adaptec AIC-7899P U160/m (rev 01)
Subsystem: IBM: Unknown device 00cf
Flags: bus master, 66Mhz, medium devsel, latency 32, IRQ 20
BIST result: 00
I/O ports at 9400 [disabled]
Memory at f2005000 (64-bit, non-prefetchable) [size=4K]
Capabilities: [dc] Power Management version 2

During boot up the console output is...

ACPI: PCI interrupt :03:01.1[B] - GSI 20 (level, low) - IRQ 20
scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.36
Adaptec aic7899 Ultra160 SCSI adapter
aic7899: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs

(scsi0:A:0): 160.000MB/s transfers (80.000MHz DT, offset 63, 16bit)
(scsi0:A:0): 80.000MB/s transfers (40.000MHz DT, offset 63, 16bit)
(scsi0:A:0): 80.000MB/s transfers (40.000MHz, offset 63, 16bit)
ahc_intr: HOST_MSG_LOOP bad phase 0x0
(scsi0:A:0): 40.000MB/s transfers (40.000MHz, offset 63)
(scsi0:A:0): 66.006MB/s transfers (33.003MHz DT, offset 63, 16bit)
(scsi0:A:0): 66.006MB/s transfers (33.003MHz, offset 63, 16bit)
(scsi0:A:0): 40.000MB/s transfers (20.000MHz DT, offset 63, 16bit)
(scsi0:A:0): 40.000MB/s transfers (20.000MHz, offset 63, 16bit)
(scsi0:A:0): 33.003MB/s transfers (33.003MHz, offset 63)
(scsi0:A:0): 20.000MB/s transfers (20.000MHz, offset 63)
(scsi0:A:0): 38.460MB/s transfers (19.230MHz DT, offset 63, 16bit)
(scsi0:A:0): 38.460MB/s transfers (19.230MHz, offset 63, 16bit)
(scsi0:A:0): 35.714MB/s transfers (17.857MHz DT, offset 63, 16bit)
(scsi0:A:0): 35.714MB/s transfers (17.857MHz, offset 63, 16bit)
(scsi0:A:0): 33.332MB/s transfers (16.666MHz DT, offset 63, 16bit)
(scsi0:A:0): 33.332MB/s transfers (16.666MHz, offset 63, 16bit)
(scsi0:A:0): 31.250MB/s transfers (15.625MHz DT, offset 63, 16bit)
(scsi0:A:0): 31.250MB/s transfers (15.625MHz, offset 63, 16bit)
(scsi0:A:0): 29.410MB/s transfers (14.705MHz DT, offset 63, 16bit)
(scsi0:A:0): 29.410MB/s transfers (14.705MHz, offset 63, 16bit)
(scsi0:A:0): 27.776MB/s transfers (13.888MHz DT, offset 63, 16bit)
(scsi0:A:0): 27.776MB/s transfers (13.888MHz, offset 63, 16bit)
(scsi0:A:0): 26.314MB/s transfers (13.157MHz DT, offset 63, 16bit)
(scsi0:A:0): 26.314MB/s transfers (13.157MHz, offset 63, 16bit)
(scsi0:A:0): 25.000MB/s transfers (12.500MHz DT, offset 63, 16bit)
(scsi0:A:0): 25.000MB/s transfers (12.500MHz, offset 63, 16bit)
(scsi0:A:0): 23.808MB/s transfers (11.904MHz DT, offset 63, 16bit)
(scsi0:A:0): 23.808MB/s transfers (11.904MHz, offset 63, 16bit)
(scsi0:A:0): 22.726MB/s transfers (11.363MHz DT, offset 63, 16bit)
(scsi0:A:0): 22.726MB/s transfers (11.363MHz, offset 63, 16bit)
(scsi0:A:0): 21.738MB/s transfers (10.869MHz DT, offset 63, 16bit)
(scsi0:A:0): 21.738MB/s transfers (10.869MHz, offset 63, 16bit)
(scsi0:A:0): 20.832MB/s transfers (10.416MHz DT, offset 63, 16bit)
(scsi0:A:0): 20.832MB/s transfers (10.416MHz, offset 63, 16bit)
(scsi0:A:0): 20.000MB/s transfers (10.000MHz DT, offset 63, 16bit)
(scsi0:A:0): 20.000MB/s transfers (10.000MHz, offset 63, 16bit)
(scsi0:A:0): 19.230MB/s transfers (9.615MHz, offset 63, 16bit)
(scsi0:A:0): 18.518MB/s transfers (9.259MHz, offset 63, 16bit)
(scsi0:A:0): 17.856MB/s transfers (8.928MHz, offset 63, 16bit)
(scsi0:A:0): 17.240MB/s transfers (8.620MHz, offset 63, 16bit)
(scsi0:A:0): 16.666MB/s transfers (8.333MHz, offset 63, 16bit)
(scsi0:A:0): 16.128MB/s transfers (8.064MHz, offset 63, 16bit)
(scsi0:A:0): 15.624MB/s transfers (7.812MHz, offset 63, 16bit)
(scsi0:A:0): 15.150MB/s transfers (7.575MHz, offset 63, 16bit)
(scsi0:A:0): 14.704MB/s transfers (7.352MHz, offset 63, 16bit)
(scsi0:A:0): 14.284MB/s transfers (7.142MHz, offset 63, 16bit)
(scsi0:A:0): 13.888MB/s transfers (6.944MHz, offset 63, 16bit)
(scsi0:A:0): 13.512MB/s transfers (6.756MHz, offset 63, 16bit)
(scsi0:A:0): 13.156MB/s transfers (6.578MHz, offset 63, 16bit)
(scsi0:A:0): 12.820MB/s transfers (6.410MHz, offset 63, 16bit)
(scsi0:A:0): 12.500MB/s transfers (6.250MHz, offset 63, 16bit)
(scsi0:A:0): 12.194MB/s transfers (6.097MHz, offset 63, 16bit)
(scsi0:A:0): 11.904MB/s transfers (5.952MHz, offset 63, 16bit)
(scsi0:A:0): 11.626MB/s transfers (5.813MHz, offset 63, 16bit)
(scsi0:A:0): 11.362MB/s transfers (5.681MHz, offset 63, 16bit)
(scsi0:A:0): 

Re: IP/ethernet/networking-over-SCSI.

2005-01-25 Thread Stephen Hemminger
On Tue, 25 Jan 2005 16:19:12 + (GMT)
S Iremonger [EMAIL PROTECTED] wrote:

 
 For some purposes at the moment, working IP-over-SCSI could be rather
   handy... ;-).

You probably are looking for:
http://linux-iscsi.sourceforge.net/

-- 
Stephen Hemminger   [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html