Re: 4.10+ qla2xxx driver wont load for qla2xxx (ISP2532-based 8Gb) with BAR 3 error, work fine on 4.9

2017-03-12 Thread Laurence Oberman


- Original Message -
> From: "Laurence Oberman" 
> To: "Chad Dupuis" , "Himanshu Madhani" 
> 
> Cc: "Linux SCSI List" 
> Sent: Sunday, March 12, 2017 7:39:23 AM
> Subject: 4.10+ qla2xxx  driver wont load for qla2xxx (ISP2532-based 8Gb) with 
> BAR 3 error, work fine on 4.9
> 
> Chad, Himanshu
> 
> Before I bisect or go chase changes, wanted to reach out because the driver
> seems to be the same version.
> Perhaps this is a PCIE change in the kernel for 4.10 affecting the load.
> Its the same targetLIO server I have been using for a long time with 4.9
> 
> 27:00.0 Fibre Channel: QLogic Corp. ISP2532-based 8Gb Fibre Channel to PCI
> Express HBA (rev 02)
> 
> With 4.9 I have no issues loading the driver for my targetLIO server.
> (DL380G8)
> 
> # modinfo qla2xxx | more
> filename:
> /lib/modules/4.9.0.lobetcm+/kernel/drivers/scsi/qla2xxx/qla2xxx.ko
> firmware:   ql2500_fw.bin
> version:8.07.00.38-k
> license:GPL
> description:QLogic Fibre Channel HBA Driver
> author: QLogic Corporation
> srcversion: 94A8431A85BFF854B97B02D
> 
> [8.906351] qla2xxx [:00:00.0]-0005: : QLogic Fibre Channel HBA
> Driver: 8.07.00.38-k.
> [   10.014052] qla2xxx [:27:00.0]-001d: : Found an ISP2532 irq 106 iobase
> 0xadce989a1000.
> [   10.455108] scsi host4: qla2xxx
> [   10.460206] qla2xxx [:27:00.0]-00fb:4: QLogic QLE2562 - PCI-Express
> Dual Channel 8Gb Fibre Channel HBA.
> [   10.460215] qla2xxx [:27:00.0]-00fc:4: ISP2532: PCIe (5.0GT/s x8) @
> :27:00.0 hdma+ host#=4 fw=8.03.00 (90d5).
> [   10.460545] qla2xxx [:27:00.1]-001d: : Found an ISP2532 irq 110 iobase
> 0xadce989a9000.
> [   10.662120] scsi host5: qla2xxx
> [   11.007841] qla2xxx [:27:00.1]-00fb:5: QLogic QLE2562 - PCI-Express
> Dual Channel 8Gb Fibre Channel HBA.
> [   11.007849] qla2xxx [:27:00.1]-00fc:5: ISP2532: PCIe (5.0GT/s x8) @
> :27:00.1 hdma+ host#=5 fw=8.03.00 (90d5).
> 
> Rebooting on the same server with 4.10 fails to load
> 
> Linux  4.10.0+
> # modinfo qla2xxx | more
> filename:   /lib/modules/4.10.0+/kernel/drivers/scsi/qla2xxx/qla2xxx.ko
> firmware:   ql2500_fw.bin
> version:8.07.00.38-k
> license:GPL
> description:QLogic Fibre Channel HBA Driver
> author: QLogic Corporation
> srcversion: 939E0595E8A3C2E1BE94392
> 
> [8.754040] qla2xxx [:00:00.0]-0005: : QLogic Fibre Channel HBA
> Driver: 8.07.00.38-k.
> [9.979523] qla2xxx [:27:00.0]-001b: : BAR 3 not enabled.
> [   10.201268] qla2xxx [:27:00.0]-001d: : Found an ISP2532 irq 110 iobase
> 0xacbf189b1000.
> [   10.407865] scsi host5: qla2xxx
> [   10.444281] qla2xxx: probe of :27:00.0 failed with error -22
> [   10.444519] qla2xxx [:27:00.1]-001b: : BAR 3 not enabled.
> [   10.444522] qla2xxx [:27:00.1]-001d: : Found an ISP2532 irq 110 iobase
> 0xacbf189b9000.
> [   10.645932] scsi host5: qla2xxx
> [   10.682233] qla2xxx: probe of :27:00.1 failed with error -22
> 
> Thanks
> Laurence
> 

I started bisecting this; I cannot believe others have not bumped into this on 
4.10.
This is a generic QLE2562 and the firmware is loaded by the driver, so I am 
wondering why I am seeing this and others are not.
There is nothing special about the PCIe bus on this DL380G8.

Anyway, during the bisect I reached a point in the 4.10 commits where I still 
saw the "BAR 3" message
but the probe worked.

[7.208237] qla2xxx [:00:00.0]-0005: : QLogic Fibre Channel HBA Driver: 
8.07.00.38-k.
[7.208492] qla2xxx [:27:00.0]-001b: : BAR 3 not enabled.

   Note the BAR 3 message above, but the probe did not fail.

[7.208494] qla2xxx [:27:00.0]-001d: : Found an ISP2532 irq 97 iobase 
0xc02f98989000.
[7.414738] scsi host4: qla2xxx

[7.419267] qla2xxx [:27:00.0]-00fb:4: QLogic QLE2562 - PCI-Express Dual 
Channel 8Gb Fibre Channel HBA.
[7.419278] qla2xxx [:27:00.0]-00fc:4: ISP2532: PCIe (5.0GT/s x8) @ 
:27:00.0 hdma+ host#=4 fw=8.03.00 (90d5).
[7.419698] qla2xxx [:27:00.1]-001b: : BAR 3 not enabled.
[7.419701] qla2xxx [:27:00.1]-001d: : Found an ISP2532 irq 100 iobase 
0xc02f989b1000.
[7.625691] scsi host6: qla2xxx
[7.629218] qla2xxx [:27:00.1]-00fb:6: QLogic QLE2562 - PCI-Express Dual 
Channel 8Gb Fibre Channel HBA.
[7.629222] qla2xxx [:27:00.1]-00fc:6: ISP2532: PCIe (5.0GT/s x8) @ 
:27:00.1 hdma+ host#=6 

I still marked that commit as bad and am continuing; I have 9 builds to go.
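
A minimal sketch for checking how the kernel describes BAR 3 on either kernel, assuming the "BAR 3 not enabled" message relates to BAR 3 not being reported as a memory resource (that link is an assumption, not confirmed in this thread). It parses one line of the sysfs `resource` file for a PCI function, where each line holds start, end and flags in hex. The function names `bar_line_is_mem()` and `pci_bar_is_mem()` are hypothetical; the flag values are the ones defined in include/linux/ioport.h.

```c
#include <assert.h>
#include <stdio.h>

/* Resource type bits as defined in include/linux/ioport.h. */
#define IORESOURCE_IO  0x00000100ULL
#define IORESOURCE_MEM 0x00000200ULL

/*
 * Parse one "start end flags" line from a sysfs PCI resource file.
 * Returns 1 if the line describes a non-empty memory region,
 * 0 if it does not, and -1 if the line cannot be parsed.
 */
int bar_line_is_mem(const char *line)
{
	unsigned long long start, end, flags;

	assert(line != NULL);
	if (sscanf(line, "%llx %llx %llx", &start, &end, &flags) != 3)
		return -1;
	if (!(flags & IORESOURCE_MEM))
		return 0;
	return end > start;
}

/*
 * Read line number `bar` (0-based) from a resource file such as
 * /sys/bus/pci/devices/<address>/resource and classify it.
 * Returns -1 if the file or the requested line is not available.
 */
int pci_bar_is_mem(const char *resource_path, int bar)
{
	char line[128];
	FILE *f;
	int i, ret = -1;

	assert(resource_path != NULL);
	f = fopen(resource_path, "r");
	if (!f)
		return -1;
	for (i = 0; i <= bar; i++) {
		if (!fgets(line, sizeof(line), f))
			break;
		if (i == bar)
			ret = bar_line_is_mem(line);
	}
	fclose(f);
	return ret;
}
```

Calling `pci_bar_is_mem()` with the resource file of the 27:00.0 function and `bar` set to 3 on both 4.9 and 4.10 would show whether the two kernels describe that BAR differently.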

Thanks
Laurence


Re: [PATCH v2] scsi_sysfs: fix hang when removing scsi device

2017-03-12 Thread Israel Rukshin

Hi Bart,

scsi_device_get() affects I/O because scsi_target_unblock() uses it and calls 
blk_start_queue().
terminate_rport_io() is called after scsi_target_unblock() and completes all 
the commands,
including the SYNCHRONIZE CACHE command.

I applied your patch and you can see that QUEUE_FLAG_STOPPED is on.

[  342.485087] sd 7:0:0:0: Device offlined - not ready after error recovery
[  342.505738] scsi host10: ib_srp: Path record query failed
[  342.512023] sd 10:0:0:0: Device offlined - not ready after error recovery
[  342.589265] sd 7:0:0:0: __scsi_remove_device: device_busy = 0 device_blocked 
= 0
[  342.624110] sd 7:0:0:0: [sdc] Synchronizing SCSI cache
[  342.630263] sd 7:0:0:0: [sdc] Synchronize Cache(10) failed: Result: 
hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK
[  342.649504] scsi 7:0:0:0: alua: Detached
[  342.769099] ------------[ cut here ]------------
[  342.769107] WARNING: CPU: 10 PID: 317 at drivers/scsi/scsi_sysfs.c:1293 
__scsi_remove_device+0x131/0x140
[  342.769108] Modules linked in: nfsv3 ib_srp(-) dm_service_time 
scsi_transport_srp ib_uverbs ib_umad ib_ipoib ib_cm mlx4_ib ib_core 
rpcsec_gss_krb5 nfsv4 dns_resolver nfs netconsole fscache dm_mirror 
dm_region_hash dm_log sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp 
coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul 
ghash_clmulni_intel pcbc aesni_intel crypto_simd joydev input_leds glue_helper 
ipmi_si iTCO_wdt pcspkr cryptd mei_me iTCO_vendor_support ipmi_devintf sg 
lpc_ich ipmi_msghandler mei i2c_i801 shpchp mfd_core ioatdma nfsd auth_rpcgss 
dm_multipath nfs_acl dm_mod lockd grace sunrpc ip_tables ext4 jbd2 mbcache 
sd_mod mgag200 drm_kms_helper syscopyarea isci sysfillrect sysimgblt ahci igb 
libsas fb_sys_fops libahci ttm scsi_transport_sas ptp pps_core crc32c_intel dca 
drm
[  342.769152]  i2c_algo_bit libata mlx4_core fjes [last unloaded: ib_srp]
[  342.769157] CPU: 10 PID: 317 Comm: kworker/10:1 Not tainted 4.11.0-rc1+ #97
[  342.769157] Hardware name: Supermicro X9DRFR/X9DRFR, BIOS 1.0a 09/11/2012
[  342.769163] Workqueue: srp_remove srp_remove_work [ib_srp]
[  342.769165] Call Trace:
[  342.769173]  dump_stack+0x63/0x90
[  342.769176]  __warn+0xcb/0xf0
[  342.769178]  warn_slowpath_null+0x1d/0x20
[  342.769180]  __scsi_remove_device+0x131/0x140
[  342.769182]  scsi_forget_host+0x60/0x70
[  342.769186]  scsi_remove_host+0x77/0x110
[  342.769189]  srp_remove_work+0x90/0x230 [ib_srp]
[  342.769192]  process_one_work+0x177/0x430
[  342.769193]  worker_thread+0x4e/0x4b0
[  342.769195]  kthread+0x101/0x140
[  342.769197]  ? process_one_work+0x430/0x430
[  342.769198]  ? kthread_create_on_node+0x60/0x60
[  342.769201]  ret_from_fork+0x2c/0x40
[  342.769202] ---[ end trace 1eef46ba7887fee3 ]---
[  342.769210] sd 10:0:0:0: __scsi_remove_device: device_busy = 0 
device_blocked = 0
[  343.020039] sd 10:0:0:0: [sde] Synchronizing SCSI cache
[  352.717659] scsi host10: ib_srp: Got failed path rec status -110

Israel.


On 3/9/2017 9:36 PM, Bart Van Assche wrote:

On Thu, 2017-03-09 at 18:37 +0200, Israel Rukshin wrote:

The bug reproduces when unloading the srp module with one port down.
sd_shutdown() hangs when __scsi_remove_device() gets a scsi_device in
state SDEV_OFFLINE or SDEV_TRANSPORT_OFFLINE.
It hangs because sd_shutdown() tries to send a sync cache command
while the device is offline but has SDEV_CANCEL status.
The status was changed to SDEV_CANCEL by __scsi_remove_device()
before it calls device_del().

The block layer timeout mechanism doesn't cause the SYNCHRONIZE CACHE
command to fail after the timeout expires because the request timer
was never started.
blk_peek_request(), which is called from scsi_request_fn(), didn't return
this request and therefore the request timer didn't start.

With this commit, new commands are not accepted if the original state was
offline.

The bug was revealed by commit cff549 ("scsi: proper state checking
and module refcount handling in scsi_device_get").
After that commit scsi_device_get() returns an error if the device state
is SDEV_CANCEL.
As a result, the SRP fast I/O failure timeout handler does not clean up
the sync cache command because scsi_target_unblock() skips the canceled device.
If this timeout is set to infinity, the hang persists forever
even before commit cff549.
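
The sequence described above can be illustrated with a small userspace toy model. This is only a sketch of the reasoning, not kernel code: every `toy_` name is hypothetical, the states are simplified stand-ins for the SDEV_* states, and the rule that a command on a stopped queue is never reached by the cleanup step is an assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the SDEV_* device states. */
enum toy_state { TOY_RUNNING, TOY_BLOCK, TOY_OFFLINE, TOY_CANCEL };

struct toy_sdev {
	enum toy_state state;
	bool queue_stopped;	/* models QUEUE_FLAG_STOPPED */
	int pending;		/* commands waiting on the queue */
};

/* Models scsi_device_get() after commit cff549: a canceled device is refused. */
int toy_device_get(const struct toy_sdev *sdev)
{
	return sdev->state == TOY_CANCEL ? -1 : 0;
}

/*
 * Models scsi_target_unblock(): restart the queue of every device that
 * can be obtained. Returns the number of queues restarted.
 */
size_t toy_target_unblock(struct toy_sdev *devs, size_t n)
{
	size_t i, restarted = 0;

	assert(devs != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (toy_device_get(&devs[i]) != 0)
			continue;	/* canceled device is skipped */
		devs[i].queue_stopped = false;
		restarted++;
	}
	return restarted;
}

/*
 * Models the cleanup step: only commands on a restarted queue are
 * completed (assumption). Returns the number of commands completed.
 */
int toy_terminate_io(struct toy_sdev *sdev)
{
	int done;

	if (sdev->queue_stopped)
		return 0;
	done = sdev->pending;
	sdev->pending = 0;
	return done;
}
```

In this model a device in the cancel state keeps its queue stopped after the unblock pass, so its pending command is never completed, which mirrors the hang on the sync cache command.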

How could blk_peek_request() not return a request that has not yet been
started? How could a patch that changes scsi_device_get() affect I/O since
scsi_device_get() is not called from the I/O path? Anyway, could you try to
reproduce the hang with the patch below applied and see whether the output
produced by this patch helps to determine what is going on?

Thanks,

Bart.

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index ba2286652ff6..855548ff4c4d 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -3018,8 +3018,10 @@ scsi_internal_device_unblock(struct scsi_device *sdev,
else
sdev->sdev_state = SDEV_CREATED;

[scsi] scsi: ufs: don't check unsigned type for a negative value

2017-03-12 Thread Tomas Winkler
Fix compilation warning

drivers/scsi/ufs/ufshcd.c:7645:13: warning: comparison of unsigned
expression < 0 is always false [-Wtype-limits]
if ((value < UFS_PM_LVL_0) || (value >= UFS_PM_LVL_MAX))

Signed-off-by: Tomas Winkler 
---
 drivers/scsi/ufs/ufshcd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 1359913bf840..e8c26e6e6237 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -7642,7 +7642,7 @@ static inline ssize_t ufshcd_pm_lvl_store(struct device *dev,
 	if (kstrtoul(buf, 0, &value))
 		return -EINVAL;
 
-	if ((value < UFS_PM_LVL_0) || (value >= UFS_PM_LVL_MAX))
+	if (value >= UFS_PM_LVL_MAX)
 		return -EINVAL;
 
 	spin_lock_irqsave(hba->host->host_lock, flags);
-- 
2.9.3
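
For illustration, a minimal userspace sketch of why the upper-bound check alone is enough when the value is unsigned. The enum and `toy_pm_lvl_parse()` are hypothetical stand-ins, and `strtoul()` is used here in place of the kernel's `kstrtoul()`, so parsing details differ from the driver.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the driver's power-management level enum. */
enum toy_pm_level { TOY_PM_LVL_0, TOY_PM_LVL_1, TOY_PM_LVL_2, TOY_PM_LVL_MAX };

/*
 * Parse a level from text. Returns 0 and stores the level in *out,
 * or -EINVAL if the text is not a number or is out of range.
 */
int toy_pm_lvl_parse(const char *buf, unsigned long *out)
{
	char *end;
	unsigned long value;

	assert(buf != NULL && out != NULL);
	errno = 0;
	value = strtoul(buf, &end, 0);
	if (errno != 0 || end == buf)
		return -EINVAL;

	/*
	 * value is unsigned, so "value < TOY_PM_LVL_0" can never be true.
	 * With strtoul(), a negative input wraps to a very large value,
	 * which the upper-bound check already rejects.
	 */
	if (value >= TOY_PM_LVL_MAX)
		return -EINVAL;

	*out = value;
	return 0;
}
```

With this helper, "2" is accepted, while "3", "-1" and "abc" are all rejected with -EINVAL, without any comparison against zero.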