Re: 4.10+ qla2xxx driver wont load for qla2xxx (ISP2532-based 8Gb) with BAR 3 error, work fine on 4.9
----- Original Message -----
> From: "Laurence Oberman"
> To: "Chad Dupuis", "Himanshu Madhani"
> Cc: "Linux SCSI List"
> Sent: Sunday, March 12, 2017 7:39:23 AM
> Subject: 4.10+ qla2xxx driver wont load for qla2xxx (ISP2532-based 8Gb) with
> BAR 3 error, work fine on 4.9
>
> Chad, Himanshu
>
> Before I bisect or go chase changes, wanted to reach out because the driver
> seems to be the same version.
> Perhaps this is a PCIE change in the kernel for 4.10 affecting the load.
> Its the same targetLIO server I have been using for a long time with 4.9
>
> 27:00.0 Fibre Channel: QLogic Corp. ISP2532-based 8Gb Fibre Channel to PCI
> Express HBA (rev 02)
>
> With 4.9 I have no issues loading the driver for my targetLIO server.
> (DL380G8)
>
> # modinfo qla2xxx | more
> filename:
> /lib/modules/4.9.0.lobetcm+/kernel/drivers/scsi/qla2xxx/qla2xxx.ko
> firmware: ql2500_fw.bin
> version:8.07.00.38-k
> license:GPL
> description:QLogic Fibre Channel HBA Driver
> author: QLogic Corporation
> srcversion: 94A8431A85BFF854B97B02D
>
> [8.906351] qla2xxx [:00:00.0]-0005: : QLogic Fibre Channel HBA
> Driver: 8.07.00.38-k.
> [ 10.014052] qla2xxx [:27:00.0]-001d: : Found an ISP2532 irq 106 iobase
> 0xadce989a1000.
> [ 10.455108] scsi host4: qla2xxx
> [ 10.460206] qla2xxx [:27:00.0]-00fb:4: QLogic QLE2562 - PCI-Express
> Dual Channel 8Gb Fibre Channel HBA.
> [ 10.460215] qla2xxx [:27:00.0]-00fc:4: ISP2532: PCIe (5.0GT/s x8) @
> :27:00.0 hdma+ host#=4 fw=8.03.00 (90d5).
> [ 10.460545] qla2xxx [:27:00.1]-001d: : Found an ISP2532 irq 110 iobase
> 0xadce989a9000.
> [ 10.662120] scsi host5: qla2xxx
> [ 11.007841] qla2xxx [:27:00.1]-00fb:5: QLogic QLE2562 - PCI-Express
> Dual Channel 8Gb Fibre Channel HBA.
> [ 11.007849] qla2xxx [:27:00.1]-00fc:5: ISP2532: PCIe (5.0GT/s x8) @
> :27:00.1 hdma+ host#=5 fw=8.03.00 (90d5).
>
> Rebooting on the same server with 4.10 fails to load
>
> Linux 4.10.0+
> # modinfo qla2xxx | more
> filename: /lib/modules/4.10.0+/kernel/drivers/scsi/qla2xxx/qla2xxx.ko
> firmware: ql2500_fw.bin
> version:8.07.00.38-k
> license:GPL
> description:QLogic Fibre Channel HBA Driver
> author: QLogic Corporation
> srcversion: 939E0595E8A3C2E1BE94392
>
> [8.754040] qla2xxx [:00:00.0]-0005: : QLogic Fibre Channel HBA
> Driver: 8.07.00.38-k.
> [9.979523] qla2xxx [:27:00.0]-001b: : BAR 3 not enabled.
> [ 10.201268] qla2xxx [:27:00.0]-001d: : Found an ISP2532 irq 110 iobase
> 0xacbf189b1000.
> [ 10.407865] scsi host5: qla2xxx
> [ 10.444281] qla2xxx: probe of :27:00.0 failed with error -22
> [ 10.444519] qla2xxx [:27:00.1]-001b: : BAR 3 not enabled.
> [ 10.444522] qla2xxx [:27:00.1]-001d: : Found an ISP2532 irq 110 iobase
> 0xacbf189b9000.
> [ 10.645932] scsi host5: qla2xxx
> [ 10.682233] qla2xxx: probe of :27:00.1 failed with error -22
>
> Thanks
> Laurence
>

I started bisecting this. I cannot believe others have not bumped into this on
4.10. This is a generic QLE2562 and the firmware is loaded by the driver, so I
am wondering why I am seeing this and others are not.
There is nothing special about the PCIe bus on this DL380G8.

Anyway, during the bisect I got to a point in the 4.10 commits where I still
saw the "BAR 3" message but the probe worked.

[7.208237] qla2xxx [:00:00.0]-0005: : QLogic Fibre Channel HBA Driver: 8.07.00.38-k.
[7.208492] qla2xxx [:27:00.0]-001b: : BAR 3 not enabled.
(same message as in the failing boot above, but here the probe did not fail)
[7.208494] qla2xxx [:27:00.0]-001d: : Found an ISP2532 irq 97 iobase 0xc02f98989000.
[7.414738] scsi host4: qla2xxx
[7.419267] qla2xxx [:27:00.0]-00fb:4: QLogic QLE2562 - PCI-Express Dual Channel 8Gb Fibre Channel HBA.
[7.419278] qla2xxx [:27:00.0]-00fc:4: ISP2532: PCIe (5.0GT/s x8) @ :27:00.0 hdma+ host#=4 fw=8.03.00 (90d5).
[7.419698] qla2xxx [:27:00.1]-001b: : BAR 3 not enabled.
[7.419701] qla2xxx [:27:00.1]-001d: : Found an ISP2532 irq 100 iobase 0xc02f989b1000.
[7.625691] scsi host6: qla2xxx
[7.629218] qla2xxx [:27:00.1]-00fb:6: QLogic QLE2562 - PCI-Express Dual Channel 8Gb Fibre Channel HBA.
[7.629222] qla2xxx [:27:00.1]-00fc:6: ISP2532: PCIe (5.0GT/s x8) @ :27:00.1 hdma+ host#=6

I still marked that commit as bad and am continuing; I have 9 builds to go.

Thanks
Laurence
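A note for readers decoding the failing log: the "-22" in "probe of :27:00.0 failed with error -22" is a negated errno value, and on Linux errno 22 is EINVAL. The following is a minimal userspace sketch of that decoding, not driver code; describe_kernel_error() is a hypothetical helper name introduced only for this illustration.

```c
#include <errno.h>
#include <string.h>

/*
 * Kernel functions report failure as a negated errno value, so a
 * return of -22 corresponds to -EINVAL on Linux.
 * describe_kernel_error() is a hypothetical helper for this note.
 */
static const char *describe_kernel_error(int ret)
{
	if (ret >= 0)
		return "success";

	/* strerror() expects the positive errno value. */
	return strerror(-ret);
}
```

Calling describe_kernel_error(-22) returns the C library's text for EINVAL. This only names the error class; it does not say which check inside the probe path rejected the device.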
Re: [PATCH v2] scsi_sysfs: fix hang when removing scsi device
Hi Bart,

scsi_device_get() affects I/O because scsi_target_unblock() uses it and calls
blk_start_queue(). terminate_rport_io() is called after scsi_target_unblock()
and completes all the commands, including the SYNCHRONIZE CACHE command.

I applied your patch, and you can see that QUEUE_FLAG_STOPPED is on.

[ 342.485087] sd 7:0:0:0: Device offlined - not ready after error recovery
[ 342.505738] scsi host10: ib_srp: Path record query failed
[ 342.512023] sd 10:0:0:0: Device offlined - not ready after error recovery
[ 342.589265] sd 7:0:0:0: __scsi_remove_device: device_busy = 0 device_blocked = 0
[ 342.624110] sd 7:0:0:0: [sdc] Synchronizing SCSI cache
[ 342.630263] sd 7:0:0:0: [sdc] Synchronize Cache(10) failed: Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK
[ 342.649504] scsi 7:0:0:0: alua: Detached
[ 342.769099] ------------[ cut here ]------------
[ 342.769107] WARNING: CPU: 10 PID: 317 at drivers/scsi/scsi_sysfs.c:1293 __scsi_remove_device+0x131/0x140
[ 342.769108] Modules linked in: nfsv3 ib_srp(-) dm_service_time scsi_transport_srp ib_uverbs ib_umad ib_ipoib ib_cm mlx4_ib ib_core rpcsec_gss_krb5 nfsv4 dns_resolver nfs netconsole fscache dm_mirror dm_region_hash dm_log sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel crypto_simd joydev input_leds glue_helper ipmi_si iTCO_wdt pcspkr cryptd mei_me iTCO_vendor_support ipmi_devintf sg lpc_ich ipmi_msghandler mei i2c_i801 shpchp mfd_core ioatdma nfsd auth_rpcgss dm_multipath nfs_acl dm_mod lockd grace sunrpc ip_tables ext4 jbd2 mbcache sd_mod mgag200 drm_kms_helper syscopyarea isci sysfillrect sysimgblt ahci igb libsas fb_sys_fops libahci ttm scsi_transport_sas ptp pps_core crc32c_intel dca drm
[ 342.769152] i2c_algo_bit libata mlx4_core fjes [last unloaded: ib_srp]
[ 342.769157] CPU: 10 PID: 317 Comm: kworker/10:1 Not tainted 4.11.0-rc1+ #97
[ 342.769157] Hardware name: Supermicro X9DRFR/X9DRFR, BIOS 1.0a 09/11/2012
[ 342.769163] Workqueue: srp_remove srp_remove_work [ib_srp]
[ 342.769165] Call Trace:
[ 342.769173] dump_stack+0x63/0x90
[ 342.769176] __warn+0xcb/0xf0
[ 342.769178] warn_slowpath_null+0x1d/0x20
[ 342.769180] __scsi_remove_device+0x131/0x140
[ 342.769182] scsi_forget_host+0x60/0x70
[ 342.769186] scsi_remove_host+0x77/0x110
[ 342.769189] srp_remove_work+0x90/0x230 [ib_srp]
[ 342.769192] process_one_work+0x177/0x430
[ 342.769193] worker_thread+0x4e/0x4b0
[ 342.769195] kthread+0x101/0x140
[ 342.769197] ? process_one_work+0x430/0x430
[ 342.769198] ? kthread_create_on_node+0x60/0x60
[ 342.769201] ret_from_fork+0x2c/0x40
[ 342.769202] ---[ end trace 1eef46ba7887fee3 ]---
[ 342.769210] sd 10:0:0:0: __scsi_remove_device: device_busy = 0 device_blocked = 0
[ 343.020039] sd 10:0:0:0: [sde] Synchronizing SCSI cache
[ 352.717659] scsi host10: ib_srp: Got failed path rec status -110

Israel.

On 3/9/2017 9:36 PM, Bart Van Assche wrote:
> On Thu, 2017-03-09 at 18:37 +0200, Israel Rukshin wrote:
>> The bug reproduce when unloading srp module with one port down.
>> sd_shutdown() hangs when __scsi_remove_device() get scsi_device with
>> state SDEV_OFFLINE or SDEV_TRANSPORT_OFFLINE.
>> It hangs because sd_shutdown() is trying to send sync cache command
>> when the device is offline but with SDEV_CANCEL status.
>> The status was changed to SDEV_CANCEL by __scsi_remove_device()
>> before it calls to device_del().
>>
>> The block layer timeout mechanism doesn't cause the SYNCHRONIZE CACHE
>> command to fail after the timeout expired because the request timer
>> wasn't started.
>> blk_peek_request() that is called from scsi_request_fn() didn't return
>> this request and therefore the request timer didn't start.
>>
>> This commit doesn't accept new commands if the original state was offline.
>>
>> The bug was revealed after commit cff549 ("scsi: proper state checking
>> and module refcount handling in scsi_device_get").
>> After this commit scsi_device_get() returns error if the device state
>> is SDEV_CANCEL.
>> This eventually leads SRP fast I/O failure timeout handler not to clean
>> the sync cache command because scsi_target_unblock() skip the canceled
>> device.
>> If this timeout handler is set to infinity then the hang remains forever
>> also before commit cff549.
>
> How could blk_peek_request() not return a request that has not yet been
> started? How could a patch that changes scsi_device_get() affect I/O since
> scsi_device_get() is not called from the I/O path?
>
> Anyway, could you try to reproduce the hang with the patch below applied and
> see whether the output produced by this patch helps to determine what is going
> on?
>
> Thanks,
>
> Bart.
>
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index ba2286652ff6..855548ff4c4d 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -3018,8 +3018,10 @@ scsi_internal_device_unblock(struct scsi_device *sdev,
>  	else
>  		sdev->sdev_state = SDEV_CREATED;
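The interaction discussed in this message (an unblock pass that iterates devices through scsi_device_get(), which refuses devices in SDEV_CANCEL, so the cancelled device's queue is never restarted) can be illustrated with a small userspace model. This is only a sketch under stated assumptions: every toy_* name is hypothetical, the structures are not the kernel's, and the -ENXIO return value is an assumption about what the real function returns.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's device states. */
enum toy_sdev_state {
	TOY_SDEV_RUNNING,
	TOY_SDEV_TRANSPORT_OFFLINE,
	TOY_SDEV_CANCEL,
};

struct toy_sdev {
	enum toy_sdev_state state;
	bool queue_stopped;	/* models QUEUE_FLAG_STOPPED */
};

/*
 * Models the state check described in the thread: taking a reference
 * fails once the device is in the CANCEL state. The error code is an
 * assumption for this sketch.
 */
static int toy_device_get(const struct toy_sdev *sdev)
{
	if (sdev->state == TOY_SDEV_CANCEL)
		return -ENXIO;
	return 0;
}

/*
 * Models an unblock pass that visits each device through
 * toy_device_get(). Devices that cannot be referenced are skipped,
 * so their queues stay stopped. Returns how many queues restarted.
 */
static size_t toy_target_unblock(struct toy_sdev *devs, size_t n)
{
	size_t restarted = 0;

	for (size_t i = 0; i < n; i++) {
		if (toy_device_get(&devs[i]) != 0)
			continue;	/* skipped: queue remains stopped */
		devs[i].queue_stopped = false;
		restarted++;
	}
	return restarted;
}
```

In this model, a request queued on the cancelled device is never dispatched because nothing restarts its queue, which matches the report above that QUEUE_FLAG_STOPPED is still set when the SYNCHRONIZE CACHE command is issued.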
[scsi] scsi: ufs: don't check unsigned type for a negative value
Fix compilation warning:

drivers/scsi/ufs/ufshcd.c:7645:13: warning: comparison of unsigned expression < 0 is always false [-Wtype-limits]
  if ((value < UFS_PM_LVL_0) || (value >= UFS_PM_LVL_MAX))

Signed-off-by: Tomas Winkler
---
 drivers/scsi/ufs/ufshcd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 1359913bf840..e8c26e6e6237 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -7642,7 +7642,7 @@ static inline ssize_t ufshcd_pm_lvl_store(struct device *dev,
 	if (kstrtoul(buf, 0, ))
 		return -EINVAL;
 
-	if ((value < UFS_PM_LVL_0) || (value >= UFS_PM_LVL_MAX))
+	if (value >= UFS_PM_LVL_MAX)
 		return -EINVAL;
 
 	spin_lock_irqsave(hba->host->host_lock, flags);
-- 
2.9.3
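To see why dropping the lower-bound test loses nothing, here is a small standalone sketch. The toy_* enum and function are hypothetical stand-ins, with the assumption that the lowest level is 0 as the compiler warning implies; the real definitions live in the UFS driver headers.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the driver's power-management levels. */
enum toy_pm_level {
	TOY_PM_LVL_0 = 0,
	TOY_PM_LVL_1,
	TOY_PM_LVL_2,
	TOY_PM_LVL_MAX,
};

/*
 * value is unsigned, so "value < TOY_PM_LVL_0" (that is, value < 0)
 * can never be true. The upper-bound test alone rejects every
 * out-of-range input, including a wrapped negative number, which
 * shows up as a very large unsigned value.
 */
static bool toy_pm_lvl_valid(unsigned long value)
{
	return value < TOY_PM_LVL_MAX;
}
```

For example, toy_pm_lvl_valid((unsigned long)-1) is false because the converted value is far above TOY_PM_LVL_MAX, so no separate negative check is needed.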