[Bug 188061] On quad port QLE2564 can't add in target only 2 ports
https://bugzilla.kernel.org/show_bug.cgi?id=188061 --- Comment #5 from Anthony (anthony.blood...@gmail.com) --- I've installed another 4-port card and got same issue - port names have variable part at highest bits. On 2-port card port names have variable part at lower bits: port_name = "0x2124ff3bcb3e" port_name = "0x2124ff3bcb3f" I'm trying to change a port name in NVRAM with FLASUTIL.EXE with no luck. -- You are receiving this mail because: You are watching the assignee of the bug.
[PATCH] qla2xxx: Fix ql_dump_buffer
Recent printk changes for KERN_CONT cause this logging to be defectively emitted on multiple lines. Fix it. Also reduces object size a trivial amount. $ size drivers/scsi/qla2xxx/qla_dbg.o* textdata bss dec hex filename 39125 0 0 3912598d5 drivers/scsi/qla2xxx/qla_dbg.o.new 39164 0 0 3916498fc drivers/scsi/qla2xxx/qla_dbg.o.old Signed-off-by: Joe Perches--- drivers/scsi/qla2xxx/qla_dbg.c | 12 1 file changed, 4 insertions(+), 8 deletions(-) diff --git a/drivers/scsi/qla2xxx/qla_dbg.c b/drivers/scsi/qla2xxx/qla_dbg.c index 21d9fb7fc887..51b4179469d1 100644 --- a/drivers/scsi/qla2xxx/qla_dbg.c +++ b/drivers/scsi/qla2xxx/qla_dbg.c @@ -2707,13 +2707,9 @@ ql_dump_buffer(uint32_t level, scsi_qla_host_t *vha, int32_t id, "%-+5d 0 1 2 3 4 5 6 7 8 9 A B C D E F\n", size); ql_dbg(level, vha, id, "- ---\n"); - for (cnt = 0; cnt < size; cnt++, buf++) { - if (cnt % 16 == 0) - ql_dbg(level, vha, id, "%04x:", cnt & ~0xFU); - printk(" %02x", *buf); - if (cnt % 16 == 15) - printk("\n"); + for (cnt = 0; cnt < size; cnt += 16) { + ql_dbg(level, vha, id, "%04x: ", cnt); + print_hex_dump(KERN_CONT, "", DUMP_PREFIX_NONE, 16, 1, + buf + cnt, min(16U, size - cnt), false); } - if (cnt % 16 != 0) - printk("\n"); } -- 2.10.0.rc2.1.g053435c
[PATCH] aacraid: Fix typo in blink status
The return status of the adapter check on KERNEL_PANIC is supposed to be the upper 16 bits of the OMR status register. Fixes: c421530bf848604e (scsi: aacraid: Reorder Adpater status check) Reported-by: Dan CarpenterSigned-off-by: Raghava Aditya Renukunta --- drivers/scsi/aacraid/src.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/scsi/aacraid/src.c b/drivers/scsi/aacraid/src.c index 2e5338d..7b0410e 100644 --- a/drivers/scsi/aacraid/src.c +++ b/drivers/scsi/aacraid/src.c @@ -468,7 +468,7 @@ static int aac_src_check_health(struct aac_dev *dev) return -1; err_blink: - return (status > 16) & 0xFF; + return (status >> 16) & 0xFF; } static inline u32 aac_get_vector(struct aac_dev *dev) -- 1.8.3.1
Re: SCSI regression in 4.11
On March 2, 2017 10:23:24 AM PST, Stephen Hemmingerwrote: >On Thu, 2 Mar 2017 14:25:14 +0100 >Hannes Reinecke wrote: > >> On 03/02/2017 02:40 AM, Stephen Hemminger wrote: >> > On Thu, 2 Mar 2017 01:56:15 +0100 >> > Christoph Hellwig wrote: >> > >> >> On Thu, Mar 02, 2017 at 01:01:35AM +0100, Christoph Hellwig wrote: > >> >>> On Wed, Mar 01, 2017 at 07:54:12AM -0800, Stephen Hemminger >wrote: >> > > > http://git.infradead.org/users/hch/block.git/commitdiff/148cff67b401e2229c076c0ea418712654be77e4 > >> >> It appears that is already in the code I am testing in >linux-next... >> >>> >> >>> It's in -next now, but it wasn't at the time you reported the >bug. >> >>> >> >>> And it would sortof explain the bug if the INQUIRY data is >correct >> >>> in the scatterlist, but we ignore it, given that scsi_probe_lun >> >>> ignores the result based on sense data. >> >>> >> >>> Can you check what happens with the horrible hack below: >> >> >> >> Strike that - we're checking result later, so this can't be the >case. >> >> >> >> Now the other interesting thing is the memset in __scsi_exectute, >> >> which looks very suspicious. Try the following please: >> >> >> >> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c >> >> index 3e32dc954c3c..22f4fb550561 100644 >> >> --- a/drivers/scsi/scsi_lib.c >> >> +++ b/drivers/scsi/scsi_lib.c >> >> @@ -253,7 +253,8 @@ static int __scsi_execute(struct scsi_device >*sdev, const unsigned char *cmd, >> >>* and prevent security leaks by zeroing out the excess data. >> >>*/ >> >> if (unlikely(rq->resid_len > 0 && rq->resid_len <= bufflen)) >> >> - memset(buffer + (bufflen - rq->resid_len), 0, rq->resid_len); >> >> +// memset(buffer + (bufflen - rq->resid_len), 0, >> >> rq->resid_len); >> >> + printk_ratelimited("%s: got resid %d\n", __func__, >rq->resid_len); >> >> >> >> if (resid) >> >> *resid = rq->resid_len; >> > >> > >> > Still fails but does print resid on some of the later INQUIRY >commands (not the initial one). >> > >> Can you test what happens if you blank out the storvsc_drv >workaround: >> >> diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c >> index 585e54f..c36f42d 100644 >> --- a/drivers/scsi/storvsc_drv.c >> +++ b/drivers/scsi/storvsc_drv.c >> @@ -1060,13 +1060,13 @@ static void storvsc_on_io_completion(struct >> storvsc_device *stor_device, >> * We do this so we can distinguish truly fatal failues >> * (srb status == 0x4) and off-line the device in that case. >> */ >> - >> +#if 0 >> if ((stor_pkt->vm_srb.cdb[0] == INQUIRY) || >> (stor_pkt->vm_srb.cdb[0] == MODE_SENSE)) { >> vstor_packet->vm_srb.scsi_status = 0; >> vstor_packet->vm_srb.srb_status = >SRB_STATUS_SUCCESS; >> } >> - >> +#endif >> >> /* Copy over the status...etc */ >> stor_pkt->vm_srb.scsi_status = >vstor_packet->vm_srb.scsi_status; >> >> It might thappen that we're fail to interpret the 'Device not >present' >> status correctly (which will happen for non-connected DVDs) causing >the >> SCSI stack to make incorrect decisions later on. >> >> Cheers, >> >> Hannes > >There are several oddities about the host SCSI interface that I see: > 1. The host bus seems to report up to 6 devices even though only 2 are > present (Disk and CDROM). >2. The CDROM emulation doesn't report the same status as a real device. > 3. The host emulation of SCSI doesn't support all the page codes which > is why there is the hack. > >But as James said, these don't appear to be related to the failure >because >the code worked before and only in post 4.11 merege is there a problem. Your wait for the hang trace is the most suggestive. It says we're waiting for a partition read to the spurious device. Previously this would have failed or timed out, so this seems to be the root cause. James -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
RE: [bug report] scsi: aacraid: Reorder Adapter status check
> -Original Message- > From: linux-scsi-ow...@vger.kernel.org [mailto:linux-scsi- > ow...@vger.kernel.org] On Behalf Of Dan Carpenter > Sent: Monday, February 27, 2017 8:32 AM > To: Raghava Aditya Renukunta > Cc: linux-scsi@vger.kernel.org > Subject: [bug report] scsi: aacraid: Reorder Adapter status check > > > The patch c421530bf848: "scsi: aacraid: Reorder Adapter status check" > from Feb 16, 2017, leads to the following static checker warning: > > drivers/scsi/aacraid/src.c:471 aac_src_check_health() > warn: was shift intended here '(status > 16)' > > drivers/scsi/aacraid/src.c >464 */ >465 return 0; >466 >467 err_out: >468 return -1; >469 >470 err_blink: >471 return (status > 16) & 0xFF; > ^^^ Issue #1: This is obviously a > typo. Agreed, will submit a correction ... should be >> > >472 } > > Issue #2: The caller checks for if the return is greater than 2. It >never is. We can remove this dead code. That would be the blink led returned from the controller which would be greater than 2 > > Issue #3: The caller passes "bled" to aac_send_iop_reset() which >ignores it. What's up with that? Either it's a bug or we >should delete that dead code. When we have gotten all caught up, we will be printing the blink led as part of the iop_reset Thanks, -Dave > > regards, > dan carpenter
[GIT PULL] target updates for v4.11-rc1
Hello Linus, Here are the outstanding target pending changes for v4.11-rc1. Please go ahead and pull from: git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending.git for-next The highlights this round include: - Enable dual mode (initiator + target) qla2xxx operation. (Quinn + Himanshu) - Add a framework for qla2xxx async fabric discovery. (Quinn + Himanshu) - Enable iscsi PDU DDP completion offload in cxgbit/T6 NICs. (Varun) - Fix target-core handling of aborted failed commands. (Bart) - Fix a long standing target-core issue NULL pointer dereference with active I/O LUN shutdown. (Rob Millner + Bryant + nab) Things are shaping up to be a busy cycle for v4.12 with a new fabric driver (efct) in flight, and a number of other patches on the list being discussed. For post v4.11-rc1 fixes, there is a qla2xxx series that missed -rc1 that will be sent your way soon, along with a issue reported by Bart related to ib_srpt + LUN_RESET that we are still going back and forth on how the bug-fix should work. Beyond that, the number of folks with dedicated Q/A resources for upstream target code is steadily increasing, along with the number of production customers using the mainstream in-tree fabric drivers. Thank you, --nab Bart Van Assche (19): qla2xxx: Avoid using variable-length arrays target/cxgbit: Fix endianness annotations target/tcm_fc: Remove a set-but-not-used variable target/iscsi: Fix indentation in iscsi_target_start_negotiation() target/iscsi: Fix spelling of "perform" target/iscsi: Fix spelling of "reallegiance" target/iscsi: Introduce a helper function for TMF translation target/iscsi: Fix iSCSI task reassignment handling target: Remove se_tmr_req.tmr_lun target: Make core_tmr_abort_task() consider all commands target: Correct transport_wait_for_tasks() documentation target: Stop execution if CMD_T_STOP has been set target: Remove an overly chatty debug message target: Inline transport_cmd_check_stop() target: Move session check from target_put_sess_cmd() into target_release_cmd_kref() target: Remove command flag CMD_T_BUSY target: Remove command flag CMD_T_DEV_ACTIVE target: Fix handling of aborted failed commands target: Delete tmr from list before processing Dmitry V. Levin (1): uapi: fix linux/target_core_user.h userspace compilation errors Himanshu Madhani (2): qla2xxx: Remove SRR code qla2xxx: Remove unused reverse_ini_mode Joe Carnuccio (1): qla2xxx: Simplify usage of SRB structure in driver Mike Christie (1): target: export protocol identifier Nicholas Bellinger (3): target: Fix NULL dereference during LUN lookup + active I/O shutdown iscsi-target: Fix early login failure statistics misses target: Add counters for ABORT_TASK success + failure Quinn Tran (10): qla2xxx: Remove direct access of scsi_status field in se_cmd qla2xxx: Cleanup TMF code translation from qla_target qla2xxx: Make trace flags more readable qla2xxx: Fix wrong argument in sp done callback qla2xxx: Use d_id instead of s_id for more clarity qla2xxx: Track I-T nexus as single fc_port struct qla2xxx: Add framework for async fabric discovery qla2xxx: Add Dual mode support in the driver qla2xxx: Improve RSCN handling in driver qla2xxx: Fix a warning reported by the "smatch" static checker Varun Prakash (7): target/cxgbit: Use T6 specific macro to set the force bit target/iscsi: split iscsit_check_dataout_hdr() target/cxgbit: use cxgb4_tp_smt_idx() to get smt idx target/cxgbit: Use T6 specific macros to get ETH/IP hdr len target/cxgbit: Enable DDP for T6 only if data sequence and pdu are in order target/cxgbit: add T6 iSCSI DDP completion feature target/iscsi: Fix unsolicited data seq_end_offset calculation drivers/net/ethernet/chelsio/cxgb4/t4_msg.h|4 + drivers/net/ethernet/chelsio/libcxgb/libcxgb_ppm.h |2 +- drivers/scsi/qla2xxx/qla_attr.c|3 + drivers/scsi/qla2xxx/qla_bsg.c | 23 +- drivers/scsi/qla2xxx/qla_def.h | 306 ++- drivers/scsi/qla2xxx/qla_dfs.c | 11 +- drivers/scsi/qla2xxx/qla_fw.h | 106 +- drivers/scsi/qla2xxx/qla_gbl.h | 72 +- drivers/scsi/qla2xxx/qla_gs.c | 726 +- drivers/scsi/qla2xxx/qla_init.c| 1612 + drivers/scsi/qla2xxx/qla_inline.h | 18 +- drivers/scsi/qla2xxx/qla_iocb.c| 167 +- drivers/scsi/qla2xxx/qla_isr.c | 317 ++- drivers/scsi/qla2xxx/qla_mbx.c | 232 +- drivers/scsi/qla2xxx/qla_mr.c | 48 +- drivers/scsi/qla2xxx/qla_os.c | 330 ++- drivers/scsi/qla2xxx/qla_target.c | 2392 +--- drivers/scsi/qla2xxx/qla_target.h | 252 +-- drivers/scsi/qla2xxx/tcm_qla2xxx.c | 256 ++-
[bug report] scsi: qedf: Add QLogic FastLinQ offload FCoE driver framework.
Hello Dupuis, Chad, The patch 61d8658b4a43: "scsi: qedf: Add QLogic FastLinQ offload FCoE driver framework." from Feb 15, 2017, leads to the following static checker warning: drivers/scsi/qedf/qedf_main.c:1168 qedf_rport_event_handler() warn: 'rval' can be either negative or positive drivers/scsi/qedf/qedf_main.c 1005 static int qedf_offload_connection(struct qedf_ctx *qedf, 1006 struct qedf_rport *fcport) 1007 { 1008 struct qed_fcoe_params_offload conn_info; 1009 u32 port_id; 1010 u8 lport_src_id[3]; 1011 int rval; 1012 uint16_t total_sqe = (fcport->sq_mem_size / sizeof(struct fcoe_wqe)); 1013 1014 QEDF_INFO(&(qedf->dbg_ctx), QEDF_LOG_CONN, "Offloading connection " 1015 "portid=%06x.\n", fcport->rdata->ids.port_id); 1016 rval = qed_ops->acquire_conn(qedf->cdev, >handle, 1017 >fw_cid, >p_doorbell); 1018 if (rval) { 1019 QEDF_WARN(&(qedf->dbg_ctx), "Could not acquire connection " 1020 "for portid=%06x.\n", fcport->rdata->ids.port_id); 1021 rval = 1; /* For some reason qed returns 0 on failure here */ What kind of an error is 1 supposed to indicate? This needs to be not a magic number and commented. 1022 goto out; 1023 } 1024 1025 QEDF_INFO(&(qedf->dbg_ctx), QEDF_LOG_CONN, "portid=%06x " 1026 "fw_cid=%08x handle=%d.\n", fcport->rdata->ids.port_id, 1027 fcport->fw_cid, fcport->handle); 1028 1029 memset(_info, 0, sizeof(struct qed_fcoe_params_offload)); 1030 regards, dan carpenter
Re: [PATCHv3 1/6] scsi_error: count medium access timeout only once per EH run
Hej Hannes, On Wed, Mar 01, 2017 at 10:15:15AM +0100, Hannes Reinecke wrote: > The current medium access timeout counter will be increased for > each command, so if there are enough failed commands we'll hit > the medium access timeout for even a single failure. > Fix this by making the timeout per EH run, ie the counter will > only be increased once per device and EH run. > > Cc: Ewan Milne> Cc: Lawrence Obermann > Cc: Benjamin Block > Cc: Steffen Maier > Signed-off-by: Hannes Reinecke Steffen already suggested it, It would be nice to have a stable-tag here. This already caused multiple real-world false-positive outages, I think that qualifies for stable :) > --- > drivers/scsi/scsi_error.c | 16 +++- > drivers/scsi/sd.c | 21 + > drivers/scsi/sd.h | 1 + > include/scsi/scsi_driver.h | 2 +- > 4 files changed, 34 insertions(+), 6 deletions(-) > > diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c > index f2cafae..cec439c 100644 > --- a/drivers/scsi/scsi_error.c > +++ b/drivers/scsi/scsi_error.c > @@ -58,6 +58,7 @@ > static int scsi_eh_try_stu(struct scsi_cmnd *scmd); > static int scsi_try_to_abort_cmd(struct scsi_host_template *, >struct scsi_cmnd *); > +static int scsi_eh_reset(struct scsi_cmnd *scmd); > > /* called with shost->host_lock held */ > void scsi_eh_wakeup(struct Scsi_Host *shost) > @@ -249,6 +250,7 @@ int scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag) > if (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED) > eh_flag &= ~SCSI_EH_CANCEL_CMD; > scmd->eh_eflags |= eh_flag; > + scsi_eh_reset(scmd); > list_add_tail(>eh_entry, >eh_cmd_q); > shost->host_failed++; > scsi_eh_wakeup(shost); > @@ -1107,7 +1109,19 @@ static int scsi_eh_action(struct scsi_cmnd *scmd, int > rtn) > if (!blk_rq_is_passthrough(scmd->request)) { > struct scsi_driver *sdrv = scsi_cmd_to_driver(scmd); > if (sdrv->eh_action) > - rtn = sdrv->eh_action(scmd, rtn); > + rtn = sdrv->eh_action(scmd, rtn, false); > + } > + return rtn; > +} > + > +static int scsi_eh_reset(struct scsi_cmnd *scmd) > +{ > + int rtn = SUCCESS; > + > + if (!blk_rq_is_passthrough(scmd->request)) { > + struct scsi_driver *sdrv = scsi_cmd_to_driver(scmd); > + if (sdrv->eh_action) > + rtn = sdrv->eh_action(scmd, rtn, true); > } > return rtn; > } > diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c > index c7839f6..c794686 100644 > --- a/drivers/scsi/sd.c > +++ b/drivers/scsi/sd.c > @@ -115,7 +115,7 @@ > static int sd_init_command(struct scsi_cmnd *SCpnt); > static void sd_uninit_command(struct scsi_cmnd *SCpnt); > static int sd_done(struct scsi_cmnd *); > -static int sd_eh_action(struct scsi_cmnd *, int); > +static int sd_eh_action(struct scsi_cmnd *, int, bool); > static void sd_read_capacity(struct scsi_disk *sdkp, unsigned char *buffer); > static void scsi_disk_release(struct device *cdev); > static void sd_print_sense_hdr(struct scsi_disk *, struct scsi_sense_hdr *); > @@ -1689,18 +1689,28 @@ static int sd_pr_clear(struct block_device *bdev, u64 > key) > * sd_eh_action - error handling callback > * @scmd: sd-issued command that has failed > * @eh_disp: The recovery disposition suggested by the midlayer > + * @reset: Reset medium access counter > * > * This function is called by the SCSI midlayer upon completion of an > * error test command (currently TEST UNIT READY). The result of sending > * the eh command is passed in eh_disp. We're looking for devices that > * fail medium access commands but are OK with non access commands like > * test unit ready (so wrongly see the device as having a successful > - * recovery) > + * recovery). > + * We have to be careful to count a medium access failure only once > + * per SCSI EH run; there might be several timed out commands which > + * will cause the 'max_medium_access_timeouts' counter to trigger > + * after the first SCSI EH run already and set the device to offline. > **/ > -static int sd_eh_action(struct scsi_cmnd *scmd, int eh_disp) > +static int sd_eh_action(struct scsi_cmnd *scmd, int eh_disp, bool reset) > { > struct scsi_disk *sdkp = scsi_disk(scmd->request->rq_disk); > > + if (reset) { > + /* New SCSI EH run, reset gate variable */ > + sdkp->medium_access_reset = 0; > + return eh_disp; > + } > if (!scsi_device_online(scmd->device) || > !scsi_medium_access_command(scmd) || > host_byte(scmd->result) != DID_TIME_OUT || > @@ -1714,7 +1724,10 @@ static int sd_eh_action(struct scsi_cmnd *scmd, int > eh_disp) >* process of recovering or has it
Re: [ANNOUNCE]: Broadcom (Emulex) FC Target driver - efct
Hi James & Co, Adding target-devel and Sebastian CC' On Mon, 2017-02-27 at 15:28 -0800, James Smart wrote: > I'd like to announce the availability of the Broadcom (Emulex) FC Target > driver - efct. > This driver has been part of the Emulex OneCore Storage SDK tool kit for > Emulex > SLI-4 adapters. The SLI-4 adapters support 16Gb/s and higher adapters. > Although this > kit has supported FCoE in the past, it is currently limited to FC support. > > This driver provides the following: > - Target mode operation: >- Functional with LIO-based interfaces Glad to see the upstream push for this after all these years. :) >- Extensive use of hw offloads such as auto-xfer_rdy, auto-rsp, cmd > cpu spreading >- High login mode - thousands of logins >- T-10 DIF/PI support (inline and separate) >- NPIV support > - Concurrent Initiator support if needed > - Discovery engine has F_Port and fabric services emulation. > - Extended mgmt interfaces: >- firmware dump api, including dump to host memory for faster dumps >- Healthcheck operations and watchdogs > - Extended driver behaviors such as: >- polled mode operation >- multi-queue: cpu, roundrobin, or priority (but not tied to scsi-mq) >- long chained sgl's >- extensive internal logging and statistics >- Tuning parameters on modes and resource allocation to different > features > > Broadcom is looking to upstream this driver and would like review and > feedback. > The driver may be found at the following git repository: > g...@gitlab.com:jsmart/efct-Emulex_FC_Target.git > Can we get the patch series posted to linux-scsi and target-devel to start giving some initial feedback..? > > Some of the key questions we have are with lpfc : > 1) Coexistence vs integration > Currently, the efct driver maps to a different set of PCI ids than lpfc. > It's very clear there's an overlap with lpfc, both on SLI-4 hw as well > as initiator support. > Although target mode support can be simplistically added to lpfc, what > we've found is > that doing so means a lot of tradeoffs. Some of the target mode > features, when enabled, > impact the initiator support and how it would operate. > I don't really have much preference either way. > 2) SLI-3 support > lpfc provides SLI-3 support so that all FC adapters are supported, > including the older ones. > The form of the driver, based on its history, is SLI-3 with SLI-3 > adapted to SLI-4 at the point > it hits the hardware. efct does not support SLI-3. AFAIK I think Sebastian was using SLI-3, so he might have some comments here. Since he's been the main person using the original tcm_lpfc code from way back when, maybe it would be a good idea to send him a couple of SLI-4 HBAs to help with the upstreaming of efct..? > > 3) complexity of configuration knobs caused by the kitchen-sink of > features in lpfc ? > we are pushing the limit on needing per-instance attributes rather than > global module > parameters. This is exactly what /sys/kernel/config/target/efct/$WWPN/tpgt_1/attribute/ is intended for.
Re: SCSI regression in 4.11
On March 2, 2017 11:05:05 AM PST, Stephen Hemmingerwrote: >On Thu, 02 Mar 2017 10:36:17 -0800 >James Bottomley wrote: > >> On March 2, 2017 10:23:24 AM PST, Stephen Hemminger > wrote: >> >On Thu, 2 Mar 2017 14:25:14 +0100 >> >Hannes Reinecke wrote: >> > >> >> On 03/02/2017 02:40 AM, Stephen Hemminger wrote: >> >> > On Thu, 2 Mar 2017 01:56:15 +0100 >> >> > Christoph Hellwig wrote: >> >> > >> >> >> On Thu, Mar 02, 2017 at 01:01:35AM +0100, Christoph Hellwig >wrote: >> > >> >> >>> On Wed, Mar 01, 2017 at 07:54:12AM -0800, Stephen Hemminger >> >wrote: >> >> > >> >> >> http://git.infradead.org/users/hch/block.git/commitdiff/148cff67b401e2229c076c0ea418712654be77e4 >> > >> >> >> >> It appears that is already in the code I am testing in >> >linux-next... >> >> >>> >> >> >>> It's in -next now, but it wasn't at the time you reported the > >> >bug. >> >> >>> >> >> >>> And it would sortof explain the bug if the INQUIRY data is >> >correct >> >> >>> in the scatterlist, but we ignore it, given that >scsi_probe_lun >> >> >>> ignores the result based on sense data. >> >> >>> >> >> >>> Can you check what happens with the horrible hack below: >> >> >> >> >> >> Strike that - we're checking result later, so this can't be the > >> >case. >> >> >> >> >> >> Now the other interesting thing is the memset in >__scsi_exectute, >> >> >> which looks very suspicious. Try the following please: >> >> >> >> >> >> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c >> >> >> index 3e32dc954c3c..22f4fb550561 100644 >> >> >> --- a/drivers/scsi/scsi_lib.c >> >> >> +++ b/drivers/scsi/scsi_lib.c >> >> >> @@ -253,7 +253,8 @@ static int __scsi_execute(struct >scsi_device >> >*sdev, const unsigned char *cmd, >> >> >> * and prevent security leaks by zeroing out the excess data. >> >> >> */ >> >> >>if (unlikely(rq->resid_len > 0 && rq->resid_len <= bufflen)) >> >> >> - memset(buffer + (bufflen - rq->resid_len), 0, >rq->resid_len); >> >> >> +//memset(buffer + (bufflen - rq->resid_len), 0, >rq->resid_len); >> >> >> + printk_ratelimited("%s: got resid %d\n", __func__, >> >rq->resid_len); >> >> >> >> >> >>if (resid) >> >> >>*resid = rq->resid_len; >> >> > >> >> > >> >> > Still fails but does print resid on some of the later INQUIRY >> >commands (not the initial one). >> >> > >> >> Can you test what happens if you blank out the storvsc_drv >> >workaround: >> >> >> >> diff --git a/drivers/scsi/storvsc_drv.c >b/drivers/scsi/storvsc_drv.c >> >> index 585e54f..c36f42d 100644 >> >> --- a/drivers/scsi/storvsc_drv.c >> >> +++ b/drivers/scsi/storvsc_drv.c >> >> @@ -1060,13 +1060,13 @@ static void >storvsc_on_io_completion(struct >> >> storvsc_device *stor_device, >> >> * We do this so we can distinguish truly fatal failues >> >> * (srb status == 0x4) and off-line the device in that >case. >> >> */ >> >> - >> >> +#if 0 >> >> if ((stor_pkt->vm_srb.cdb[0] == INQUIRY) || >> >> (stor_pkt->vm_srb.cdb[0] == MODE_SENSE)) { >> >> vstor_packet->vm_srb.scsi_status = 0; >> >> vstor_packet->vm_srb.srb_status = >> >SRB_STATUS_SUCCESS; >> >> } >> >> - >> >> +#endif >> >> >> >> /* Copy over the status...etc */ >> >> stor_pkt->vm_srb.scsi_status = >> >vstor_packet->vm_srb.scsi_status; >> >> >> >> It might thappen that we're fail to interpret the 'Device not >> >present' >> >> status correctly (which will happen for non-connected DVDs) >causing >> >the >> >> SCSI stack to make incorrect decisions later on. >> >> >> >> Cheers, >> >> >> >> Hannes >> > >> >There are several oddities about the host SCSI interface that I see: >> > 1. The host bus seems to report up to 6 devices even though only 2 >are >> > present (Disk and CDROM). >> >2. The CDROM emulation doesn't report the same status as a real >device. >> > 3. The host emulation of SCSI doesn't support all the page codes >which >> > is why there is the hack. >> > >> >But as James said, these don't appear to be related to the failure >> >because >> >the code worked before and only in post 4.11 merege is there a >problem. >> >> Your wait for the hang trace is the most suggestive. It says we're >waiting for a partition read to the spurious device. Previously this >would have failed or timed out, so this seems to be the root cause. >> >> James >> >> > >Where is the number of valid LUN's determined during the scan process? Depends. If you can do a report lun scan then that's definitive. You seem to be probing (SCSI_probe_and_add_lun) and you make us think there's something there by responding wrongly to the initial inquiry. James -- Sent from my Android device
Re: [regression] vmw_pvscsi probe fails on 4.11
Le 02/03/2017 à 17:05, Christoph Hellwig a écrit : Please try this patch: diff --git a/drivers/scsi/vmw_pvscsi.c b/drivers/scsi/vmw_pvscsi.c index ef474a748744..c374e3b5c678 100644 --- a/drivers/scsi/vmw_pvscsi.c +++ b/drivers/scsi/vmw_pvscsi.c @@ -1487,7 +1487,7 @@ static int pvscsi_probe(struct pci_dev *pdev, const struct pci_device_id *id) irq_flag &= ~PCI_IRQ_MSI; error = pci_alloc_irq_vectors(adapter->dev, 1, 1, irq_flag); - if (error) + if (error < 0) goto out_reset_adapter; adapter->use_req_threshold = pvscsi_setup_req_threshold(adapter, true); It works, thanks. Btw, vmw_vmci has the same issue. diff --git a/drivers/misc/vmw_vmci/vmci_guest.c b/drivers/misc/vmw_vmci/vmci_guest.c index 9d659542a335..dad5abee656e 100644 --- a/drivers/misc/vmw_vmci/vmci_guest.c +++ b/drivers/misc/vmw_vmci/vmci_guest.c @@ -566,10 +566,10 @@ static int vmci_guest_probe_device(struct pci_dev *pdev, */ error = pci_alloc_irq_vectors(pdev, VMCI_MAX_INTRS, VMCI_MAX_INTRS, PCI_IRQ_MSIX); - if (error) { + if (error < 0) { error = pci_alloc_irq_vectors(pdev, 1, 1, PCI_IRQ_MSIX | PCI_IRQ_MSI | PCI_IRQ_LEGACY); - if (error) + if (error < 0) goto err_remove_bitmap; } else { vmci_dev->exclusive_vectors = true;
Re: SCSI regression in 4.11
On Thu, 02 Mar 2017 10:36:17 -0800 James Bottomleywrote: > On March 2, 2017 10:23:24 AM PST, Stephen Hemminger > wrote: > >On Thu, 2 Mar 2017 14:25:14 +0100 > >Hannes Reinecke wrote: > > > >> On 03/02/2017 02:40 AM, Stephen Hemminger wrote: > >> > On Thu, 2 Mar 2017 01:56:15 +0100 > >> > Christoph Hellwig wrote: > >> > > >> >> On Thu, Mar 02, 2017 at 01:01:35AM +0100, Christoph Hellwig wrote: > > > >> >>> On Wed, Mar 01, 2017 at 07:54:12AM -0800, Stephen Hemminger > >wrote: > >> > > > > > http://git.infradead.org/users/hch/block.git/commitdiff/148cff67b401e2229c076c0ea418712654be77e4 > > > >> > >> It appears that is already in the code I am testing in > >linux-next... > >> >>> > >> >>> It's in -next now, but it wasn't at the time you reported the > >bug. > >> >>> > >> >>> And it would sortof explain the bug if the INQUIRY data is > >correct > >> >>> in the scatterlist, but we ignore it, given that scsi_probe_lun > >> >>> ignores the result based on sense data. > >> >>> > >> >>> Can you check what happens with the horrible hack below: > >> >> > >> >> Strike that - we're checking result later, so this can't be the > >case. > >> >> > >> >> Now the other interesting thing is the memset in __scsi_exectute, > >> >> which looks very suspicious. Try the following please: > >> >> > >> >> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c > >> >> index 3e32dc954c3c..22f4fb550561 100644 > >> >> --- a/drivers/scsi/scsi_lib.c > >> >> +++ b/drivers/scsi/scsi_lib.c > >> >> @@ -253,7 +253,8 @@ static int __scsi_execute(struct scsi_device > >*sdev, const unsigned char *cmd, > >> >> * and prevent security leaks by zeroing out the excess data. > >> >> */ > >> >> if (unlikely(rq->resid_len > 0 && rq->resid_len <= bufflen)) > >> >> - memset(buffer + (bufflen - rq->resid_len), 0, > >> >> rq->resid_len); > >> >> +// memset(buffer + (bufflen - rq->resid_len), 0, > >> >> rq->resid_len); > >> >> + printk_ratelimited("%s: got resid %d\n", __func__, > >rq->resid_len); > >> >> > >> >> if (resid) > >> >> *resid = rq->resid_len; > >> > > >> > > >> > Still fails but does print resid on some of the later INQUIRY > >commands (not the initial one). > >> > > >> Can you test what happens if you blank out the storvsc_drv > >workaround: > >> > >> diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c > >> index 585e54f..c36f42d 100644 > >> --- a/drivers/scsi/storvsc_drv.c > >> +++ b/drivers/scsi/storvsc_drv.c > >> @@ -1060,13 +1060,13 @@ static void storvsc_on_io_completion(struct > >> storvsc_device *stor_device, > >> * We do this so we can distinguish truly fatal failues > >> * (srb status == 0x4) and off-line the device in that case. > >> */ > >> - > >> +#if 0 > >> if ((stor_pkt->vm_srb.cdb[0] == INQUIRY) || > >> (stor_pkt->vm_srb.cdb[0] == MODE_SENSE)) { > >> vstor_packet->vm_srb.scsi_status = 0; > >> vstor_packet->vm_srb.srb_status = > >SRB_STATUS_SUCCESS; > >> } > >> - > >> +#endif > >> > >> /* Copy over the status...etc */ > >> stor_pkt->vm_srb.scsi_status = > >vstor_packet->vm_srb.scsi_status; > >> > >> It might thappen that we're fail to interpret the 'Device not > >present' > >> status correctly (which will happen for non-connected DVDs) causing > >the > >> SCSI stack to make incorrect decisions later on. > >> > >> Cheers, > >> > >> Hannes > > > >There are several oddities about the host SCSI interface that I see: > > 1. The host bus seems to report up to 6 devices even though only 2 are > > present (Disk and CDROM). > >2. The CDROM emulation doesn't report the same status as a real device. > > 3. The host emulation of SCSI doesn't support all the page codes which > > is why there is the hack. > > > >But as James said, these don't appear to be related to the failure > >because > >the code worked before and only in post 4.11 merege is there a problem. > > Your wait for the hang trace is the most suggestive. It says we're waiting > for a partition read to the spurious device. Previously this would have > failed or timed out, so this seems to be the root cause. > > James > > Where is the number of valid LUN's determined during the scan process?
Re: SCSI regression in 4.11
On Thu, 2 Mar 2017 14:25:14 +0100 Hannes Reineckewrote: > On 03/02/2017 02:40 AM, Stephen Hemminger wrote: > > On Thu, 2 Mar 2017 01:56:15 +0100 > > Christoph Hellwig wrote: > > > >> On Thu, Mar 02, 2017 at 01:01:35AM +0100, Christoph Hellwig wrote: > >>> On Wed, Mar 01, 2017 at 07:54:12AM -0800, Stephen Hemminger wrote: > > > > http://git.infradead.org/users/hch/block.git/commitdiff/148cff67b401e2229c076c0ea418712654be77e4 > > > > It appears that is already in the code I am testing in linux-next... > >>> > >>> It's in -next now, but it wasn't at the time you reported the bug. > >>> > >>> And it would sortof explain the bug if the INQUIRY data is correct > >>> in the scatterlist, but we ignore it, given that scsi_probe_lun > >>> ignores the result based on sense data. > >>> > >>> Can you check what happens with the horrible hack below: > >> > >> Strike that - we're checking result later, so this can't be the case. > >> > >> Now the other interesting thing is the memset in __scsi_exectute, > >> which looks very suspicious. Try the following please: > >> > >> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c > >> index 3e32dc954c3c..22f4fb550561 100644 > >> --- a/drivers/scsi/scsi_lib.c > >> +++ b/drivers/scsi/scsi_lib.c > >> @@ -253,7 +253,8 @@ static int __scsi_execute(struct scsi_device *sdev, > >> const unsigned char *cmd, > >> * and prevent security leaks by zeroing out the excess data. > >> */ > >>if (unlikely(rq->resid_len > 0 && rq->resid_len <= bufflen)) > >> - memset(buffer + (bufflen - rq->resid_len), 0, rq->resid_len); > >> +//memset(buffer + (bufflen - rq->resid_len), 0, > >> rq->resid_len); > >> + printk_ratelimited("%s: got resid %d\n", __func__, > >> rq->resid_len); > >> > >>if (resid) > >>*resid = rq->resid_len; > > > > > > Still fails but does print resid on some of the later INQUIRY commands (not > > the initial one). > > > Can you test what happens if you blank out the storvsc_drv workaround: > > diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c > index 585e54f..c36f42d 100644 > --- a/drivers/scsi/storvsc_drv.c > +++ b/drivers/scsi/storvsc_drv.c > @@ -1060,13 +1060,13 @@ static void storvsc_on_io_completion(struct > storvsc_device *stor_device, > * We do this so we can distinguish truly fatal failues > * (srb status == 0x4) and off-line the device in that case. > */ > - > +#if 0 > if ((stor_pkt->vm_srb.cdb[0] == INQUIRY) || > (stor_pkt->vm_srb.cdb[0] == MODE_SENSE)) { > vstor_packet->vm_srb.scsi_status = 0; > vstor_packet->vm_srb.srb_status = SRB_STATUS_SUCCESS; > } > - > +#endif > > /* Copy over the status...etc */ > stor_pkt->vm_srb.scsi_status = vstor_packet->vm_srb.scsi_status; > > It might thappen that we're fail to interpret the 'Device not present' > status correctly (which will happen for non-connected DVDs) causing the > SCSI stack to make incorrect decisions later on. > > Cheers, > > Hannes There are several oddities about the host SCSI interface that I see: 1. The host bus seems to report up to 6 devices even though only 2 are present (Disk and CDROM). 2. The CDROM emulation doesn't report the same status as a real device. 3. The host emulation of SCSI doesn't support all the page codes which is why there is the hack. But as James said, these don't appear to be related to the failure because the code worked before and only in post 4.11 merege is there a problem.
Re: SCSI regression in 4.11
On Thu, 2 Mar 2017 14:25:14 +0100 Hannes Reineckewrote: > On 03/02/2017 02:40 AM, Stephen Hemminger wrote: > > On Thu, 2 Mar 2017 01:56:15 +0100 > > Christoph Hellwig wrote: > > > >> On Thu, Mar 02, 2017 at 01:01:35AM +0100, Christoph Hellwig wrote: > >>> On Wed, Mar 01, 2017 at 07:54:12AM -0800, Stephen Hemminger wrote: > > > > http://git.infradead.org/users/hch/block.git/commitdiff/148cff67b401e2229c076c0ea418712654be77e4 > > > > It appears that is already in the code I am testing in linux-next... > >>> > >>> It's in -next now, but it wasn't at the time you reported the bug. > >>> > >>> And it would sortof explain the bug if the INQUIRY data is correct > >>> in the scatterlist, but we ignore it, given that scsi_probe_lun > >>> ignores the result based on sense data. > >>> > >>> Can you check what happens with the horrible hack below: > >> > >> Strike that - we're checking result later, so this can't be the case. > >> > >> Now the other interesting thing is the memset in __scsi_exectute, > >> which looks very suspicious. Try the following please: > >> > >> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c > >> index 3e32dc954c3c..22f4fb550561 100644 > >> --- a/drivers/scsi/scsi_lib.c > >> +++ b/drivers/scsi/scsi_lib.c > >> @@ -253,7 +253,8 @@ static int __scsi_execute(struct scsi_device *sdev, > >> const unsigned char *cmd, > >> * and prevent security leaks by zeroing out the excess data. > >> */ > >>if (unlikely(rq->resid_len > 0 && rq->resid_len <= bufflen)) > >> - memset(buffer + (bufflen - rq->resid_len), 0, rq->resid_len); > >> +//memset(buffer + (bufflen - rq->resid_len), 0, > >> rq->resid_len); > >> + printk_ratelimited("%s: got resid %d\n", __func__, > >> rq->resid_len); > >> > >>if (resid) > >>*resid = rq->resid_len; > > > > > > Still fails but does print resid on some of the later INQUIRY commands (not > > the initial one). > > > Can you test what happens if you blank out the storvsc_drv workaround: > > diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c > index 585e54f..c36f42d 100644 > --- a/drivers/scsi/storvsc_drv.c > +++ b/drivers/scsi/storvsc_drv.c > @@ -1060,13 +1060,13 @@ static void storvsc_on_io_completion(struct > storvsc_device *stor_device, > * We do this so we can distinguish truly fatal failues > * (srb status == 0x4) and off-line the device in that case. > */ > - > +#if 0 > if ((stor_pkt->vm_srb.cdb[0] == INQUIRY) || > (stor_pkt->vm_srb.cdb[0] == MODE_SENSE)) { > vstor_packet->vm_srb.scsi_status = 0; > vstor_packet->vm_srb.srb_status = SRB_STATUS_SUCCESS; > } > - > +#endif > > /* Copy over the status...etc */ > stor_pkt->vm_srb.scsi_status = vstor_packet->vm_srb.scsi_status; > > It might thappen that we're fail to interpret the 'Device not present' > status correctly (which will happen for non-connected DVDs) causing the > SCSI stack to make incorrect decisions later on. > > Cheers, > > Hannes Makes no difference, that was one of the first things I tried (and just tried again).
Re: [regression] vmw_pvscsi probe fails on 4.11
Please try this patch: diff --git a/drivers/scsi/vmw_pvscsi.c b/drivers/scsi/vmw_pvscsi.c index ef474a748744..c374e3b5c678 100644 --- a/drivers/scsi/vmw_pvscsi.c +++ b/drivers/scsi/vmw_pvscsi.c @@ -1487,7 +1487,7 @@ static int pvscsi_probe(struct pci_dev *pdev, const struct pci_device_id *id) irq_flag &= ~PCI_IRQ_MSI; error = pci_alloc_irq_vectors(adapter->dev, 1, 1, irq_flag); - if (error) + if (error < 0) goto out_reset_adapter; adapter->use_req_threshold = pvscsi_setup_req_threshold(adapter, true);
Re: [PATCH] scsi_sysfs: fix hang when removing scsi device
On Thu, 2017-03-02 at 16:45 +0200, Israel Rukshin wrote: > The bug reproduce when unloading srp module with one port down. > device_del() hangs when __scsi_remove_device() get scsi_device with > state SDEV_OFFLINE or SDEV_TRANSPORT_OFFLINE. > It hangs because device_del() is trying to send sync cache command > when the device is offline but with SDEV_CANCEL status. > The status was changed to SDEV_CANCEL by __scsi_remove_device() > before it calls to device_del(). It is not device_del() but sd_shutdown() that submits the SYNCHRONIZE CACHE command. It should also be explained in the commit message why the block layer timeout mechanism did not cause the SYNCHRONIZE CACHE commands to fail after the timeout expired. > Fixes: e619e6cbe (Revert "scsi: Fix a bdi reregistration race") This does not make sense. A revert is never the original introduction of a bug but at most a reintroduction. Please refer to the commit that originally introduced this hang. > --- a/drivers/scsi/scsi_sysfs.c > +++ b/drivers/scsi/scsi_sysfs.c > @@ -1282,6 +1282,8 @@ void __scsi_remove_device(struct scsi_device *sdev) > return; > > if (sdev->is_visible) { > + enum scsi_device_state oldstate = sdev->sdev_state; > + > if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0) > return; > > @@ -1289,6 +1291,15 @@ void __scsi_remove_device(struct scsi_device *sdev) > device_unregister(>sdev_dev); > transport_remove_device(dev); > scsi_dh_remove_device(sdev); > + > + /* > + * Fail new requests if the old state was offline. > + * This avoids from device_del() to hang. > + */ > + if (oldstate == SDEV_TRANSPORT_OFFLINE || > + oldstate == SDEV_OFFLINE) > + blk_set_queue_dying(sdev->request_queue); > + > device_del(dev); > } else > put_device(>sdev_dev); Please refer to sd_shutdown() instead of device_del() in the above comment, and also explain why the block layer timeout mechanism did not cause the SYNCHRONIZE CACHE commands to fail. Thanks, Bart.
Re: [PATCH] scsi: qedi: fix build error without DEBUG_FS
On Thu, Mar 2, 2017 at 1:10 PM, Arnd Bergmannwrote: > > - QEDI_INFO(qedi_dbg, QEDI_LOG_DEBUGFS, "do_not_recover=%d\n", > - do_not_recover); > + QEDI_INFO(qedi_dbg, QEDI_LOG_DEBUGFS, "qedi_do_not_recover=%d\n", > + qedi_do_not_recover); > - QEDI_INFO(qedi_dbg, QEDI_LOG_DEBUGFS, "do_not_recover=%d\n", > - do_not_recover); > + QEDI_INFO(qedi_dbg, QEDI_LOG_DEBUGFS, "qedi_do_not_recover=%d\n", > + qedi_do_not_recover); > return 0; > } > - { "do_not_recover", qedi_dbg_do_not_recover_ops}, > + { "qedi_do_not_recover", qedi_dbg_do_not_recover_ops}, > { "io_trace", NULL }, > { NULL, NULL } > > - cnt = sprintf(buffer, "do_not_recover=%d\n", do_not_recover); > + cnt = sprintf(buffer, "qedi_do_not_recover=%d\n", > qedi_do_not_recover); > cnt = min_t(int, count, cnt - *ppos); I just noticed that my search went a little too far, I should not have touched the instances within strings. Will resend. Arnd
[PATCH] scsi_sysfs: fix hang when removing scsi device
The bug reproduce when unloading srp module with one port down. device_del() hangs when __scsi_remove_device() get scsi_device with state SDEV_OFFLINE or SDEV_TRANSPORT_OFFLINE. It hangs because device_del() is trying to send sync cache command when the device is offline but with SDEV_CANCEL status. The status was changed to SDEV_CANCEL by __scsi_remove_device() before it calls to device_del(). This commit doesn't accept new commands if the original state was offline. sysrq: SysRq : sysrq: Show Blocked State task PC stack pid father kworker/2:0 D 88046fa95c00 0 21178 2 0x Workqueue: srp_remove srp_remove_work [ib_srp] Call Trace: [] schedule+0x35/0x80 [] schedule_timeout+0x237/0x2d0 [] io_schedule_timeout+0xa6/0x110 [] wait_for_completion_io+0xa3/0x110 [] blk_execute_rq+0xdf/0x120 [] scsi_execute+0xce/0x150 [scsi_mod] [] scsi_execute_req_flags+0x8f/0xf0 [scsi_mod] [] sd_sync_cache+0xa9/0x190 [sd_mod] [] sd_shutdown+0x6a/0x100 [sd_mod] [] sd_remove+0x64/0xc0 [sd_mod] [] __device_release_driver+0x8d/0x120 [] device_release_driver+0x1e/0x30 [] bus_remove_device+0xf9/0x170 [] device_del+0x127/0x240 [] __scsi_remove_device+0xc1/0xd0 [scsi_mod] [] scsi_forget_host+0x57/0x60 [scsi_mod] [] scsi_remove_host+0x72/0x110 [scsi_mod] [] srp_remove_work+0x8b/0x200 [ib_srp] ... Fixes: e619e6cbe (Revert "scsi: Fix a bdi reregistration race") Signed-off-by: Israel RukshinReviewed-by: Max Gurtovoy --- drivers/scsi/scsi_sysfs.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c index 82dfe07..ed55aac 100644 --- a/drivers/scsi/scsi_sysfs.c +++ b/drivers/scsi/scsi_sysfs.c @@ -1282,6 +1282,8 @@ void __scsi_remove_device(struct scsi_device *sdev) return; if (sdev->is_visible) { + enum scsi_device_state oldstate = sdev->sdev_state; + if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0) return; @@ -1289,6 +1291,15 @@ void __scsi_remove_device(struct scsi_device *sdev) device_unregister(>sdev_dev); transport_remove_device(dev); scsi_dh_remove_device(sdev); + + /* +* Fail new requests if the old state was offline. +* This avoids from device_del() to hang. +*/ + if (oldstate == SDEV_TRANSPORT_OFFLINE || + oldstate == SDEV_OFFLINE) + blk_set_queue_dying(sdev->request_queue); + device_del(dev); } else put_device(>sdev_dev); -- 2.4.3
Re: [PATCH] scsi: qedi: fix build error without DEBUG_FS
On 03/02/2017 01:10 PM, Arnd Bergmann wrote: > Without CONFIG_DEBUG_FS, we run into a link error: > > drivers/scsi/qedi/qedi_iscsi.o: In function `qedi_ep_poll': > qedi_iscsi.c:(.text.qedi_ep_poll+0x134): undefined reference to > `do_not_recover' > drivers/scsi/qedi/qedi_iscsi.o: In function `qedi_ep_disconnect': > qedi_iscsi.c:(.text.qedi_ep_disconnect+0x36c): undefined reference to > `do_not_recover' > drivers/scsi/qedi/qedi_iscsi.o: In function `qedi_ep_connect': > qedi_iscsi.c:(.text.qedi_ep_connect+0x350): undefined reference to > `do_not_recover' > drivers/scsi/qedi/qedi_fw.o: In function `qedi_tmf_work': > qedi_fw.c:(.text.qedi_tmf_work+0x3b4): undefined reference to `do_not_recover' > > This defines the symbol as a constant in this case, as there is no way to > set it to anything other than zero without DEBUG_FS. In addition, I'm renaming > it to qedi_do_not_recover in order to put it into a driver specific namespace, > as "do_not_recover" is a really bad name for a kernel-wide global identifier > when it is used only in one driver. > > Fixes: ace7f46ba5fd ("scsi: qedi: Add QLogic FastLinQ offload iSCSI driver > framework.") > Signed-off-by: Arnd Bergmann> --- Looks good, Reviewed-by: Johannes Thumshirn -- Johannes Thumshirn Storage jthumsh...@suse.de+49 911 74053 689 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: Felix Imendörffer, Jane Smithard, Graham Norton HRB 21284 (AG Nürnberg) Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
[PATCH] [v2] scsi: qedi: fix build error without DEBUG_FS
Without CONFIG_DEBUG_FS, we run into a link error: drivers/scsi/qedi/qedi_iscsi.o: In function `qedi_ep_poll': qedi_iscsi.c:(.text.qedi_ep_poll+0x134): undefined reference to `do_not_recover' drivers/scsi/qedi/qedi_iscsi.o: In function `qedi_ep_disconnect': qedi_iscsi.c:(.text.qedi_ep_disconnect+0x36c): undefined reference to `do_not_recover' drivers/scsi/qedi/qedi_iscsi.o: In function `qedi_ep_connect': qedi_iscsi.c:(.text.qedi_ep_connect+0x350): undefined reference to `do_not_recover' drivers/scsi/qedi/qedi_fw.o: In function `qedi_tmf_work': qedi_fw.c:(.text.qedi_tmf_work+0x3b4): undefined reference to `do_not_recover' This defines the symbol as a constant in this case, as there is no way to set it to anything other than zero without DEBUG_FS. In addition, I'm renaming it to qedi_do_not_recover in order to put it into a driver specific namespace, as "do_not_recover" is a really bad name for a kernel-wide global identifier when it is used only in one driver. Fixes: ace7f46ba5fd ("scsi: qedi: Add QLogic FastLinQ offload iSCSI driver framework.") Reviewed-by: Johannes ThumshirnSigned-off-by: Arnd Bergmann --- v2: don't rename references to do_not_recover in string literals --- drivers/scsi/qedi/qedi_debugfs.c | 16 drivers/scsi/qedi/qedi_fw.c | 4 ++-- drivers/scsi/qedi/qedi_gbl.h | 8 +++- drivers/scsi/qedi/qedi_iscsi.c | 8 4 files changed, 21 insertions(+), 15 deletions(-) diff --git a/drivers/scsi/qedi/qedi_debugfs.c b/drivers/scsi/qedi/qedi_debugfs.c index 955936274241..59417199bf36 100644 --- a/drivers/scsi/qedi/qedi_debugfs.c +++ b/drivers/scsi/qedi/qedi_debugfs.c @@ -14,7 +14,7 @@ #include #include -int do_not_recover; +int qedi_do_not_recover; static struct dentry *qedi_dbg_root; void @@ -74,22 +74,22 @@ qedi_dbg_exit(void) static ssize_t qedi_dbg_do_not_recover_enable(struct qedi_dbg_ctx *qedi_dbg) { - if (!do_not_recover) - do_not_recover = 1; + if (!qedi_do_not_recover) + qedi_do_not_recover = 1; QEDI_INFO(qedi_dbg, QEDI_LOG_DEBUGFS, "do_not_recover=%d\n", - do_not_recover); + qedi_do_not_recover); return 0; } static ssize_t qedi_dbg_do_not_recover_disable(struct qedi_dbg_ctx *qedi_dbg) { - if (do_not_recover) - do_not_recover = 0; + if (qedi_do_not_recover) + qedi_do_not_recover = 0; QEDI_INFO(qedi_dbg, QEDI_LOG_DEBUGFS, "do_not_recover=%d\n", - do_not_recover); + qedi_do_not_recover); return 0; } @@ -141,7 +141,7 @@ qedi_dbg_do_not_recover_cmd_read(struct file *filp, char __user *buffer, if (*ppos) return 0; - cnt = sprintf(buffer, "do_not_recover=%d\n", do_not_recover); + cnt = sprintf(buffer, "do_not_recover=%d\n", qedi_do_not_recover); cnt = min_t(int, count, cnt - *ppos); *ppos += cnt; return cnt; diff --git a/drivers/scsi/qedi/qedi_fw.c b/drivers/scsi/qedi/qedi_fw.c index c9f0ef4e11b3..2bce3efc66a4 100644 --- a/drivers/scsi/qedi/qedi_fw.c +++ b/drivers/scsi/qedi/qedi_fw.c @@ -1461,9 +1461,9 @@ static void qedi_tmf_work(struct work_struct *work) get_itt(tmf_hdr->rtt), get_itt(ctask->itt), cmd->task_id, qedi_conn->iscsi_conn_id); - if (do_not_recover) { + if (qedi_do_not_recover) { QEDI_ERR(>dbg_ctx, "DONT SEND CLEANUP/ABORT %d\n", -do_not_recover); +qedi_do_not_recover); goto abort_ret; } diff --git a/drivers/scsi/qedi/qedi_gbl.h b/drivers/scsi/qedi/qedi_gbl.h index 8e488de88ece..63d793f46064 100644 --- a/drivers/scsi/qedi/qedi_gbl.h +++ b/drivers/scsi/qedi/qedi_gbl.h @@ -12,8 +12,14 @@ #include "qedi_iscsi.h" +#ifdef CONFIG_DEBUG_FS +extern int qedi_do_not_recover; +#else +#define qedi_do_not_recover (0) +#endif + extern uint qedi_io_tracing; -extern int do_not_recover; + extern struct scsi_host_template qedi_host_template; extern struct iscsi_transport qedi_iscsi_transport; extern const struct qed_iscsi_ops *qedi_ops; diff --git a/drivers/scsi/qedi/qedi_iscsi.c b/drivers/scsi/qedi/qedi_iscsi.c index b9f79d36142d..4cc474364c50 100644 --- a/drivers/scsi/qedi/qedi_iscsi.c +++ b/drivers/scsi/qedi/qedi_iscsi.c @@ -833,7 +833,7 @@ qedi_ep_connect(struct Scsi_Host *shost, struct sockaddr *dst_addr, return ERR_PTR(ret); } - if (do_not_recover) { + if (qedi_do_not_recover) { ret = -ENOMEM; return ERR_PTR(ret); } @@ -957,7 +957,7 @@ static int qedi_ep_poll(struct iscsi_endpoint *ep, int timeout_ms) struct qedi_endpoint *qedi_ep; int ret = 0; - if (do_not_recover) + if (qedi_do_not_recover) return 1; qedi_ep = ep->dd_data; @@ -1025,7 +1025,7 @@ static void
Re: [PATCHv3 1/6] scsi_error: count medium access timeout only once per EH run
On 03/02/2017 12:24 AM, Bart Van Assche wrote: On Wed, 2017-03-01 at 10:15 +0100, Hannes Reinecke wrote: The current medium access timeout counter will be increased for each command, so if there are enough failed commands we'll hit the medium access timeout for even a single failure. This sentence describes multiple failed commands as a single failure. That's confusing to me. Did you perhaps intend "for a single device failure"? diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c index f2cafae..cec439c 100644 --- a/drivers/scsi/scsi_error.c +++ b/drivers/scsi/scsi_error.c @@ -58,6 +58,7 @@ static int scsi_eh_try_stu(struct scsi_cmnd *scmd); static int scsi_try_to_abort_cmd(struct scsi_host_template *, struct scsi_cmnd *); +static int scsi_eh_reset(struct scsi_cmnd *scmd); /* called with shost->host_lock held */ void scsi_eh_wakeup(struct Scsi_Host *shost) @@ -249,6 +250,7 @@ int scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag) if (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED) eh_flag &= ~SCSI_EH_CANCEL_CMD; scmd->eh_eflags |= eh_flag; + scsi_eh_reset(scmd); list_add_tail(>eh_entry, >eh_cmd_q); shost->host_failed++; scsi_eh_wakeup(shost); @@ -1107,7 +1109,19 @@ static int scsi_eh_action(struct scsi_cmnd *scmd, int rtn) if (!blk_rq_is_passthrough(scmd->request)) { struct scsi_driver *sdrv = scsi_cmd_to_driver(scmd); if (sdrv->eh_action) - rtn = sdrv->eh_action(scmd, rtn); + rtn = sdrv->eh_action(scmd, rtn, false); + } + return rtn; +} + +static int scsi_eh_reset(struct scsi_cmnd *scmd) +{ + int rtn = SUCCESS; + + if (!blk_rq_is_passthrough(scmd->request)) { + struct scsi_driver *sdrv = scsi_cmd_to_driver(scmd); + if (sdrv->eh_action) + rtn = sdrv->eh_action(scmd, rtn, true); } return rtn; } Can this function be moved up such that we don't need a new forward declaration? Why, but of course. I just put is here to be next to the original scsi_eh_action() function. @@ -1689,18 +1689,28 @@ static int sd_pr_clear(struct block_device *bdev, u64 key) * sd_eh_action - error handling callback * @scmd: sd-issued command that has failed * @eh_disp: The recovery disposition suggested by the midlayer + * @reset: Reset medium access counter It seems to me that @reset does not trigger a reset of the medium access counter but rather of the variable that prevents the medium access error counter to be incremented? Correct. Will be fixing up the description. + * recovery). + * We have to be careful to count a medium access failure only once + * per SCSI EH run; there might be several timed out commands which Did you perhaps intend "once per device per SCSI EH run"? Yes. --- a/drivers/scsi/sd.h +++ b/drivers/scsi/sd.h @@ -106,6 +106,7 @@ struct scsi_disk { unsignedrc_basis: 2; unsignedzoned: 2; unsignedurswrz : 1; + unsignedmedium_access_reset : 1; The name of this new member is confusing to me. How about using the name "ignore_medium_access_errors" instead? And since this is a boolean, how about using true and false in assignments to this variable? Ok, will be doing so. Cheers, Hannes -- Dr. Hannes Reinecke zSeries & Storage h...@suse.de +49 911 74053 688 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
Re: SCSI regression in 4.11
On 03/02/2017 02:40 AM, Stephen Hemminger wrote: On Thu, 2 Mar 2017 01:56:15 +0100 Christoph Hellwigwrote: On Thu, Mar 02, 2017 at 01:01:35AM +0100, Christoph Hellwig wrote: On Wed, Mar 01, 2017 at 07:54:12AM -0800, Stephen Hemminger wrote: http://git.infradead.org/users/hch/block.git/commitdiff/148cff67b401e2229c076c0ea418712654be77e4 It appears that is already in the code I am testing in linux-next... It's in -next now, but it wasn't at the time you reported the bug. And it would sortof explain the bug if the INQUIRY data is correct in the scatterlist, but we ignore it, given that scsi_probe_lun ignores the result based on sense data. Can you check what happens with the horrible hack below: Strike that - we're checking result later, so this can't be the case. Now the other interesting thing is the memset in __scsi_exectute, which looks very suspicious. Try the following please: diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index 3e32dc954c3c..22f4fb550561 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -253,7 +253,8 @@ static int __scsi_execute(struct scsi_device *sdev, const unsigned char *cmd, * and prevent security leaks by zeroing out the excess data. */ if (unlikely(rq->resid_len > 0 && rq->resid_len <= bufflen)) - memset(buffer + (bufflen - rq->resid_len), 0, rq->resid_len); +// memset(buffer + (bufflen - rq->resid_len), 0, rq->resid_len); + printk_ratelimited("%s: got resid %d\n", __func__, rq->resid_len); if (resid) *resid = rq->resid_len; Still fails but does print resid on some of the later INQUIRY commands (not the initial one). Can you test what happens if you blank out the storvsc_drv workaround: diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c index 585e54f..c36f42d 100644 --- a/drivers/scsi/storvsc_drv.c +++ b/drivers/scsi/storvsc_drv.c @@ -1060,13 +1060,13 @@ static void storvsc_on_io_completion(struct storvsc_device *stor_device, * We do this so we can distinguish truly fatal failues * (srb status == 0x4) and off-line the device in that case. */ - +#if 0 if ((stor_pkt->vm_srb.cdb[0] == INQUIRY) || (stor_pkt->vm_srb.cdb[0] == MODE_SENSE)) { vstor_packet->vm_srb.scsi_status = 0; vstor_packet->vm_srb.srb_status = SRB_STATUS_SUCCESS; } - +#endif /* Copy over the status...etc */ stor_pkt->vm_srb.scsi_status = vstor_packet->vm_srb.scsi_status; It might thappen that we're fail to interpret the 'Device not present' status correctly (which will happen for non-connected DVDs) causing the SCSI stack to make incorrect decisions later on. Cheers, Hannes -- Dr. Hannes Reinecke zSeries & Storage h...@suse.de +49 911 74053 688 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
[PATCH] scsi: qedi: fix build error without DEBUG_FS
Without CONFIG_DEBUG_FS, we run into a link error: drivers/scsi/qedi/qedi_iscsi.o: In function `qedi_ep_poll': qedi_iscsi.c:(.text.qedi_ep_poll+0x134): undefined reference to `do_not_recover' drivers/scsi/qedi/qedi_iscsi.o: In function `qedi_ep_disconnect': qedi_iscsi.c:(.text.qedi_ep_disconnect+0x36c): undefined reference to `do_not_recover' drivers/scsi/qedi/qedi_iscsi.o: In function `qedi_ep_connect': qedi_iscsi.c:(.text.qedi_ep_connect+0x350): undefined reference to `do_not_recover' drivers/scsi/qedi/qedi_fw.o: In function `qedi_tmf_work': qedi_fw.c:(.text.qedi_tmf_work+0x3b4): undefined reference to `do_not_recover' This defines the symbol as a constant in this case, as there is no way to set it to anything other than zero without DEBUG_FS. In addition, I'm renaming it to qedi_do_not_recover in order to put it into a driver specific namespace, as "do_not_recover" is a really bad name for a kernel-wide global identifier when it is used only in one driver. Fixes: ace7f46ba5fd ("scsi: qedi: Add QLogic FastLinQ offload iSCSI driver framework.") Signed-off-by: Arnd Bergmann--- drivers/scsi/qedi/qedi_debugfs.c | 22 +++--- drivers/scsi/qedi/qedi_fw.c | 4 ++-- drivers/scsi/qedi/qedi_gbl.h | 8 +++- drivers/scsi/qedi/qedi_iscsi.c | 8 4 files changed, 24 insertions(+), 18 deletions(-) diff --git a/drivers/scsi/qedi/qedi_debugfs.c b/drivers/scsi/qedi/qedi_debugfs.c index 955936274241..6baf35eefc1c 100644 --- a/drivers/scsi/qedi/qedi_debugfs.c +++ b/drivers/scsi/qedi/qedi_debugfs.c @@ -14,7 +14,7 @@ #include #include -int do_not_recover; +int qedi_do_not_recover; static struct dentry *qedi_dbg_root; void @@ -74,22 +74,22 @@ qedi_dbg_exit(void) static ssize_t qedi_dbg_do_not_recover_enable(struct qedi_dbg_ctx *qedi_dbg) { - if (!do_not_recover) - do_not_recover = 1; + if (!qedi_do_not_recover) + qedi_do_not_recover = 1; - QEDI_INFO(qedi_dbg, QEDI_LOG_DEBUGFS, "do_not_recover=%d\n", - do_not_recover); + QEDI_INFO(qedi_dbg, QEDI_LOG_DEBUGFS, "qedi_do_not_recover=%d\n", + qedi_do_not_recover); return 0; } static ssize_t qedi_dbg_do_not_recover_disable(struct qedi_dbg_ctx *qedi_dbg) { - if (do_not_recover) - do_not_recover = 0; + if (qedi_do_not_recover) + qedi_do_not_recover = 0; - QEDI_INFO(qedi_dbg, QEDI_LOG_DEBUGFS, "do_not_recover=%d\n", - do_not_recover); + QEDI_INFO(qedi_dbg, QEDI_LOG_DEBUGFS, "qedi_do_not_recover=%d\n", + qedi_do_not_recover); return 0; } @@ -101,7 +101,7 @@ static struct qedi_list_of_funcs qedi_dbg_do_not_recover_ops[] = { struct qedi_debugfs_ops qedi_debugfs_ops[] = { { "gbl_ctx", NULL }, - { "do_not_recover", qedi_dbg_do_not_recover_ops}, + { "qedi_do_not_recover", qedi_dbg_do_not_recover_ops}, { "io_trace", NULL }, { NULL, NULL } }; @@ -141,7 +141,7 @@ qedi_dbg_do_not_recover_cmd_read(struct file *filp, char __user *buffer, if (*ppos) return 0; - cnt = sprintf(buffer, "do_not_recover=%d\n", do_not_recover); + cnt = sprintf(buffer, "qedi_do_not_recover=%d\n", qedi_do_not_recover); cnt = min_t(int, count, cnt - *ppos); *ppos += cnt; return cnt; diff --git a/drivers/scsi/qedi/qedi_fw.c b/drivers/scsi/qedi/qedi_fw.c index c9f0ef4e11b3..2bce3efc66a4 100644 --- a/drivers/scsi/qedi/qedi_fw.c +++ b/drivers/scsi/qedi/qedi_fw.c @@ -1461,9 +1461,9 @@ static void qedi_tmf_work(struct work_struct *work) get_itt(tmf_hdr->rtt), get_itt(ctask->itt), cmd->task_id, qedi_conn->iscsi_conn_id); - if (do_not_recover) { + if (qedi_do_not_recover) { QEDI_ERR(>dbg_ctx, "DONT SEND CLEANUP/ABORT %d\n", -do_not_recover); +qedi_do_not_recover); goto abort_ret; } diff --git a/drivers/scsi/qedi/qedi_gbl.h b/drivers/scsi/qedi/qedi_gbl.h index 8e488de88ece..63d793f46064 100644 --- a/drivers/scsi/qedi/qedi_gbl.h +++ b/drivers/scsi/qedi/qedi_gbl.h @@ -12,8 +12,14 @@ #include "qedi_iscsi.h" +#ifdef CONFIG_DEBUG_FS +extern int qedi_do_not_recover; +#else +#define qedi_do_not_recover (0) +#endif + extern uint qedi_io_tracing; -extern int do_not_recover; + extern struct scsi_host_template qedi_host_template; extern struct iscsi_transport qedi_iscsi_transport; extern const struct qed_iscsi_ops *qedi_ops; diff --git a/drivers/scsi/qedi/qedi_iscsi.c b/drivers/scsi/qedi/qedi_iscsi.c index b9f79d36142d..4cc474364c50 100644 --- a/drivers/scsi/qedi/qedi_iscsi.c +++ b/drivers/scsi/qedi/qedi_iscsi.c @@ -833,7 +833,7 @@ qedi_ep_connect(struct Scsi_Host *shost, struct sockaddr *dst_addr, return ERR_PTR(ret); } - if (do_not_recover) { + if (qedi_do_not_recover) {
[PATCH] megaraid_sas: enable intx only if msix request fails
Without this fix, driver will enable INTx Interrupt pin even though MSI-x vectors are enabled. See below lspci output. DisINTx is unset for MSIx setup. lspci -s 85:00.0 -vvv |grep INT |grep Control Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx- After applying this fix, driver will enable INTx Interrupt pin only if Legacy interrupt method is required. See below lspci output. DisINTx is unset for MSIx setup. lspci -s 85:00.0 -vvv |grep INT |grep Control Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+ Signed-off-by: Kashyap Desai--- drivers/scsi/megaraid/megaraid_sas_base.c | 6 +- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scsi/megaraid/megaraid_sas_base.c index 7ac9a9e..82a8ec8 100644 --- a/drivers/scsi/megaraid/megaraid_sas_base.c +++ b/drivers/scsi/megaraid/megaraid_sas_base.c @@ -4990,6 +4990,7 @@ int megasas_set_crash_dump_params(struct megasas_instance *instance, struct pci_dev *pdev; pdev = instance->pdev; + pci_intx(pdev, 1); instance->irq_context[0].instance = instance; instance->irq_context[0].MSIxIndex = 0; if (request_irq(pci_irq_vector(pdev, 0), @@ -5277,10 +5278,6 @@ static int megasas_init_fw(struct megasas_instance *instance) MPI2_REPLY_POST_HOST_INDEX_OFFSET); } - i = pci_alloc_irq_vectors(instance->pdev, 1, 1, PCI_IRQ_LEGACY); - if (i < 0) - goto fail_setup_irqs; - dev_info(>pdev->dev, "firmware supports msix\t: (%d)", fw_msix_count); dev_info(>pdev->dev, @@ -5494,7 +5491,6 @@ static int megasas_init_fw(struct megasas_instance *instance) instance->instancet->disable_intr(instance); fail_init_adapter: megasas_destroy_irqs(instance); -fail_setup_irqs: if (instance->msix_vectors) pci_free_irq_vectors(instance->pdev); instance->msix_vectors = 0; -- 1.8.3.1
[PATCH] enclosure: fix sysfs symlinks creation when using multipath
With multipath, it may happen that the same device is passed to enclosure_add_device() multiple times and that the enclosure_add_links() function fails to create the symlinks because the device's sysfs directory entry is still NULL. In this case, the links will never be created because all the subsequent calls to enclosure_add_device() will immediately fail with EEXIST. This is an example of what happens: [ 19.251902] scsi 0:0:27:0: Direct-Access SEAGATE ST8000NM0075 E002 PQ: 0 ANSI: 6 [ 19.261874] scsi 0:0:27:0: SSP: handle(0x0027), sas_addr(0x5000c50085826dd6), phy(34), device_name(0x5000c50085826dd4) [ 19.274656] scsi 0:0:27:0: SSP: enclosure_logical_id(0x500093d001be4000), slot(57) [ 19.283944] scsi 0:0:27:0: SSP: enclosure level(0x), connector name( ) [ 19.292933] scsi 0:0:27:0: qdepth(254), tagged(1), simple(0), ordered(0), scsi_l [...] [ 60.066524] enclosure_add_device(0:0:27:0) called, enclosure_add_links() failed with error -2 [...] [ 75.119146] sd 0:0:27:0: Attached scsi generic sg27 type 0 [ 75.126722] sd 0:0:27:0: [sdaa] 15628053168 512-byte logical blocks: (8.00 TB/7.27 TiB) [ 75.126723] sd 0:0:27:0: [sdaa] 4096-byte physical blocks [ 75.129059] sd 0:0:27:0: [sdaa] Write Protect is off [ 75.129061] sd 0:0:27:0: [sdaa] Mode Sense: db 00 10 08 [ 75.130417] sd 0:0:27:0: [sdaa] Write cache: enabled, read cache: enabled, supports DPO and FUA [ 75.158088] sd 0:0:27:0: [sdaa] Attached SCSI disk [...] [ 75.192722] enclosure_add_device(0:0:27:0) called, device already exists This patch modifies the code so the driver will detect this condition and will retry to create the symlinks when enclosure_add_device() is called. Signed-off-by: Maurizio Lombardi--- drivers/misc/enclosure.c | 16 ++-- include/linux/enclosure.h | 1 + 2 files changed, 15 insertions(+), 2 deletions(-) diff --git a/drivers/misc/enclosure.c b/drivers/misc/enclosure.c index 65fed71..a856c98 100644 --- a/drivers/misc/enclosure.c +++ b/drivers/misc/enclosure.c @@ -375,21 +375,33 @@ int enclosure_add_device(struct enclosure_device *edev, int component, struct device *dev) { struct enclosure_component *cdev; + int error; if (!edev || component >= edev->components) return -EINVAL; cdev = >component[component]; - if (cdev->dev == dev) + if (cdev->dev == dev) { + if (!cdev->links_created) { + error = enclosure_add_links(cdev); + if (!error) + cdev->links_created = 1; + } return -EEXIST; + } if (cdev->dev) enclosure_remove_links(cdev); put_device(cdev->dev); cdev->dev = get_device(dev); - return enclosure_add_links(cdev); + error = enclosure_add_links(cdev); + if (!error) + cdev->links_created = 1; + else + cdev->links_created = 0; + return error; } EXPORT_SYMBOL_GPL(enclosure_add_device); diff --git a/include/linux/enclosure.h b/include/linux/enclosure.h index a4cf57c..c3bdc4c 100644 --- a/include/linux/enclosure.h +++ b/include/linux/enclosure.h @@ -97,6 +97,7 @@ struct enclosure_component { struct device cdev; struct device *dev; enum enclosure_component_type type; + int links_created; int number; int fault; int active; -- Maurizio Lombardi
Re: sg driver, sg_io and sg tablesize
On Wed, 03/01 17:11, Laurence Oberman wrote: > This was suggested by Ewan and is the best way to know what the actual max I/O > size for sg_io would be for the LPFC driver. This sounds silly, why is there a "lame" max I/O size as in max_sectors_kb, that is unusable because of a more limiting max_segments, and an "actual max I/O size" as derived from the latter? What is the reason of the contradiction here? Fam
[regression] vmw_pvscsi probe fails on 4.11
Hi, On Fedora Rawhide (reproduced on git master), vmw_pvscsi probe fails. It also calls free_irq during cleanup, which produces a warning since it didn't call request_irq in this case. Reverting commit 2e48e34 (scsi: vmw_pvscsi: switch to pci_alloc_irq_vectors) fixes the issue (I get "vmw_pvscsi: using MSI-X", and it uses the IRQ 78). [2.671383] vmw_pvscsi: using 64bit dma [2.671750] vmw_pvscsi: max_id: 16 [2.671752] vmw_pvscsi: setting ring_pages to 8 [2.673209] [ cut here ] [2.673219] WARNING: CPU: 2 PID: 402 at kernel/irq/manage.c:1478 __free_irq+0xa4/0x2e0 [2.673220] Trying to free already-free IRQ 78 [2.673221] Modules linked in: serio_raw vmw_pvscsi(+) vmxnet3 ata_generic pata_acpi fjes vmwgfx drm_kms_helper ttm drm [2.673236] CPU: 2 PID: 402 Comm: systemd-udevd Not tainted 4.11.0r0-test1+ #2 [2.673238] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015 [2.673239] Call Trace: [2.673244] dump_stack+0x8e/0xd1 [2.673248] __warn+0xcb/0xf0 [2.673252] warn_slowpath_fmt+0x5f/0x80 [2.673257] ? _raw_spin_lock_irqsave+0x82/0x90 [2.673259] ? __free_irq+0x5f/0x2e0 [2.673262] __free_irq+0xa4/0x2e0 [2.673265] ? __pci_enable_msix+0x234/0x520 [2.673269] free_irq+0x39/0x90 [2.673274] pvscsi_shutdown_intr+0x25/0x40 [vmw_pvscsi] [2.673277] pvscsi_release_resources+0x1e/0x370 [vmw_pvscsi] [2.673281] pvscsi_probe+0xd2e/0x10a4 [vmw_pvscsi] [2.673306] local_pci_probe+0x42/0xa0 [2.673308] ? pci_match_device+0xe0/0x110 [2.673311] pci_device_probe+0xbe/0x150 [2.673313] ? _raw_spin_unlock+0x27/0x40 [2.673318] driver_probe_device+0x106/0x450 [2.673322] __driver_attach+0xa8/0xf0 [2.673325] ? driver_probe_device+0x450/0x450 [2.673327] bus_for_each_dev+0x75/0xc0 [2.673331] driver_attach+0x1e/0x20 [2.67] bus_add_driver+0x1d3/0x270 [2.673335] ? 0xc032d000 [2.673338] driver_register+0x60/0xe0 [2.673340] ? 0xc032d000 [2.673342] __pci_register_driver+0x60/0x70 [2.673345] pvscsi_init+0x38/0x1000 [vmw_pvscsi] [2.673348] do_one_initcall+0x50/0x1a0 [2.673351] ? rcu_read_lock_sched_held+0x79/0x80 [2.673354] ? kmem_cache_alloc_trace+0x273/0x2e0 [2.673356] ? do_init_module+0x27/0x1e8 [2.673361] do_init_module+0x5f/0x1e8 [2.673365] load_module+0x2386/0x29d0 [2.673367] ? __symbol_put+0x70/0x70 [2.673371] ? show_coresize+0x30/0x30 [2.673375] ? vfs_read+0x131/0x180 [2.673386] SYSC_finit_module+0xf6/0x110 [2.673395] SyS_finit_module+0xe/0x10 [2.673397] do_syscall_64+0x6c/0x1f0 [2.673400] entry_SYSCALL64_slow_path+0x25/0x25 [2.673402] RIP: 0033:0x7f770375e5d9 [2.673403] RSP: 002b:7fff59d224b8 EFLAGS: 0246 ORIG_RAX: 0139 [2.673406] RAX: ffda RBX: 56462cb91f20 RCX: 7f770375e5d9 [2.673407] RDX: RSI: 7f77042999c5 RDI: 0006 [2.673408] RBP: 7f77042999c5 R08: R09: 7fff59d225d0 [2.673409] R10: 0006 R11: 0246 R12: [2.673410] R13: 56462cb8fcd0 R14: 0002 R15: 56462ae4cd81 [2.673419] ---[ end trace 19fc54ed551e946e ]--- [2.677816] vmw_pvscsi :13:00.0: Driver probe function unexpectedly returned 1
Re: [PATCH] mpt3sas: Avoid sleeping in interrupt context
> "Bart" == Bart Van Asschewrites: Bart> Commit 669f044170d8 ("scsi: srp_transport: Move queuecommand() Bart> wait code to SCSI core") can make scsi_internal_device_block() Bart> sleep. However, the mpt3sas driver can call this function from an Bart> interrupt handler. Hence add a second argument to Bart> scsi_internal_device_block() that restores the old behavior of Bart> this function for the mpt3sas handler. Applied to 4.11/scsi-fixes. Thanks, Bart! -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH v3 00/14] qla2xxx: Bug Fixes and updates for target.
Hi Himanshu & Co, On Fri, 2017-02-24 at 13:37 -0800, Himanshu Madhani wrote: > Hi Nic, > > Please consider this series for inclusion in target-pending. > This is a little late to include for v4.11-rc1, but considering the number of bug-fixes I'm OK to merge it post -rc1. > This series contains following changes. > > o Fix for the deadlock because of inconsistent lock usage reported by you. > o Added patch to submit non-critical MBX command via IOCB path. > o Improved T10-DIF/PI handling with target stack. > o Changed scsi host lookup method. > o Some minor bug fixes. > > Changes from v2 --> v3 > > o Updated patch to use waitq instead of while loop for waiting. > o Removed duplication of Target stack definitations for T10-DIF/PI. > o In addition, addressed review comments from Bart & Christoph. > > Changes from v1 -> v2 > > o Rebased series based on scsi-target-for-v4.11 branch. > > Please apply to target-pending. > > Thanks, > Himanshu > > Anil Gurumurthy (1): > qla2xxx: Export DIF stats via debugfs > > Himanshu Madhani (2): > qla2xxx: Add DebugFS node to display Port Database > qla2xxx: Update driver version to 9.00.00.00-k > > Joe Carnuccio (1): > qla2xxx: Allow vref count to timeout on vport delete. > > Quinn Tran (10): > qla2xxx: Fix delayed response to command for loop mode/direct connect. > qla2xxx: Allow relogin to proceed if remote login did not finish > qla2xxx: Use IOCB interface to submit non-critical MBX. > qla2xxx: Improve T10-DIF/PI handling in driver. > qla2xxx: Change scsi host lookup method. > qla2xxx: Fix memory leak for abts processing > qla2xxx: Fix request queue corruption. > qla2xxx: Fix inadequate lock protection for ABTS. > qla2xxx: Add async new target notification > qla2xxx: Fix sess_lock & hardware_lock lock order problem. > > drivers/scsi/qla2xxx/Kconfig | 1 + > drivers/scsi/qla2xxx/qla_attr.c| 4 +- > drivers/scsi/qla2xxx/qla_dbg.h | 1 + > drivers/scsi/qla2xxx/qla_def.h | 46 ++- > drivers/scsi/qla2xxx/qla_dfs.c | 115 +- > drivers/scsi/qla2xxx/qla_gbl.h | 12 +- > drivers/scsi/qla2xxx/qla_init.c| 85 ++--- > drivers/scsi/qla2xxx/qla_isr.c | 41 ++- > drivers/scsi/qla2xxx/qla_mbx.c | 304 ++-- > drivers/scsi/qla2xxx/qla_mid.c | 14 +- > drivers/scsi/qla2xxx/qla_os.c | 24 +- > drivers/scsi/qla2xxx/qla_target.c | 729 > +++-- > drivers/scsi/qla2xxx/qla_target.h | 36 +- > drivers/scsi/qla2xxx/qla_version.h | 6 +- > drivers/scsi/qla2xxx/tcm_qla2xxx.c | 49 ++- > 15 files changed, 1049 insertions(+), 418 deletions(-) >