Re: [PATCHv2 0/7] Limit overall SCSI EH runtime
On Mon, 2013-07-01 at 16:55 -0400, Jörn Engel wrote: On Mon, 1 July 2013 19:23:25 +, James Bottomley wrote: On Mon, 2013-07-01 at 13:44 -0400, Jörn Engel wrote: If a single device is bad, don't ever do a host reset. This isn't a tenable position. Sometimes a device looks bad because the host state for it has gone insane. At that point, the only safe action is a reset of the host to sane state. I could be persuaded that you should never do the transport equivalent of a bus reset (on non-SPI transports, at least), which is actually hard to do on some of the modern transports, but I don't think you can get away without having a host reset in the eh arsenal. Fair enough. Hardware being hardware and hardware bugs being hard to fix, I see your point. However, we shouldn't screw the poor user who has paid a premium for a second HBA to get some redundancy and reset both of them at the same time. That would, you know, defeat the redundancy. ;) I don't understand what you're getting at. In a dual HBA situation, whether the second HBA is implicated or not depends on configuration and what the first HBA is doing. If it's just passively lost device state, then the second HBA should continue just fine. If the insane HBA is injecting rogue data on the bus then, in a properly isolated configuration, it shouldn't be able to affect the second HBA, but if there's some leak and it does, chances are error handling will occur on both simultaneously. I don't see any way to avoid this other than having the user buy better hardware and properly configure it. James -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
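The escalation being debated here can be pictured as a ladder of increasingly disruptive recovery steps. The sketch below is a minimal model under stated assumptions. The step order roughly follows the Linux SCSI midlayer error handler: abort, then device, target, bus and finally host reset, before giving up and offlining the device. `enum eh_step`, `eh_escalate()` and `simulated_try()` are hypothetical names, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical recovery steps, ordered from least to most disruptive.
 * The order roughly mirrors the SCSI midlayer error handler; these
 * names are illustrative only and are not kernel APIs. */
enum eh_step {
	EH_ABORT,
	EH_DEVICE_RESET,
	EH_TARGET_RESET,
	EH_BUS_RESET,
	EH_HOST_RESET,
	EH_OFFLINE	/* every step failed: take the device offline */
};

/* Attempts one recovery step; returns true if the device recovered. */
typedef bool (*eh_try_fn)(enum eh_step step, void *ctx);

/* Walk the ladder from the mildest step upwards and stop at the first
 * step that succeeds. Returns EH_OFFLINE if none of them does. */
static enum eh_step eh_escalate(eh_try_fn try_step, void *ctx)
{
	assert(try_step != NULL);
	for (int s = EH_ABORT; s <= EH_HOST_RESET; s++) {
		if (try_step((enum eh_step)s, ctx))
			return (enum eh_step)s;
	}
	return EH_OFFLINE;
}

/* Simulated fault model: recovery only succeeds at or beyond the step
 * stored in *ctx. */
static bool simulated_try(enum eh_step step, void *ctx)
{
	const enum eh_step *needed = ctx;

	return step >= *needed;
}
```

In this model, James's point is that the EH_HOST_RESET rung must stay available for host state that has gone insane. Jörn's concern is that climbing to it on behalf of a single bad device resets the whole HBA, including every healthy device behind it.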
RE: Linux boot Support for 4KB sector drives ?
Adding Asha also. -----Original Message----- From: Mahesh Rajashekhara Sent: Monday, July 01, 2013 11:41 AM To: jbottom...@parallels.com; linux-scsi@vger.kernel.org Cc: Tony Ruiz; Achim Leubner; Mahesh Rajashekhara; Kishore Babu Lukka Subject: Linux boot Support for 4KB sector drives ? Hello, Do any of the Linux OS flavors support booting from a 4K sector (Advanced Format) drive in legacy BIOS mode (MBR partitioning scheme)? Thanks & Regards, Mahesh
Re: Linux boot Support for 4KB sector drives ?
On 07/02/2013 03:11 PM, Kishore Babu Lukka wrote: Adding Asha also. -----Original Message----- From: Mahesh Rajashekhara Sent: Monday, July 01, 2013 11:41 AM To: jbottom...@parallels.com; linux-scsi@vger.kernel.org Cc: Tony Ruiz; Achim Leubner; Mahesh Rajashekhara; Kishore Babu Lukka Subject: Linux boot Support for 4KB sector drives ? Hello, Do any of the Linux OS flavors support booting from a 4K sector (Advanced Format) drive in legacy BIOS mode (MBR partitioning scheme)? That depends on the boot loader, not on Linux, I think. Linux supports 4k sector drives, but if the boot loader doesn't, it can't fetch the kernel into memory and load Linux. Legacy grub relies on the BIOS interrupt service and thus shouldn't be able to support 4k sector drives; I don't know the status of grub2. Thanks, Aaron Thanks & Regards, Mahesh
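To illustrate why a boot loader that hard-codes 512-byte sectors runs into trouble, consider a drive that exposes 4096-byte logical sectors. On such a drive every LBA counts 4 KiB blocks. Any offset computed in 512-byte units must therefore be divided by 8 and must be 4K-aligned. Drives in 512-byte emulation mode still present 512-byte logical sectors, so this only concerns native 4K drives. The helper below is a hypothetical sketch of that arithmetic, not code taken from grub or any other boot loader.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_512 512u
#define SECTOR_4K  4096u
#define PER_4K     (SECTOR_4K / SECTOR_512)	/* 8 legacy sectors per 4K block */

/* Hypothetical helper: convert an LBA expressed in 512-byte units into
 * 4096-byte logical blocks. Returns false if the offset is not on a 4K
 * boundary, in which case a native 4K drive cannot address it directly. */
static bool lba512_to_lba4k(uint64_t lba512, uint64_t *lba4k)
{
	assert(lba4k != NULL);
	if (lba512 % PER_4K != 0)
		return false;
	*lba4k = lba512 / PER_4K;
	return true;
}
```

For example, the 1 MiB-aligned partition start at 512-byte LBA 2048 maps to 4K LBA 256. The traditional DOS-era start at LBA 63 has no 4K equivalent.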
[PATCH v13 0/4] SCSI device removal fixes
Fix a few issues related to SCSI device removal:
- Fix a race between starved list processing and device removal that can trigger a kernel oops.
- Prevent __scsi_remove_device() from being called twice for the same SCSI device, which can also cause a kernel oops.
- Restrict the SCSI device state changes allowed via sysfs.
- Avoid re-enabling I/O after the transport layer went offline.
Changes compared to v12:
- Clarified the description of the patch for handling a transport layer failure during LUN scanning: mentioned that this patch was developed after analyzing the cause of a kernel oops triggered by asynchronous LUN scanning.
- Restored the previous version of the patch for restricting sysfs SCSI device state changes, namely the version that only disallows changing the device state into cancel or deleted.
- Added a comment in patch 4/4.
- Left out a patch that was the result of source reading and also an intermediate patch that is no longer needed in this series.
Changes compared to v11:
- Left out a patch that was not a device removal bug fix.
- Left out the patches on which there is no agreement yet.
Changes compared to v10:
- Rebased and retested on top of Linux kernel v3.10-rc5.
Changes compared to v9:
- Changed one WARN_ON() statement into a WARN() statement.
Changes compared to v8:
- Addressed the feedback from Joe Lawrence.
- Dropped the patch that makes scsi_remove_host() wait until the last sdev user is gone.
- Eliminated Scsi_Host.tmf_in_progress since it duplicates state information available in Scsi_Host.eh_active.
- Added a patch to avoid re-enabling I/O after the transport layer went offline.
Changes compared to v7:
- Addressed the review comments posted by Hannes Reinecke and Rolf Eike Beer.
- Modified the patch "Make scsi_remove_host() wait until error handling finished" so that it is also safe for SCSI timeout values below the maximum LLD response time, by modifying scsi_send_eh_cmnd() so that it does not invoke any LLD code after scsi_remove_host() has started.
- Added a patch to save and restore the host_scribble field.
- Refined / clarified several patch descriptions.
- Rebased and retested on top of kernel v3.8-rc6.
Changes compared to v6:
- Dropped the first six patches since Jens queued these for 3.8.
- Added a patch to prevent __scsi_remove_device() from being invoked twice.
- Restored error recovery in the SHOST_CANCEL state.
Changes compared to v5:
- Prevent block layer work from being scheduled on a dead queue.
- Do not invoke any SCSI LLD callback after scsi_remove_host() has finished.
- Stop error handling as soon as scsi_remove_host() has started.
- Remove the unused function bsg_goose_queue().
- Prevent scsi_device_set_state() from triggering a race condition.
Changes compared to v4:
- Moved queue_flag_set(QUEUE_FLAG_DEAD, q) from blk_drain_queue() into blk_cleanup_queue().
- Declared the new __blk_run_queue_uncond() function inline. Checked in the generated assembler code that this function is really inlined in __blk_run_queue().
- Elaborated several patch descriptions.
- Added sparse annotations to scsi_request_fn().
- Split several patches.
Changes compared to v3:
- Fixed a race condition by setting QUEUE_FLAG_DEAD earlier.
- Added a patch for fixing a race between starved list processing and device removal to this series.
Changes compared to v2:
- Split the second patch into two patches.
- Refined patch descriptions.
Changes compared to v1:
- Included a patch to rename QUEUE_FLAG_DEAD.
- Refined the descriptions of the __blk_run_queue_uncond() and blk_cleanup_queue() functions.
[PATCH 1/4] Fix race between starved list and device removal
From: James Bottomley <jbottom...@parallels.com>
scsi_run_queue() examines all SCSI devices that are present on the starved list. Since scsi_run_queue() unlocks the SCSI host lock, a SCSI device can get removed after it has been removed from the starved list and before its queue is run. Protect against that race condition by holding a reference on the queue while running it.
Signed-off-by: James Bottomley <jbottom...@parallels.com>
Signed-off-by: Bart Van Assche <bvanass...@acm.org>
Reported-by: Chanho Min <chanho@lge.com>
Reference: http://lkml.org/lkml/2012/8/2/96
Cc: Tejun Heo <t...@kernel.org>
Cc: Mike Christie <micha...@cs.wisc.edu>
Cc: Hannes Reinecke <h...@suse.de>
Cc: sta...@vger.kernel.org
---
 drivers/scsi/scsi_lib.c | 26 +++++++++++++++++++++-----
 1 file changed, 21 insertions(+), 5 deletions(-)
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 86d5220..df8bd5a 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -434,6 +434,8 @@ static void scsi_run_queue(struct request_queue *q)
 	list_splice_init(&shost->starved_list, &starved_list);
 
 	while (!list_empty(&starved_list)) {
+		struct request_queue *slq;
+
 		/*
 		 * As long as shost is accepting commands and we have
 		 * starved queues, call blk_run_queue. scsi_request_fn
@@ -456,11 +458,25 @@ static void scsi_run_queue(struct request_queue *q)
 			continue;
 		}
 
-		spin_unlock(shost->host_lock);
-		spin_lock(sdev->request_queue->queue_lock);
-		__blk_run_queue(sdev->request_queue);
-		spin_unlock(sdev->request_queue->queue_lock);
-		spin_lock(shost->host_lock);
+		/*
+		 * Once we drop the host lock, a racing scsi_remove_device()
+		 * call may remove the sdev from the starved list and destroy
+		 * it and the queue.  Mitigate by taking a reference to the
+		 * queue and never touching the sdev again after we drop the
+		 * host lock.  Note: if __scsi_remove_device() invokes
+		 * blk_cleanup_queue() before the queue is run from this
+		 * function then blk_run_queue() will return immediately since
+		 * blk_cleanup_queue() marks the queue with QUEUE_FLAG_DYING.
+		 */
+		slq = sdev->request_queue;
+		if (!blk_get_queue(slq))
+			continue;
+		spin_unlock_irqrestore(shost->host_lock, flags);
+
+		blk_run_queue(slq);
+		blk_put_queue(slq);
+
+		spin_lock_irqsave(shost->host_lock, flags);
 	}
 	/* put any unprocessed entries back */
 	list_splice(&starved_list, &shost->starved_list);
-- 
1.7.10.4
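The fix follows a general pattern. While still holding the lock that keeps the object reachable, take a reference that fails once teardown has begun. Then drop the lock, use the object, and release the reference. Below is a minimal user-space sketch of that pattern, assuming POSIX threads and C11 atomics. `struct queue`, `queue_get()`, `queue_put()`, `run_starved()` and `remove_queue()` are hypothetical stand-ins for the request queue, blk_get_queue()/blk_put_queue() and the two racing paths, not the block layer's implementation.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical refcounted object standing in for a request queue. */
struct queue {
	atomic_int refcount;
	atomic_bool dying;	/* set once teardown has started */
	atomic_int runs;	/* how often the queue was "run" */
};

/* Protects the list slot, like the SCSI host lock in the patch. */
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;

static struct queue *queue_alloc(void)
{
	struct queue *q = malloc(sizeof(*q));

	if (q) {
		atomic_init(&q->refcount, 1);	/* the owner's reference */
		atomic_init(&q->dying, false);
		atomic_init(&q->runs, 0);
	}
	return q;
}

/* Like blk_get_queue(): no new references once teardown has begun.
 * Callers hold list_lock, which serializes this with remove_queue(). */
static bool queue_get(struct queue *q)
{
	if (atomic_load(&q->dying))
		return false;
	atomic_fetch_add(&q->refcount, 1);
	return true;
}

/* Drop one reference; the last reference frees the object. */
static void queue_put(struct queue *q)
{
	int old = atomic_fetch_sub(&q->refcount, 1);

	assert(old > 0);
	if (old == 1)
		free(q);
}

/* Runner: pin the queue under the lock, then use it without the lock. */
static void run_starved(struct queue **slot)
{
	pthread_mutex_lock(&list_lock);
	struct queue *q = *slot;
	if (q == NULL || !queue_get(q)) {
		pthread_mutex_unlock(&list_lock);
		return;
	}
	pthread_mutex_unlock(&list_lock);

	atomic_fetch_add(&q->runs, 1);	/* safe: our reference keeps q alive */
	queue_put(q);
}

/* Removal: unlink and mark dying under the lock, then drop the owner's
 * reference. A runner that already pinned q keeps it alive. */
static void remove_queue(struct queue **slot)
{
	pthread_mutex_lock(&list_lock);
	struct queue *q = *slot;
	*slot = NULL;
	if (q)
		atomic_store(&q->dying, true);
	pthread_mutex_unlock(&list_lock);

	if (q)
		queue_put(q);
}
```

The ordering is what matters. The reference is taken while the lock still guarantees that the queue is reachable. The patch does the same with blk_get_queue() before it drops the host lock.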
[PATCH 2/4] Avoid calling __scsi_remove_device() twice
If something goes wrong during LUN scanning, e.g. a transport layer failure occurs, then __scsi_remove_device() can get invoked by the LUN scanning code for a SCSI device in state SDEV_CREATED_BLOCK and before the SCSI device has been added to sysfs (is_visible == 0). Make sure that even in this case the transition into state SDEV_DEL occurs. This prevents __scsi_remove_device() from being invoked a second time by scsi_forget_host() if the latter is invoked from a thread other than the one performing LUN scanning.
Signed-off-by: Bart Van Assche <bvanass...@acm.org>
Cc: James Bottomley <jbottom...@parallels.com>
Cc: Mike Christie <micha...@cs.wisc.edu>
Cc: Hannes Reinecke <h...@suse.de>
Cc: Tejun Heo <t...@kernel.org>
---
 drivers/scsi/scsi_lib.c | 1 +
 1 file changed, 1 insertion(+)
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index df8bd5a..124392f 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2193,6 +2193,7 @@ scsi_device_set_state(struct scsi_device *sdev, enum scsi_device_state state)
 		case SDEV_OFFLINE:
 		case SDEV_TRANSPORT_OFFLINE:
 		case SDEV_CANCEL:
+		case SDEV_CREATED_BLOCK:
 			break;
 		default:
 			goto illegal;
-- 
1.7.10.4
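The double-removal scenario can be modelled in a few lines. In this hypothetical sketch, `set_state_del()` reduces scsi_device_set_state() to the SDEV_DEL case, with only the predecessor states visible in the hunk above. `forget_host()` assumes, as the patch description implies, that the host-removal path skips devices already in SDEV_DEL. The two calls run sequentially here, standing in for the two threads.

```c
#include <assert.h>
#include <stddef.h>

/* Only the states needed for this illustration. */
enum sdev_state {
	SDEV_CREATED_BLOCK,
	SDEV_OFFLINE,
	SDEV_TRANSPORT_OFFLINE,
	SDEV_CANCEL,
	SDEV_DEL,
};

/* Simplified model of the SDEV_DEL case: returns 0 and changes the
 * state if the old state is an allowed predecessor, -1 otherwise. */
static int set_state_del(enum sdev_state *st, int allow_created_block)
{
	assert(st != NULL);
	switch (*st) {
	case SDEV_OFFLINE:
	case SDEV_TRANSPORT_OFFLINE:
	case SDEV_CANCEL:
		break;
	case SDEV_CREATED_BLOCK:
		if (allow_created_block)	/* the case added by the patch */
			break;
		return -1;
	default:
		return -1;
	}
	*st = SDEV_DEL;
	return 0;
}

/* Hypothetical stand-ins: removal always runs its teardown, and the
 * host-removal path calls it again unless the device reached SDEV_DEL. */
static int removals;

static void remove_device(enum sdev_state *st, int allow_created_block)
{
	removals++;
	(void)set_state_del(st, allow_created_block);
}

static void forget_host(enum sdev_state *st, int allow_created_block)
{
	if (*st != SDEV_DEL)
		remove_device(st, allow_created_block);
}

/* How many times removal runs for a device whose LUN scan failed. */
static int removals_after_scan_failure(int allow_created_block)
{
	enum sdev_state st = SDEV_CREATED_BLOCK;

	removals = 0;
	remove_device(&st, allow_created_block);	/* LUN scanning error path */
	forget_host(&st, allow_created_block);		/* host removal */
	return removals;
}
```

Without the new case the model removes the device twice; with it, only once.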
[PATCH 3/4] Avoid re-enabling I/O after the transport became offline
Disallow the SDEV_TRANSPORT_OFFLINE to SDEV_CANCEL transition so that no I/O is sent to devices for which the transport is offline. Notes:
- Functions like sd_shutdown() use scsi_execute_req() and hence set the REQ_PREEMPT flag. Such requests are passed to the LLD queuecommand callback in the SDEV_CANCEL state.
- This patch does not affect Fibre Channel LLD drivers since these drivers invoke fc_remote_port_chkready() before submitting a SCSI request to the HBA. That prevents a timeout from occurring in state SDEV_CANCEL if the transport is offline.
Signed-off-by: Bart Van Assche <bvanass...@acm.org>
Cc: Mike Christie <micha...@cs.wisc.edu>
Cc: James Bottomley <jbottom...@parallels.com>
Cc: Hannes Reinecke <h...@suse.de>
Cc: Tejun Heo <t...@kernel.org>
---
 drivers/scsi/scsi_lib.c   | 1 -
 drivers/scsi/scsi_sysfs.c | 9 ++++++++-
 2 files changed, 8 insertions(+), 2 deletions(-)
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 124392f..a0fb56b 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2178,7 +2178,6 @@ scsi_device_set_state(struct scsi_device *sdev, enum scsi_device_state state)
 		case SDEV_RUNNING:
 		case SDEV_QUIESCE:
 		case SDEV_OFFLINE:
-		case SDEV_TRANSPORT_OFFLINE:
 		case SDEV_BLOCK:
 			break;
 		default:
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 931a7d9..1711617 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -955,7 +955,14 @@ void __scsi_remove_device(struct scsi_device *sdev)
 	struct device *dev = &sdev->sdev_gendev;
 
 	if (sdev->is_visible) {
-		if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0)
+		/*
+		 * The transition from SDEV_TRANSPORT_OFFLINE into
+		 * SDEV_CANCEL is not allowed since this transition would
+		 * reenable I/O. However, if the device state was already
+		 * SDEV_TRANSPORT_OFFLINE, proceed with device removal.
+		 */
+		if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0 &&
+		    sdev->sdev_state != SDEV_TRANSPORT_OFFLINE)
 			return;
 
 		bsg_unregister_queue(sdev->request_queue);
-- 
1.7.10.4
[PATCH 4/4] Disallow changing the device state via sysfs into deleted
Changing the state of a SCSI device into cancel or deleted via sysfs prevents scsi_remove_host() from removing that device. Hence, do not allow this.
Signed-off-by: Bart Van Assche <bvanass...@acm.org>
Cc: Tejun Heo <t...@kernel.org>
Cc: James Bottomley <jbottom...@parallels.com>
Cc: Mike Christie <micha...@cs.wisc.edu>
Cc: Hannes Reinecke <h...@suse.de>
Cc: David Milburn <dmilb...@redhat.com>
---
 drivers/scsi/scsi_sysfs.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 1711617..292df85 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -605,10 +605,8 @@ store_state_field(struct device *dev, struct device_attribute *attr,
 			break;
 		}
 	}
-	if (!state)
-		return -EINVAL;
-
-	if (scsi_device_set_state(sdev, state))
+	if (state == 0 || state == SDEV_CANCEL || state == SDEV_DEL ||
+	    scsi_device_set_state(sdev, state))
 		return -EINVAL;
 	return count;
 }
-- 
1.7.10.4
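The new check in store_state_field() amounts to filtering the user-supplied state name before any transition is attempted. Below is a minimal sketch of that filtering step. The reduced name table and the helpers `parse_state()` and `sysfs_state_allowed()` are hypothetical. The names "cancel" and "deleted" are the ones used in the patch description, and 0 plays the same "no match" role as in the patch.

```c
#include <assert.h>
#include <string.h>

/* Illustrative subset of device states; 0 means "no match". */
enum dev_state { ST_NONE = 0, ST_RUNNING, ST_OFFLINE, ST_CANCEL, ST_DEL };

static const struct {
	enum dev_state state;
	const char *name;
} state_names[] = {
	{ ST_RUNNING, "running" },
	{ ST_OFFLINE, "offline" },
	{ ST_CANCEL,  "cancel" },
	{ ST_DEL,     "deleted" },
};

/* Map a sysfs-style string (possibly newline-terminated) to a state. */
static enum dev_state parse_state(const char *buf)
{
	assert(buf != NULL);
	size_t len = strcspn(buf, "\n");

	for (size_t i = 0; i < sizeof(state_names) / sizeof(state_names[0]); i++) {
		const char *name = state_names[i].name;

		if (strlen(name) == len && strncmp(name, buf, len) == 0)
			return state_names[i].state;
	}
	return ST_NONE;
}

/* Reject unknown names and the states reserved for the removal path. */
static int sysfs_state_allowed(const char *buf)
{
	enum dev_state st = parse_state(buf);

	return !(st == ST_NONE || st == ST_CANCEL || st == ST_DEL);
}
```

In the real code, a name that passes this filter still has to be accepted by scsi_device_set_state().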
Re: [PATCH 3/4] Avoid re-enabling I/O after the transport became offline
On Tue, 2013-07-02 at 15:07 +0200, Bart Van Assche wrote: Disallow the SDEV_TRANSPORT_OFFLINE to SDEV_CANCEL transition so that no I/O is sent to devices for which the transport is offline. Notes: - Functions like sd_shutdown() use scsi_execute_req() and hence set the REQ_PREEMPT flag. Such requests are passed to the LLD queuecommand callback in the SDEV_CANCEL state. - This patch does not affect Fibre Channel LLD drivers since these drivers invoke fc_remote_port_chkready() before submitting a SCSI request to the HBA. That prevents a timeout from occurring in state SDEV_CANCEL if the transport is offline. Signed-off-by: Bart Van Assche <bvanass...@acm.org> Cc: Mike Christie <micha...@cs.wisc.edu> Cc: James Bottomley <jbottom...@parallels.com> Cc: Hannes Reinecke <h...@suse.de> Cc: Tejun Heo <t...@kernel.org> --- drivers/scsi/scsi_lib.c | 1 - drivers/scsi/scsi_sysfs.c | 9 ++++++++- 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index 124392f..a0fb56b 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -2178,7 +2178,6 @@ scsi_device_set_state(struct scsi_device *sdev, enum scsi_device_state state) case SDEV_RUNNING: case SDEV_QUIESCE: case SDEV_OFFLINE: - case SDEV_TRANSPORT_OFFLINE: case SDEV_BLOCK: break; default: diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c index 931a7d9..1711617 100644 --- a/drivers/scsi/scsi_sysfs.c +++ b/drivers/scsi/scsi_sysfs.c @@ -955,7 +955,14 @@ void __scsi_remove_device(struct scsi_device *sdev) struct device *dev = &sdev->sdev_gendev; if (sdev->is_visible) { - if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0) + /* + * The transition from SDEV_TRANSPORT_OFFLINE into + * SDEV_CANCEL is not allowed since this transition would + * reenable I/O. However, if the device state was already + * SDEV_TRANSPORT_OFFLINE, proceed with device removal.
+ */ + if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0 && + sdev->sdev_state != SDEV_TRANSPORT_OFFLINE) This isn't the right way to do this, because it's adding uncharted state to the state model. What should happen is that this should be reflected in the actual state model. It sounds like we need a CANCEL_OFFLINE state to which TRANSPORT_OFFLINE (and possibly OFFLINE) can transition. The comment on the transition should state that CANCEL_OFFLINE won't allow any I/O. James
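James's suggestion can be sketched as an explicit extra state instead of a special case in __scsi_remove_device(). The model below is hypothetical. SDEV_CANCEL_OFFLINE does not exist in the code under review, and the state set is reduced to the states discussed in this thread. `may_queue_io()` simply encodes the proposed rule that CANCEL_OFFLINE admits no I/O, alongside the REQ_PREEMPT behaviour of SDEV_CANCEL described in patch 3/4.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Reduced, hypothetical state set; SDEV_CANCEL_OFFLINE is the proposed
 * new state and does not exist in the kernel under review. */
enum sdev_state {
	SDEV_RUNNING,
	SDEV_OFFLINE,
	SDEV_TRANSPORT_OFFLINE,
	SDEV_CANCEL,
	SDEV_CANCEL_OFFLINE,
};

static bool transition_allowed(enum sdev_state from, enum sdev_state to)
{
	switch (to) {
	case SDEV_CANCEL:
		/* TRANSPORT_OFFLINE is deliberately absent (patch 3/4). */
		return from == SDEV_RUNNING || from == SDEV_OFFLINE;
	case SDEV_CANCEL_OFFLINE:
		/* "TRANSPORT_OFFLINE (and possibly OFFLINE)" per the thread. */
		return from == SDEV_TRANSPORT_OFFLINE || from == SDEV_OFFLINE;
	default:
		return false;	/* other targets are out of scope here */
	}
}

static bool set_state(enum sdev_state *st, enum sdev_state to)
{
	assert(st != NULL);
	if (!transition_allowed(*st, to))
		return false;
	*st = to;
	return true;
}

/* Removal entry point: prefer the normal cancel state and fall back to
 * the I/O-free variant instead of special-casing the old state. */
static bool begin_removal(enum sdev_state *st)
{
	return set_state(st, SDEV_CANCEL) || set_state(st, SDEV_CANCEL_OFFLINE);
}

/* Proposed I/O rule: CANCEL still admits preempt requests (such as the
 * shutdown path from patch 3/4); CANCEL_OFFLINE admits nothing. */
static bool may_queue_io(enum sdev_state st, bool preempt)
{
	switch (st) {
	case SDEV_RUNNING:
		return true;
	case SDEV_CANCEL:
		return preempt;
	default:
		return false;
	}
}
```

With the extra state, the "device is being removed but must not see I/O" condition is visible in the state itself rather than inferred from a failed transition.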
[GIT PULL] First round of SCSI updates for the 3.10+ merge window
The patch set is mostly driver updates (ufs, zfcp, lpfc, mpt2sas, megaraid_sas, bfa, ipr) and a few bug fixes. Also of note is that the BusLogic driver has been rewritten to a better coding style and 64-bit support has been added. We also removed the libsas limitation on 16 bytes for the command size (currently no drivers make use of this). The patch is available here:
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git scsi-misc
The short changelog is:
Akinobu Mita (3):
      ufshcd-pltfrm: remove unnecessary dma_set_coherent_mask() call
      ufs: fix register address in UIC error interrupt handling
      ufshcd-pltfrm: add missing empty slot in ufs_of_match[]
Ben Hutchings (1):
      sd: Fix parsing of 'temporary ' cache mode prefix
Dan Carpenter (2):
      fnic: potential dead lock in fnic_is_abts_pending()
      pm80xx: remove unneeded NULL check
Daniel Hansel (1):
      zfcp: fix adapter (re)open recovery while link to SAN is down
Eddie Wai (1):
      libiscsi: Added new boot entries in the session sysfs
Geert Uytterhoeven (1):
      ufs: SCSI_UFSHCD should depend on SCSI_DMA
Hannes Reinecke (1):
      sd: avoid deadlocks when running under multipath
Jakob Normark (1):
      bfa: Fixes for 0-terminated strncpy and possible null pointer dereference
James Bottomley (1):
      libsas: implement 16 byte CDB support
James Georgas (1):
      megaraid: minor cut and paste error fixed.
James Smart (17):
      lpfc 8.3.40: Update lpfc version to driver version 8.3.40
      lpfc 8.3.40: Update Copyrights to 2013 for 8.3.38, 8.3.39, and 8.3.40 modifications
      lpfc 8.3.40: Fixed a race condition between SLI host and port failed FCF rediscovery
      lpfc 8.3.40: Fixed issue mailbox wait routine failed to issue dump memory mbox command
      lpfc 8.3.40: Fixed system panic due to unsafe walking and deleting linked list
      lpfc 8.3.40: Fixed FCoE connection list vlan identifier and add FCF list debug
      lpfc 8.3.40: Clarified the behavior of the lpfc_max_luns module parameter
      lpfc 8.3.40: Fix to allow OCM to report FEC status
      lpfc 8.3.40: Fixed a missing return code in a logging message
      lpfc 8.3.40: Fixed some logging message fields
      lpfc 8.3.40: Fixed list corruption when lpfc_drain_tx runs.
      lpfc 8.3.40: Fix starting reference tag when calculating BG error
      lpfc 8.3.40: Fix inconsistent list removal causes crash.
      lpfc 8.3.40: Fixed system panic during handling unsolicited receive buffer error condition
      lpfc 8.3.40: Fix BlockGuard error checking
      lpfc 8.3.40: Fixed crash during FCoE failover testing.
      lpfc 8.3.40: Fix lpfc_used_cpu to be more dynamic
K. Y. Srinivasan (2):
      storvsc: Update the storage protocol to win8 level
      storvsc: Increase the value of scsi timeout for storvsc devices
Karen Xie (1):
      cxgb4i: add support for T5 adapter
Khalid Aziz (3):
      MAINTAINERS: Add myself as the maintainer for BusLogic SCSI driver
      BusLogic: Port driver to 64-bit.
      BusLogic: Fix style issues
Mahesh Rajashekhara (1):
      aacraid: Fix for arrays are going offline in the system. System hangs
Martin K. Petersen (4):
      sd: Update WRITE SAME heuristics
      3w-: Create sense buffer for unsupported commands
      Workaround for disks that report bad optimal transfer length
      Allow error handling timeout to be specified
Martin Peschke (1):
      zfcp: remove access control tables interface
Naresh Kumar Inna (1):
      csiostor: Retain default adapter configuration in absence of config file.
Reddy, Sreekanth (2):
      mpt2sas: fix for unused variable 'event_data' warning
      mpt2sas: Fix for issue Missing delay not getting set during system bootup
Sachin Kamat (1):
      ufs: Remove redundant platform_set_drvdata()
Sebastian Ott (4):
      zfcp: remove unused device_unregister wrapper
      zfcp: cleanup unit sysfs attribute usage
      zfcp: cleanup port sysfs attribute usage
      zfcp: cfdc fops add owner
Sergei Shtylyov (1):
      ipr: qc_fill_rtf() method should not store alternate status register
Seungwon Jeon (5):
      ufs: use devres functions for ufshcd
      ufs: rework link start-up process
      ufs: remove version check before IS reg clear
      ufs: amend interrupt configuration
      ufs: wrap the i/o access operations
Sreekanth Reddy (6):
      mpt2sas: Bump driver vesion to v15.100.00.00
      mpt2sas: Calulate the Reply post queue depth calculation as per the MPI spec
      mpt2sas: fix firmware failure with wrong task attribute
      mpt2sas: Fix for device scan following host reset could get stuck in a infinite loop
      mpt2sas: Update the timing requirements for issuing a Hard Reset
      mpt2sas: MPI2 Rev W (2.00.15) specification
Steffen Maier (3):
      zfcp: status read buffers on first adapter open with link down
      zfcp: module parameter dbflevel for early debugging
      zfcp: block queue limits with data router
Sujit Reddy Thumma (2):
      ufs:
Re: [PATCH 1/4] scsi: ufs: Fix broken task management command implementation
On Fri, Jun 28, 2013 at 5:02 PM, Sujit Reddy Thumma sthu...@codeaurora.org wrote: On 6/27/2013 4:49 PM, Santosh Y wrote: + spin_lock_irqsave(host->host_lock, flags); task_req_descp = hba->utmrdl_base_addr; task_req_descp += free_slot; @@ -2353,38 +2387,39 @@ ufshcd_issue_tm_cmd(struct ufs_hba *hba, (struct utp_upiu_task_req *) task_req_descp->task_req_upiu; task_req_upiup->header.dword_0 = UPIU_HEADER_DWORD(UPIU_TRANSACTION_TASK_REQ, 0, - lrbp->lun, lrbp->task_tag); + lun_id, free_slot); Actually it still doesn't fix the problem. The *task tag* used here should be unique across the SCSI/Query and Task Management UPIUs. I am sorry, I didn't get that. Why should it be unique across the SCSI/Query UPIUs? For example, if a machine supports 32 request slots and 8 task management slots, then the task management command tag can be anything out of 8 slots. The spec (UFS 1.1) states this requirement under '10.5.2 Basic Header Format' - 'Task Tag'. A couple of devices I came across behaved accordingly: the tracking of UPIUs, even those belonging to a separate group, seemed to be based on the 'task tag' value alone rather than on the combination of UPIU type and task tag. -- ~Santosh
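One way to meet the requirement described above is to give task management requests tags from a range above all transfer request slots. This keeps every task tag unique across transfer requests (SCSI/Query) and task management requests. The sketch below uses the 32 + 8 slot example from the thread. The constants and helpers are hypothetical illustrations, not the ufshcd implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Example geometry from the discussion: 32 transfer request slots and
 * 8 task management slots. Names are illustrative. */
enum { NUM_XFER_SLOTS = 32, NUM_TM_SLOTS = 8 };

/* Transfer requests (SCSI/Query) use their slot index as the tag. */
static uint8_t xfer_task_tag(unsigned int slot)
{
	assert(slot < NUM_XFER_SLOTS);
	return (uint8_t)slot;
}

/* Task management requests are offset past all transfer slots, so a TM
 * tag never equals a tag used by a transfer request. The UPIU header
 * carries the task tag in a single byte, hence the 256 limit. */
static uint8_t tm_task_tag(unsigned int tm_slot)
{
	assert(tm_slot < NUM_TM_SLOTS);
	assert(NUM_XFER_SLOTS + NUM_TM_SLOTS <= 256);
	return (uint8_t)(NUM_XFER_SLOTS + tm_slot);
}
```

The quoted hunk passes free_slot directly as the tag, so task management slot 0 would carry tag 0, the same tag as transfer request slot 0. Under the offset scheme it gets tag 32 instead.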
Re: [PATCHv2 0/7] Limit overall SCSI EH runtime
On Tue, 2 July 2013 06:37:05 +, James Bottomley wrote: I don't understand what you're getting at. In a dual HBA situation, whether the second HBA is implicated or not depends on configuration and what the first HBA is doing. If it's just passively lost device state, then the second HBA should continue just fine. If the insane HBA is If the problem is an insane drive instead of an insane HBA, both HBAs will be in roughly the same state at roughly the same time - assuming they both send commands to the insane drive. If they now go into error handling and effectively shut off all the sane drives at roughly the same time, the user is ed. And we shouldn't require the user to buy better hardware. The whole point of a redundant setup is that your plane doesn't crash to the ground when one of your two engines fails. If regulations required perfect engines, you wouldn't be flying to conferences. They require decent engines and enough redundancy that any one can fail at any moment. Computer systems are no different. We can construct a robust system from individually less robust components. Requiring perfect components would be ludicrous. Having a system design where one faulty component will reliably bring the system down is equally ludicrous. Sadly that is also the state of today's scsi stack. This is not a theoretical problem, btw. We currently carry some patches to solve it for us. They are not applicable for mainline in their current state - we support a lot less hardware diversity. But trust me, we didn't create them on a whim. ;) Jörn -- If you're willing to restrict the flexibility of your approach, you can almost always do something better. -- John Carmack
Re: [PATCHv2 0/7] Limit overall SCSI EH runtime
On Tue, 2013-07-02 at 10:58 -0400, Jörn Engel wrote: On Tue, 2 July 2013 06:37:05 +, James Bottomley wrote: I don't understand what you're getting at. In a dual HBA situation, whether the second HBA is implicated or not depends on configuration and what the first HBA is doing. If it's just passively lost device state, then the second HBA should continue just fine. If the insane HBA is If the problem is an insane drive instead of an insane HBA, both HBAs will be in roughly the same state at roughly the same time - assuming they both send commands to the insane drive. If they now go into error handling and effectively shut off all the sane drives at roughly the same time, the user is ed. That's handled in device reset, so I don't understand your point. James And we shouldn't require the user to buy better hardware. The whole point of a redundant setup is that your plane doesn't crash to the ground when one of your two engines fails. If regulations required perfect engines, you wouldn't be flying to conferences. They require decent engines and enough redundancy that any one can fail at any moment. Computer systems are no different. We can construct a robust system from individually less robust components. Requiring perfect components would be ludicrous. Having a system design where one faulty component will reliably bring the system down is equally ludicrous. Sadly that is also the state of today's scsi stack. This is not a theoretical problem, btw. We currently carry some patches to solve it for us. They are not applicable for mainline in their current state - we support a lot less hardware diversity. But trust me, we didn't create them on a whim. ;)
Re: [PATCHv2 0/7] Limit overall SCSI EH runtime
On Tue, 2 July 2013 16:33:40 +, James Bottomley wrote: On Tue, 2013-07-02 at 10:58 -0400, Jörn Engel wrote: On Tue, 2 July 2013 06:37:05 +, James Bottomley wrote: I don't understand what you're getting at. In a dual HBA situation, whether the second HBA is implicated or not depends on configuration and what the first HBA is doing. If it's just passively lost device state, then the second HBA should continue just fine. If the insane HBA is If the problem is an insane drive instead of an insane HBA, both HBAs will be in roughly the same state at roughly the same time - assuming they both send commands to the insane drive. If they now go into error handling and effectively shut off all the sane drives at roughly the same time, the user is ed. That's handled in device reset, so I don't understand your point. Doesn't a device reset require all IO to the entire HBA to stop? Hannes' patches fixed that for aborts, but not for device reset yet, afaics. Jörn -- In America you can have either a flimsy box banged together out of two by fours and drywall, or a McMansion -- a flimsy box banged together out of two by fours and drywall, but larger, more dramatic-looking, and full of expensive fittings. -- Paul Graham
[Bug 59601] commit 97dec564fd4948e0e560869c80b76e166ca2a83e breaks communication with XYRATEX disk shelves
https://bugzilla.kernel.org/show_bug.cgi?id=59601 --- Comment #7 from Jack Hill jackh...@jackhill.us 2013-07-02 22:13:28 --- Created an attachment (id=106661) -- (https://bugzilla.kernel.org/attachment.cgi?id=106661) dmesg output with packet dumps I have attached the dmesg output after applying the patch you provided. -- Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are watching the assignee of the bug.
[Bug 59601] commit 97dec564fd4948e0e560869c80b76e166ca2a83e breaks communication with XYRATEX disk shelves
https://bugzilla.kernel.org/show_bug.cgi?id=59601 --- Comment #8 from Jack Hill jackh...@jackhill.us 2013-07-02 22:15:42 --- Also, I think the commit that my bisect run identified as introducing the problem was the wrong one; it appears to be the last good commit. I think the one that introduces the bug is ff2fc42e74e43721310bff710416230aae6ce0b9. Sorry about that, Jack
[Bug 59301] mpt2sas0 fails after short time of work
https://bugzilla.kernel.org/show_bug.cgi?id=59301

Marian Marinov m...@yuhu.biz changed:

What |Removed |Added
CC   |        |m...@yuhu.biz

--- Comment #2 from Marian Marinov m...@yuhu.biz 2013-07-03 00:35:41 ---
I have a similar problem with CentOS 6.4:

Jul 3 01:03:14 BlackPearl kernel: mpt2sas0: _base_fault_reset_work : SAS host is non-operational
Jul 3 01:03:15 BlackPearl kernel: mpt2sas0: _base_fault_reset_work : SAS host is non-operational
Jul 3 01:03:16 BlackPearl kernel: mpt2sas0: _base_fault_reset_work : SAS host is non-operational
Jul 3 01:03:17 BlackPearl kernel: mpt2sas0: _base_fault_reset_work : SAS host is non-operational
Jul 3 01:03:18 BlackPearl kernel: mpt2sas0: _base_fault_reset_work : SAS host is non-operational
Jul 3 01:03:19 BlackPearl kernel: mpt2sas0: _base_fault_reset_work : SAS host is non-operational
Jul 3 01:03:19 BlackPearl kernel: mpt2sas0: _base_fault_reset_work: Running mpt2sas_dead_ioc thread success
Jul 3 01:03:19 BlackPearl kernel: mpt2sas0: IR shutdown (sending)
Jul 3 01:03:19 BlackPearl kernel: sd 0:0:1:0: [sdc] Unhandled error code
Jul 3 01:03:19 BlackPearl kernel: sd 0:0:1:0: [sdc] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Jul 3 01:03:19 BlackPearl kernel: sd 0:0:1:0: [sdc] CDB: Write(10): 2a 00 07 55 40 28 00 00 08 00
Jul 3 01:03:19 BlackPearl kernel: Buffer I/O error on device dm-2, logical block 9089541
Jul 3 01:03:19 BlackPearl kernel: lost page write due to I/O error on dm-2
Jul 3 01:03:19 BlackPearl kernel: sd 0:0:1:0: [sdc] Unhandled error code
Jul 3 01:03:19 BlackPearl kernel: sd 0:0:1:0: [sdc] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Jul 3 01:03:19 BlackPearl kernel: sd 0:0:1:0: [sdc] CDB: Write(10): 2a 00 03 01 1f 48 00 00 08 00
Jul 3 01:03:19 BlackPearl kernel: Buffer I/O error on device dm-2, logical block 11753
Jul 3 01:03:19 BlackPearl kernel: lost page write due to I/O error on dm-2
Jul 3 01:03:19 BlackPearl kernel: sd 0:0:1:0: [sdc] Unhandled error code
Jul 3 01:03:19 BlackPearl kernel: sd 0:0:1:0: [sdc] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Jul 3 01:03:19 BlackPearl kernel: sd 0:0:1:0: [sdc] CDB: Write(10): 2a 00 06 04 06 98 00 00 18 00
Jul 3 01:03:19 BlackPearl kernel: JBD2: Detected IO errors while flushing file data on dm-2-8
Jul 3 01:03:19 BlackPearl kernel: Aborting journal on device dm-2-8.
Jul 3 01:03:19 BlackPearl kernel: sd 0:0:1:0: [sdc] Unhandled error code
Jul 3 01:03:19 BlackPearl kernel: sd 0:0:1:0: [sdc] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Jul 3 01:03:19 BlackPearl kernel: sd 0:0:1:0: [sdc] CDB: Write(10): 2a 00 06 03 b0 00 00 00 08 00
Jul 3 01:03:19 BlackPearl kernel: Buffer I/O error on device dm-2, logical block 6324224
Jul 3 01:03:19 BlackPearl kernel: lost page write due to I/O error on dm-2
Jul 3 01:03:19 BlackPearl kernel: JBD2: I/O error detected when updating journal superblock for dm-2-8.
Jul 3 01:03:19 BlackPearl kernel: sd 0:0:1:0: [sdc] Unhandled error code
Jul 3 01:03:19 BlackPearl kernel: sd 0:0:1:0: [sdc] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Jul 3 01:03:19 BlackPearl kernel: sd 0:0:1:0: [sdc] CDB: Write(10): 2a 00 07 55 40 28 00 00 08 00
Jul 3 01:03:19 BlackPearl kernel: Buffer I/O error on device dm-2, logical block 9089541
Jul 3 01:03:19 BlackPearl kernel: lost page write due to I/O error on dm-2
Jul 3 01:03:19 BlackPearl kernel: JBD2: Detected IO errors while flushing file data on dm-2-8
Jul 3 01:03:19 BlackPearl kernel: EXT4-fs error (device dm-2): ext4_journal_start_sb: Detected aborted journal
Jul 3 01:03:19 BlackPearl kernel: EXT4-fs (dm-2): Remounting filesystem read-only
Jul 3 01:03:19 BlackPearl kernel: sd 0:0:0:0: [sdb] Unhandled error code
Jul 3 01:03:19 BlackPearl kernel: sd 0:0:0:0: [sdb] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Jul 3 01:03:19 BlackPearl kernel: sd 0:0:0:0: [sdb] CDB: Write(10): 2a 00 02 d2 bc d8 00 00 08 00
Jul 3 01:03:19 BlackPearl kernel: Buffer I/O error on device sdb2, logical block 5792147
Jul 3 01:03:19 BlackPearl kernel: lost page write due to I/O error on sdb2
Jul 3 01:03:19 BlackPearl kernel: sd 0:0:0:0: [sdb] Unhandled error code
Jul 3 01:03:19 BlackPearl kernel: sd 0:0:0:0: [sdb] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Jul 3 01:03:19 BlackPearl kernel: sd 0:0:0:0: [sdb] CDB: Write(10): 2a 00 02 d2 bc b8 00 00 08 00
Jul 3 01:03:19 BlackPearl kernel: Buffer I/O error on device sdb2, logical block 5792143
Jul 3 01:03:19 BlackPearl kernel: lost page write due to I/O error on sdb2
Jul 3 01:03:19 BlackPearl kernel: sd 0:0:0:0: [sdb] Unhandled error code
Jul 3 01:03:19 BlackPearl kernel: sd 0:0:0:0: [sdb] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Jul 3 01:03:19 BlackPearl kernel: sd 0:0:0:0: [sdb] CDB: Write(10): 2a 00 02 d2 bc 90 00 00 08 00
Jul 3
[Bug 59301] mpt2sas0 fails after short time of work
https://bugzilla.kernel.org/show_bug.cgi?id=59301

--- Comment #3 from Marian Marinov m...@yuhu.biz 2013-07-03 00:36:39 ---
[root@BlackPearl ~]# uname -a
Linux BlackPearl.yuhu.biz 2.6.32-358.11.1.el6.i686 #1 SMP Wed Jun 12 01:01:27 UTC 2013 i686 i686 i386 GNU/Linux
[root@BlackPearl ~]# cat /etc/redhat-release
CentOS release 6.4 (Final)
Re: [PATCH V2 3/4] scsi: ufs: Fix device and host reset methods
+
+/**
+ * ufshcd_eh_device_reset_handler - device reset handler registered to
+ *		scsi layer.
+ * @cmd - SCSI command pointer
+ *
+ * Returns SUCCESS/FAILED
+ */
+static int ufshcd_eh_device_reset_handler(struct scsi_cmnd *cmd)
+{
+	struct ufs_hba *hba;
+	int err;
+	unsigned long flags;
+
+	hba = shost_priv(cmd->device->host);
+
+	spin_lock_irqsave(hba->host->host_lock, flags);
+	if (hba->ufshcd_state == UFSHCD_STATE_RESET) {
+		dev_warn(hba->dev, "%s: reset in progress\n", __func__);
+		err = SUCCESS;
+		spin_unlock_irqrestore(hba->host->host_lock, flags);
+		goto out;

It would be better to wait here until the state changes to 'operational' or 'error' before returning success.

+	}
+
+	hba->ufshcd_state = UFSHCD_STATE_RESET;
+	ufshcd_set_device_reset_pending(hba);
+	spin_unlock_irqrestore(hba->host->host_lock, flags);
+
+	err = ufshcd_reset_and_restore(hba);
+
+	spin_lock_irqsave(hba->host->host_lock, flags);
+	if (!err) {
+		err = SUCCESS;
+		hba->ufshcd_state = UFSHCD_STATE_OPERATIONAL;
+	} else {
+		err = FAILED;
+		hba->ufshcd_state = UFSHCD_STATE_ERROR;
+	}
+	spin_unlock_irqrestore(hba->host->host_lock, flags);
+out:
+	return err;
+}
+
+/**
+ * ufshcd_eh_host_reset_handler - host reset handler registered to scsi layer
+ * @cmd - SCSI command pointer
+ *
+ * Returns SUCCESS/FAILED
+ */
+static int ufshcd_eh_host_reset_handler(struct scsi_cmnd *cmd)
+{
+	struct ufs_hba *hba;
+	int err;
+	unsigned long flags;
+
+	hba = shost_priv(cmd->device->host);
+
+	spin_lock_irqsave(hba->host->host_lock, flags);
+	if (hba->ufshcd_state == UFSHCD_STATE_RESET) {
+		dev_warn(hba->dev, "%s: reset in progress\n", __func__);
+		err = SUCCESS;
+		spin_unlock_irqrestore(hba->host->host_lock, flags);
+		goto out;

The same applies here.

+	}
+
+	hba->ufshcd_state = UFSHCD_STATE_RESET;
+	ufshcd_set_host_reset_pending(hba);
+	spin_unlock_irqrestore(hba->host->host_lock, flags);
+
+	err = ufshcd_reset_and_restore(hba);
+
+	spin_lock_irqsave(hba->host->host_lock, flags);
+	if (!err) {
+		err = SUCCESS;
+		hba->ufshcd_state = UFSHCD_STATE_OPERATIONAL;
+	} else {
+		err = FAILED;
+		hba->ufshcd_state = UFSHCD_STATE_ERROR;
+	}
+	spin_unlock_irqrestore(hba->host->host_lock, flags);
+out:
+	return err;
+}
+
+/**

-- 
~Santosh