On 03/15/13 12:55, Hannes Reinecke wrote:
And the LLDD is forced into error recovery which'll take _ages_ as each
and every command send during error recovery will time out.
Hello Hannes,
I'm analyzing a related but not identical issue with SRP. It would help
if you could tell with which
On 03/15/13 13:37, Bryn M. Reeves wrote:
On 03/15/2013 12:24 PM, Bart Van Assche wrote:
On 03/15/13 12:55, Hannes Reinecke wrote:
And the LLDD is forced into error recovery which'll take _ages_ as each
and every command send during error recovery will time out.
Hello Hannes,
I'm analyzing
On 03/15/13 14:28, Bryn M. Reeves wrote:
On 03/15/2013 12:46 PM, Bart Van Assche wrote:
The SCSI EH keeps trying until all outstanding request have been
finished. Does lpfc_host_reset_handler() invoke scsi_done() for
I don't think so (ends up calling lpfc_sli_cancel_iocbs() via
On 03/15/13 19:51, Mike Christie wrote:
On 03/15/2013 08:41 AM, Bart Van Assche wrote:
How about using the value of scsi_cmnd.jiffies_at_alloc to finish only
those SCSI commands in the host reset handler that exceeded a certain
processing time ?
We basically do this now. When a scsi command
On 03/17/13 18:32, Joe Lawrence wrote:
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index c1b05a8..8f009c4 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -1635,6 +1635,8 @@ static void scsi_eh_lock_door(struct scsi_device *sdev)
* request
On 03/23/13 12:28, Wanlong Gao wrote:
+static struct virtio_scsi_vq *virtscsi_pick_vq(struct virtio_scsi *vscsi,
+ struct virtio_scsi_target_state
*tgt)
+{
+ struct virtio_scsi_vq *vq;
+ unsigned long flags;
+ u32 queue_num;
+
+
Fix a few issues that can be triggered by removing a device:
- Fix a race between starved list processing and device removal.
- Avoid that a SCSI LLD callback can get invoked after
scsi_remove_host() finished.
- Speed up device removal by stopping error handling as soon as
the SHOST_DEL or
by increasing the
sdev refcount before running the sdev queue. Also, remove a
SCSI device from the starved list before __scsi_remove_device()
decreases the sdev refcount such that the get_device() call
added in scsi_run_queue() is guaranteed to succeed.
Signed-off-by: Bart Van Assche bvanass
Now that all scsi_request_fn() callers hold a reference on the
SCSI device that function is invoked for and since
blk_cleanup_queue() waits until scsi_request_fn() has finished
it is safe to remove the get_device() / put_device() pair from
scsi_request_fn().
Signed-off-by: Bart Van Assche bvanass
. If this happens then
the SCSI device has not yet been added to sysfs (is_visible == 0).
Make sure that if this happens these devices are transitioned
into state SDEV_DEL. This avoids that __scsi_remove_device()
gets invoked a second time by scsi_forget_host().
Signed-off-by: Bart Van Assche bvanass
() such that gcc
does not issue a warning about an enumeration value not being
handled inside a switch statement.
Signed-off-by: Bart Van Assche bvanass...@acm.org
Cc: Tejun Heo t...@kernel.org
Cc: James Bottomley jbottom...@parallels.com
Cc: Mike Christie micha...@cs.wisc.edu
Cc: Hannes Reinecke h
Since it is not allowed to invoke scsi_remove_host() with interrupts
disabled, avoid saving and restoring the interrupt state inside
scsi_remove_host(). This patch does not change the functionality of
the function scsi_remove_host().
Signed-off-by: Bart Van Assche bvanass...@acm.org
Acked
has finished. Also, do
not invoke any of the eh_* functions after scsi_remove_host() has
started. Remove Scsi_Host.tmf_in_progress because it is now
superfluous.
Signed-off-by: Bart Van Assche bvanass...@acm.org
Cc: Hannes Reinecke h...@suse.de
Cc: Mike Christie micha...@cs.wisc.edu
Cc: Tejun Heo t
Make concurrent invocations of scsi_device_set_state() safe.
Signed-off-by: Bart Van Assche bvanass...@acm.org
Acked-by: Hannes Reinecke h...@suse.de
Cc: James Bottomley jbottom...@parallels.com
Cc: Tejun Heo t...@kernel.org
Cc: Mike Christie micha...@cs.wisc.edu
---
drivers/scsi/scsi_error.c
A SCSI LLD may overwrite host_scribble in its queuecommand()
implementation. Several drivers need that field to process
requests and aborts correctly. Hence this field must be saved
by scsi_eh_prep_cmnd() and must be restored by
scsi_eh_restore_cmnd().
Signed-off-by: Bart Van Assche bvanass
in the SDEV_CANCEL state.
- This patch does not affect Fibre Channel LLD drivers since these
drivers invoke fc_remote_port_chkready() before submitting a SCSI
request to the HBA. That prevents a timeout to occur in state
SDEV_CANCEL if the transport is offline.
Signed-off-by: Bart Van Assche
On 05/03/13 20:23, James Bottomley wrote:
+ const unsigned long stall_for = min(msecs_to_jiffies(10), 1UL);
Hello James,
Can you please clarify what the intention of this statement is ? Is the
purpose of this statement to avoid that stall_for would be zero in case
HZ 100 ? If that is
On 05/10/13 05:11, Martin K. Petersen wrote:
Introduce eh_timeout which can be used for error handling purposes. This
was previously hardcoded to 10 seconds in the SCSI error handling
code. However, for some fast-fail scenarios it is necessary to be able
to tune this as it can take several
Avoid that the fcport structure gets leaked if
bsg_job-request-msgcode == FC_BSG_HST_ELS_NOLOGIN, the fcport
allocation succeeds and the !vha-flags.online branch is taken.
Detected by Coverity.
Signed-off-by: Bart Van Assche bvanass...@acm.org
Cc: Chad Dupuis chad.dup...@qlogic.com
Cc: Saurav
On 05/17/13 07:00, Saurav Kashyap wrote:
Instead of doing this I would move this check to the top of the function,
something like this
-8--
diff --git a/drivers/scsi/qla2xxx/qla_bsg.c
b/drivers/scsi/qla2xxx/qla_bsg.c
index 371bb86..b5f84ae 100644
---
On 04/28/13 18:19, ronnie sahlberg wrote:
Hi List,
Interested in SCSI tests?
I have a reasonable SCSI (mainly SBC) testsuite at
https://github.com/sahlberg/libiscsi
while I am mainly interested in testing of iSCSI targets, most of my
tests so far are for the SCSI protocol
so they work quite
On 06/03/13 20:25, gurinder.sherg...@hp.com wrote:
The dev_loss_tmo parameter controls how long the FC transport layer
insulates a loss of a remote port. It is defined to be between 1 and
SCSI_DEVICE_BLOCK_MAX_TIMEOUT (600). Different drivers do slightly different
checks when passed this
Remove dead code and simplify a pointer computation.
Signed-off-by: Bart Van Assche bvanass...@acm.org
Cc: Chad Dupuis chad.dup...@qlogic.com
Cc: Saurav Kashyap saurav.kash...@qlogic.com
---
drivers/scsi/qla2xxx/qla_bsg.c |9 +
1 file changed, 1 insertion(+), 8 deletions(-)
diff
At the end of qla2x00_configure_hba() we know that rval == QLA_SUCCESS.
Hence remove the code that depends on rval != QLA_SUCCESS.
Signed-off-by: Bart Van Assche bvanass...@acm.org
Cc: Chad Dupuis chad.dup...@qlogic.com
Cc: Saurav Kashyap saurav.kash...@qlogic.com
---
drivers/scsi/qla2xxx
Remove dead code, simplify a pointer computation and move the
ql84_mgmt assignment to just before its first use.
Signed-off-by: Bart Van Assche bvanass...@acm.org
Cc: Chad Dupuis chad.dup...@qlogic.com
Cc: Saurav Kashyap saurav.kash...@qlogic.com
---
drivers/scsi/qla2xxx/qla_bsg.c |9
Since ha-model_desc is an array comparing it against NULL is
superfluous. Hence remove these tests.
Signed-off-by: Bart Van Assche bvanass...@acm.org
Cc: Chad Dupuis chad.dup...@qlogic.com
Cc: Saurav Kashyap saurav.kash...@qlogic.com
---
drivers/scsi/qla2xxx/qla_gs.c |3 +--
drivers/scsi
Since the value of cur_seg is not used and since scsi_prot_sglist()
has no side effects it is safe to remove the statement
cur_seg = scsi_prot_sglist(cmd). Detected by Coverity.
Signed-off-by: Bart Van Assche bvanass...@acm.org
Cc: Chad Dupuis chad.dup...@qlogic.com
Cc: Saurav Kashyap saurav.kash
The value of the pointer called nxt is not used after the
nxt = qla24xx_copy_eft(ha, nxt) statement. Hence keep the function
call but remove the assignment.
Signed-off-by: Bart Van Assche bvanass...@acm.org
Cc: Chad Dupuis chad.dup...@qlogic.com
Cc: Saurav Kashyap saurav.kash...@qlogic.com
overflow by making it explicit
that these three functions initializes both the request and reply
structures. This patch does not change any functionality.
Signed-off-by: Bart Van Assche bvanass...@acm.org
Cc: Chad Dupuis chad.dup...@qlogic.com
Cc: Saurav Kashyap saurav.kash...@qlogic.com
---
drivers
Change the 'rval' variable from QLA_FUNCTION_TIMEOUT into QLA_SUCCESS
before starting a loop that is only executed if rval is initialized
to QLA_SUCCESS. Coverity reported that loop as dead code.
Signed-off-by: Bart Van Assche bvanass...@acm.org
Cc: Chad Dupuis chad.dup...@qlogic.com
Cc: Saurav
Signed-off-by: Bart Van Assche bvanass...@acm.org
Cc: Chad Dupuis chad.dup...@qlogic.com
Cc: Saurav Kashyap saurav.kash...@qlogic.com
---
drivers/scsi/qla2xxx/qla_os.c |4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx
values. Make it easy for Coverity (and for humans) to recognize
that there is no fcport leak in the error path by changing the
bsg_job-request-msgcode == FC_BSG_HST_ELS_NOLOGIN test into
bsg_job-request-msgcode != FC_BSG_RPT_ELS.
Signed-off-by: Bart Van Assche bvanass...@acm.org
Cc: Chad Dupuis
On 06/07/13 21:06, Saurav Kashyap wrote:
Thanks for the patches. Please share the warnings reported by Coverity, so
that its easy for us to review the patches.
Hello Saurav,
The Coverity warnings that led me to developing this patch series are as
follows:
- For patches 1, 2, 3, 8: Logically
On 06/11/13 20:57, James Bottomley wrote:
Actually, I think we can dump the workqueue altogether. The only reason
we need it is because the current abort handlers wait for the command
and return the completion state. However, all LLDs are capable of
emitting TMFs at interrupt level, so if we
Fix a few issues that can be triggered by removing a SCSI device:
- Fix a race between starved list processing and device removal that
can trigger a kernel oops.
- Avoid that a SCSI LLD callback can get invoked after
scsi_remove_host() finished.
- Speed up device removal by stopping error
by increasing the
sdev refcount before running the sdev queue. Also, remove a
SCSI device from the starved list before __scsi_remove_device()
decreases the sdev refcount such that the get_device() call
added in scsi_run_queue() is guaranteed to succeed.
Signed-off-by: Bart Van Assche bvanass
Now that all scsi_request_fn() callers hold a reference on the
SCSI device that function is invoked for and since
blk_cleanup_queue() waits until scsi_request_fn() has finished
it is safe to remove the get_device() / put_device() pair from
scsi_request_fn().
Signed-off-by: Bart Van Assche bvanass
. If this happens then
the SCSI device has not yet been added to sysfs (is_visible == 0).
Make sure that if this happens these devices are transitioned
into state SDEV_DEL. This avoids that __scsi_remove_device()
gets invoked a second time by scsi_forget_host().
Signed-off-by: Bart Van Assche bvanass
() such that gcc
does not issue a warning about an enumeration value not being
handled inside a switch statement.
Signed-off-by: Bart Van Assche bvanass...@acm.org
Cc: Tejun Heo t...@kernel.org
Cc: James Bottomley jbottom...@parallels.com
Cc: Mike Christie micha...@cs.wisc.edu
Cc: Hannes Reinecke h
Since it is not allowed to invoke scsi_remove_host() with interrupts
disabled, avoid saving and restoring the interrupt state inside
scsi_remove_host(). This patch does not change the functionality of
the function scsi_remove_host().
Signed-off-by: Bart Van Assche bvanass...@acm.org
Acked
has finished. Also, do
not invoke any of the eh_* functions after scsi_remove_host() has
started. Remove Scsi_Host.tmf_in_progress because it is now
superfluous.
Signed-off-by: Bart Van Assche bvanass...@acm.org
Cc: Hannes Reinecke h...@suse.de
Cc: Mike Christie micha...@cs.wisc.edu
Cc: Tejun Heo t
Make concurrent invocations of scsi_device_set_state() safe.
Signed-off-by: Bart Van Assche bvanass...@acm.org
Acked-by: Hannes Reinecke h...@suse.de
Cc: James Bottomley jbottom...@parallels.com
Cc: Tejun Heo t...@kernel.org
Cc: Mike Christie micha...@cs.wisc.edu
---
drivers/scsi/scsi_error.c
in the SDEV_CANCEL state.
- This patch does not affect Fibre Channel LLD drivers since these
drivers invoke fc_remote_port_chkready() before submitting a SCSI
request to the HBA. That prevents a timeout to occur in state
SDEV_CANCEL if the transport is offline.
Signed-off-by: Bart Van Assche
The purpose of this InfiniBand SRP initiator patch series is as follows:
- Make the SRP initiator driver better suited for use in a H.A. setup.
Speed up failover by reducing the IB RC retry count and by notifying
multipathd faster about transport layer failures by adding
fast_io_fail_tmo
a queuecommand callback is racy
because srp_remove_host() must be invoked before scsi_remove_host()
and because the queuecommand callback may get invoked after
srp_remove_host() has finished.
Signed-off-by: Bart Van Assche bvanass...@acm.org
Cc: Roland Dreier rol...@purestorage.com
Cc: James Bottomley jbottom
after having detected a transport layer problem and
before failing I/O.
- Support for implementing dev_loss_tmo, the time that should
elapse after having detected a transport layer problem and
before removing a remote port.
Signed-off-by: Bart Van Assche bvanass...@acm.org
Cc: Roland Dreier rol
On 06/13/13 21:43, Vu Pham wrote:
Hello Bart,
+What:/sys/class/srp_remote_ports/port-h:n/dev_loss_tmo
+Date:September 1, 2013
+KernelVersion:3.11
+Contact:linux-scsi@vger.kernel.org, linux-r...@vger.kernel.org
+Description:Number of seconds the SCSI layer will
On 06/14/13 19:59, Vu Pham wrote:
On 06/13/13 21:43, Vu Pham wrote:
+/**
+ * srp_tmo_valid() - check timeout combination validity
+ *
+ * If no fast I/O fail timeout has been configured then the device
loss timeout
+ * must be below SCSI_DEVICE_BLOCK_MAX_TIMEOUT. If a fast I/O fail
timeout has
On 06/17/13 08:18, Hannes Reinecke wrote:
On 06/15/2013 11:52 AM, Bart Van Assche wrote:
On 06/14/13 19:59, Vu Pham wrote:
On 06/13/13 21:43, Vu Pham wrote:
+/**
+ * srp_tmo_valid() - check timeout combination validity
+ *
+ * If no fast I/O fail timeout has been configured then the
device
On 06/17/13 09:14, Hannes Reinecke wrote:
On 06/17/2013 09:04 AM, Bart Van Assche wrote:
I agree that the value of fast_io_fail_tmo should be kept small.
Although as you explained changing the SCSI device state into
SDEV_BLOCK doesn't help for I/O that has already been queued on a
failed path
On 06/19/13 15:44, Jack Wang wrote:
+ /*
+* It can occur that after fast_io_fail_tmo expired and before
+* dev_loss_tmo expired that the SCSI error handler has
+* offlined one or more devices. scsi_target_unblock() doesn't
+
On 06/19/13 19:42, Ewan D. Milne wrote:
+static void starget_evt_emit(struct scsi_target *starget,
+struct starget_event *evt)
+{
+ int idx = 0;
+ char *envp[3];
+
+ switch (evt-evt_type) {
+ case STARGET_EVT_LUN_CHANGE_REPORTED:
+
On 06/23/13 23:35, Mike Christie wrote:
On 06/12/2013 07:52 AM, Bart Van Assche wrote:
SCSI devices are added to the shost-__devices list from inside
scsi_alloc_sdev(). If something goes wrong during LUN scanning,
e.g. a transport layer failure occurs, then __scsi_remove_device()
can get
On 06/24/13 03:05, Mike Christie wrote:
On 6/12/13 7:53 AM, Bart Van Assche wrote:
Changing the state of a SCSI device via sysfs into cancel or
deleted prevents removal of these devices by scsi_remove_host().
Hence do not allow this. Also, introduce the symbolic name
INVALID_SDEV_STATE
On 06/24/13 03:15, Mike Christie wrote:
On 6/12/13 7:55 AM, Bart Van Assche wrote:
A SCSI LLD may start cleaning up host resources as soon as
scsi_remove_host() returns. These host resources may be needed by
the LLD in an implementation of one of the eh_* functions. So if
one of the eh_
On 06/24/13 04:36, James Bottomley wrote:
On Wed, 2013-06-12 at 14:51 +0200, Bart Van Assche wrote:
Now that all scsi_request_fn() callers hold a reference on the
SCSI device that function is invoked for
What makes you think that this is a true statement? The usual caller is
the block layer
On 06/23/13 23:13, Mike Christie wrote:
On 06/12/2013 08:28 AM, Bart Van Assche wrote:
+/*
+ * It can occur that after fast_io_fail_tmo expired and before
+ * dev_loss_tmo expired that the SCSI error handler has
+ * offlined one or more devices
On 06/24/13 12:17, Jack Wang wrote:
@@ -646,14 +703,20 @@ static int scsi_try_target_reset(struct scsi_cmnd *scmd)
static int scsi_try_bus_device_reset(struct scsi_cmnd *scmd)
{
int rtn;
- struct scsi_host_template *hostt = scmd-device-host-hostt;
+ struct Scsi_Host
On 06/24/13 15:34, James Bottomley wrote:
On Mon, 2013-06-24 at 09:13 +0200, Bart Van Assche wrote:
On 06/24/13 04:36, James Bottomley wrote:
On Wed, 2013-06-12 at 14:51 +0200, Bart Van Assche wrote:
Now that all scsi_request_fn() callers hold a reference on the
SCSI device that function
On 06/24/13 15:48, Jack Wang wrote:
I'm not sure it's possible to avoid such a race without introducing
a new mutex. How about something like the (untested) SCSI core patch
below, and invoking scsi_block_eh() and scsi_unblock_eh() around any
reconnect activity not initiated from the SCSI EH
On 06/24/13 17:38, James Bottomley wrote:
I really don't like this because it's shuffling potentially fragile
lifetime rules since you now have to have the sdev deleted from the
starved list before final put. That becomes an unstated assumption
within the code.
The theory is that the starved
On 06/25/13 05:18, Matthew Wilcox wrote:
On Mon, Jun 24, 2013 at 10:07:51AM +0200, Ingo Molnar wrote:
I'm wondering, how will this scheme work if the IO completion latency is a
lot more than the 5 usecs in the testcase? What if it takes 20 usecs or
100 usecs or more?
There's clearly a
On 06/25/13 00:27, James Bottomley wrote:
For a variety of reasons this patch set is incredibly hard to review:
Almost every patch touches pieces in the mid layer where you have to be
sure in minute detail you know what's going on (and what should be going
on), so usually it's a couple of hours
On 06/25/13 15:44, James Bottomley wrote:
On Tue, 2013-06-25 at 10:37 +0200, Bart Van Assche wrote:
On 06/24/13 19:38, James Bottomley wrote:
On Wed, 2013-06-12 at 14:52 +0200, Bart Van Assche wrote:
SCSI devices are added to the shost-__devices list from inside
scsi_alloc_sdev
On 06/25/13 15:45, James Bottomley wrote:
On Tue, 2013-06-25 at 11:01 +0200, Bart Van Assche wrote:
There is a difference though between moving the EH kthread_stop() call
and the patch at the start of this thread: moving the EH kthread_stop()
call does not prevent that an ioctl like
On 06/25/13 19:40, James Bottomley wrote:
If I look at what we actually do: all the HBAs treat scsi_remove_host as
a waited for transition. The reason this works is the loop over
__scsi_remove_device() in scsi_forget_host(). By the time that loop
returns, every scsi_device is gone (and so is
On 06/26/13 11:02, Maxim Uvarov wrote:
This fix:
end_request: I/O error, dev sdc, sector 976576
rport-0:0-3: blocked FC remote port time out: removing target and saving
binding
BUG: unable to handle kernel NULL pointer dereference at 0400
IP: [812f0cc2]
Fix a few issues related to SCSI device removal:
- Fix a race between starved list processing and device removal that
can trigger a kernel oops.
- Avoid that __scsi_remove_device() is called twice for the same SCSI
device, which also can cause a kernel oops.
- Restrict the SCSI device state
against that race condition by holding a reference on the
queue while running it.
Signed-off-by: James Bottomley jbottom...@parallels.com
Signed-off-by: Bart Van Assche bvanass...@acm.org
Reported-by: Chanho Min chanho@lge.com
Reference: http://lkml.org/lkml/2012/8/2/96
Cc: Tejun Heo t
that in that case the transition into
state SDEV_DEL occurs. This avoids that __scsi_remove_device() gets
invoked a second time by scsi_forget_host().
Signed-off-by: Bart Van Assche bvanass...@acm.org
Cc: James Bottomley jbottom...@parallels.com
Cc: Mike Christie micha...@cs.wisc.edu
Cc: Hannes
-by: Bart Van Assche bvanass...@acm.org
Cc: Tejun Heo t...@kernel.org
Cc: James Bottomley jbottom...@parallels.com
Cc: Mike Christie micha...@cs.wisc.edu
Cc: Hannes Reinecke h...@suse.de
Cc: David Milburn dmilb...@redhat.com
---
drivers/scsi/scsi_sysfs.c |2 +-
1 file changed, 1 insertion(+), 1
Since it is not allowed to invoke scsi_remove_host() with interrupts
disabled, avoid saving and restoring the interrupt state inside
scsi_remove_host(). This patch does not change the functionality of
the function scsi_remove_host().
Signed-off-by: Bart Van Assche bvanass...@acm.org
Acked
Make concurrent invocations of scsi_device_set_state() safe.
Signed-off-by: Bart Van Assche bvanass...@acm.org
Acked-by: Hannes Reinecke h...@suse.de
Cc: James Bottomley jbottom...@parallels.com
Cc: Tejun Heo t...@kernel.org
Cc: Mike Christie micha...@cs.wisc.edu
---
drivers/scsi/scsi_error.c
in the SDEV_CANCEL state.
- This patch does not affect Fibre Channel LLD drivers since these
drivers invoke fc_remote_port_chkready() before submitting a SCSI
request to the HBA. That prevents a timeout to occur in state
SDEV_CANCEL if the transport is offline.
Signed-off-by: Bart Van Assche
The purpose of this InfiniBand SRP initiator patch series is as follows:
- Make the SRP initiator driver better suited for use in a H.A. setup.
Speed up failover by reducing the IB RC retry count and by notifying
multipathd faster about transport layer failures by adding
fast_io_fail_tmo
will cause the SRP initiator to
reconnect, which will cause I/O over the second connection to fail.
Avoid such ping-pong behavior by disabling relogins. Note: if
reconnecting manually is necessary, that is possible by deleting
and recreating an rport via sysfs.
Signed-off-by: Bart Van Assche
a queuecommand callback is racy
because srp_remove_host() must be invoked before scsi_remove_host()
and because the queuecommand callback may get invoked after
srp_remove_host() has finished.
Signed-off-by: Bart Van Assche bvanass...@acm.org
Cc: Roland Dreier rol...@purestorage.com
Cc: James Bottomley jbottom
after having detected a transport layer problem and
before failing I/O.
- Support for implementing dev_loss_tmo, the time that should
elapse after having detected a transport layer problem and
before removing a remote port.
Signed-off-by: Bart Van Assche bvanass...@acm.org
Cc: Roland Dreier rol
.
[bvanassche: Changed patch description]
Signed-off-by: Sebastian Riemer sebastian.rie...@profitbricks.com
Signed-off-by: Bart Van Assche bvanass...@acm.org
Cc: Roland Dreier rol...@purestorage.com
Cc: David Dillow dillo...@ornl.gov
Cc: Vu Pham v...@mellanox.com
---
drivers/infiniband/ulp/srp/ib_srp.c
On 06/30/13 23:05, David Dillow wrote:
On Fri, 2013-06-28 at 14:53 +0200, Bart Van Assche wrote:
+int srp_tmo_valid(int fast_io_fail_tmo, int dev_loss_tmo)
+{
+ return (fast_io_fail_tmo 0 || dev_loss_tmo 0 ||
+ fast_io_fail_tmo dev_loss_tmo
On 07/01/13 09:05, James Bottomley wrote:
On Thu, 2013-06-27 at 16:53 +0200, Bart Van Assche wrote:
If something goes wrong during LUN scanning, e.g. a transport layer
failure occurs, then __scsi_remove_device() can get invoked by the
LUN scanning code for a SCSI device in state
On 07/01/13 10:27, Hannes Reinecke wrote:
On 06/27/2013 04:57 PM, Bart Van Assche wrote:
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index dfbaa34..666b741 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -959,14 +959,16 @@ void
On 07/01/13 16:49, James Bottomley wrote:
On Thu, 2013-06-27 at 16:56 +0200, Bart Van Assche wrote:
Make concurrent invocations of scsi_device_set_state() safe.
Firstly, I don't understand from this where you think the races are.
Secondly, shouldn't this be the device lock? and thirdly, if we
Fix a few issues related to SCSI device removal:
- Fix a race between starved list processing and device removal that
can trigger a kernel oops.
- Avoid that __scsi_remove_device() is called twice for the same SCSI
device, which also can cause a kernel oops.
- Restrict the SCSI device state
against that race condition by holding a reference on the
queue while running it.
Signed-off-by: James Bottomley jbottom...@parallels.com
Signed-off-by: Bart Van Assche bvanass...@acm.org
Reported-by: Chanho Min chanho@lge.com
Reference: http://lkml.org/lkml/2012/8/2/96
Cc: Tejun Heo t
in this case the transition into state SDEV_DEL
occurs. This avoids that __scsi_remove_device() can get invoked a
second time by scsi_forget_host() if this last function is invoked
from another thread than the thread that performs LUN scanning.
Signed-off-by: Bart Van Assche bvanass...@acm.org
Cc: James
in the SDEV_CANCEL state.
- This patch does not affect Fibre Channel LLD drivers since these
drivers invoke fc_remote_port_chkready() before submitting a SCSI
request to the HBA. That prevents a timeout to occur in state
SDEV_CANCEL if the transport is offline.
Signed-off-by: Bart Van Assche
Changing the state of a SCSI device via sysfs into cancel or
deleted prevents removal of these devices by scsi_remove_host().
Hence do not allow this.
Signed-off-by: Bart Van Assche bvanass...@acm.org
Cc: Tejun Heo t...@kernel.org
Cc: James Bottomley jbottom...@parallels.com
Cc: Mike Christie
The purpose of this InfiniBand SRP initiator patch series is as follows:
- Make the SRP initiator driver better suited for use in a H.A. setup.
Add fast_io_fail_tmo and dev_loss_tmo parameters. These can be used
either to speed up failover or to avoid device removal when e.g. using
a queuecommand callback is racy
because srp_remove_host() must be invoked before scsi_remove_host()
and because the queuecommand callback may get invoked after
srp_remove_host() has finished.
Signed-off-by: Bart Van Assche bvanass...@acm.org
Cc: Roland Dreier rol...@purestorage.com
Cc: James Bottomley jbottom
after having detected a transport layer problem and
before failing I/O.
- Support for implementing dev_loss_tmo, the time that should
elapse after having detected a transport layer problem and
before removing a remote port.
Signed-off-by: Bart Van Assche bvanass...@acm.org
Cc: Roland Dreier rol
On 07/03/13 15:38, Or Gerlitz wrote:
Some of these patches were already picked by Roland (SB), I would
suggest that you post V4 and drop the ones which were accepted.
One of the patches that is already in Roland's tree and that was in v1
of this series has been split into two patches in v2
On 07/03/13 19:27, David Dillow wrote:
On Wed, 2013-07-03 at 18:00 +0200, Bart Van Assche wrote:
The combination of dev_loss_tmo off and reconnect_delay 0 worked fine
in my tests. An I/O failure was detected shortly after the cable to the
target was pulled. I/O resumed shortly after the cable
On 07/03/13 20:57, David Dillow wrote:
And I'm getting the strong sense that the answer to my question about
fast_io_fail_tmo = 0 when dev_loss_tmo is that we should not allow that
combination, even if it doesn't break the kernel. If it doesn't make
sense, there is no reason to create an
On 07/04/13 10:01, Bart Van Assche wrote:
On 07/03/13 20:57, David Dillow wrote:
And I'm getting the strong sense that the answer to my question about
fast_io_fail_tmo = 0 when dev_loss_tmo is that we should not allow that
combination, even if it doesn't break the kernel. If it doesn't make
Fix a few issues related to SCSI device removal:
- Fix a race between starved list processing and device removal that
can trigger a kernel oops.
- Avoid that __scsi_remove_device() is called twice for the same SCSI
device, which also can cause a kernel oops.
- Restrict the SCSI device state
in this case the transition into state SDEV_DEL
occurs. This avoids that __scsi_remove_device() can get invoked a
second time by scsi_forget_host() if this last function is invoked
from another thread than the thread that performs LUN scanning.
Signed-off-by: Bart Van Assche bvanass...@acm.org
Cc: James
on the
queue while running it.
Signed-off-by: James Bottomley jbottom...@parallels.com
Signed-off-by: Bart Van Assche bvanass...@acm.org
Reported-by: Chanho Min chanho@lge.com
Reference: http://lkml.org/lkml/2012/8/2/96
Cc: Mike Christie micha...@cs.wisc.edu
Cc: Hannes Reinecke h...@suse.de
Cc
Introduce the helper function scsi_device_being_removed(). This patch
does not change any functionality.
Signed-off-by: Bart Van Assche bvanass...@acm.org
Acked-by: Hannes Reinecke h...@suse.de
Cc: James Bottomley jbottom...@parallels.com
Cc: Mike Christie micha...@cs.wisc.edu
---
drivers/scsi
instead of being rejected.
Signed-off-by: Bart Van Assche bvanass...@acm.org
Cc: James Bottomley jbottom...@parallels.com
Cc: Mike Christie micha...@cs.wisc.edu
Cc: Hannes Reinecke h...@suse.de
---
drivers/scsi/scsi_lib.c | 29 +++--
1 file changed, 11 insertions(+), 18
1 - 100 of 2692 matches
Mail list logo