Re: [PATCH] scsi_transport_fc: Make 'port_state' writeable

2013-03-15 Thread Bart Van Assche
On 03/15/13 12:55, Hannes Reinecke wrote: And the LLDD is forced into error recovery which'll take _ages_ as each and every command send during error recovery will time out. Hello Hannes, I'm analyzing a related but not identical issue with SRP. It would help if you could tell with which

Re: [PATCH] scsi_transport_fc: Make 'port_state' writeable

2013-03-15 Thread Bart Van Assche
On 03/15/13 13:37, Bryn M. Reeves wrote: On 03/15/2013 12:24 PM, Bart Van Assche wrote: On 03/15/13 12:55, Hannes Reinecke wrote: And the LLDD is forced into error recovery which'll take _ages_ as each and every command send during error recovery will time out. Hello Hannes, I'm analyzing

Re: [PATCH] scsi_transport_fc: Make 'port_state' writeable

2013-03-15 Thread Bart Van Assche
On 03/15/13 14:28, Bryn M. Reeves wrote: On 03/15/2013 12:46 PM, Bart Van Assche wrote: The SCSI EH keeps trying until all outstanding request have been finished. Does lpfc_host_reset_handler() invoke scsi_done() for I don't think so (ends up calling lpfc_sli_cancel_iocbs() via

Re: [PATCH] scsi_transport_fc: Make 'port_state' writeable

2013-03-15 Thread Bart Van Assche
On 03/15/13 19:51, Mike Christie wrote: On 03/15/2013 08:41 AM, Bart Van Assche wrote: How about using the value of scsi_cmnd.jiffies_at_alloc to finish only those SCSI commands in the host reset handler that exceeded a certain processing time ? We basically do this now. When a scsi command

Re: [PATCH] block,scsi: verify return pointer from blk_get_request

2013-03-19 Thread Bart Van Assche
On 03/17/13 18:32, Joe Lawrence wrote: diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c index c1b05a8..8f009c4 100644 --- a/drivers/scsi/scsi_error.c +++ b/drivers/scsi/scsi_error.c @@ -1635,6 +1635,8 @@ static void scsi_eh_lock_door(struct scsi_device *sdev) * request

Re: [PATCH V7 4/5] virtio-scsi: introduce multiqueue support

2013-03-25 Thread Bart Van Assche
On 03/23/13 12:28, Wanlong Gao wrote: +static struct virtio_scsi_vq *virtscsi_pick_vq(struct virtio_scsi *vscsi, + struct virtio_scsi_target_state *tgt) +{ + struct virtio_scsi_vq *vq; + unsigned long flags; + u32 queue_num; + +

[PATCH v10 0/9] More device removal fixes

2013-04-07 Thread Bart Van Assche
Fix a few issues that can be triggered by removing a device: - Fix a race between starved list processing and device removal. - Avoid that a SCSI LLD callback can get invoked after scsi_remove_host() finished. - Speed up device removal by stopping error handling as soon as the SHOST_DEL or

[PATCH v10 1/9] Fix race between starved list processing and device removal

2013-04-07 Thread Bart Van Assche
by increasing the sdev refcount before running the sdev queue. Also, remove a SCSI device from the starved list before __scsi_remove_device() decreases the sdev refcount such that the get_device() call added in scsi_run_queue() is guaranteed to succeed. Signed-off-by: Bart Van Assche bvanass

[PATCH v10 2/9] Remove get_device() / put_device() pair from scsi_request_fn()

2013-04-07 Thread Bart Van Assche
Now that all scsi_request_fn() callers hold a reference on the SCSI device that function is invoked for and since blk_cleanup_queue() waits until scsi_request_fn() has finished it is safe to remove the get_device() / put_device() pair from scsi_request_fn(). Signed-off-by: Bart Van Assche bvanass

[PATCH v10 3/9] Avoid calling __scsi_remove_device() twice

2013-04-07 Thread Bart Van Assche
. If this happens then the SCSI device has not yet been added to sysfs (is_visible == 0). Make sure that if this happens these devices are transitioned into state SDEV_DEL. This avoids that __scsi_remove_device() gets invoked a second time by scsi_forget_host(). Signed-off-by: Bart Van Assche bvanass

[PATCH v10 4/9] Disallow changing the device state via sysfs into deleted

2013-04-07 Thread Bart Van Assche
() such that gcc does not issue a warning about an enumeration value not being handled inside a switch statement. Signed-off-by: Bart Van Assche bvanass...@acm.org Cc: Tejun Heo t...@kernel.org Cc: James Bottomley jbottom...@parallels.com Cc: Mike Christie micha...@cs.wisc.edu Cc: Hannes Reinecke h

[PATCH v10 5/9] Avoid saving/restoring interrupt state inside scsi_remove_host()

2013-04-07 Thread Bart Van Assche
Since it is not allowed to invoke scsi_remove_host() with interrupts disabled, avoid saving and restoring the interrupt state inside scsi_remove_host(). This patch does not change the functionality of the function scsi_remove_host(). Signed-off-by: Bart Van Assche bvanass...@acm.org Acked

[PATCH v10 6/9] Make scsi_remove_host() wait until error handling finished

2013-04-07 Thread Bart Van Assche
has finished. Also, do not invoke any of the eh_* functions after scsi_remove_host() has started. Remove Scsi_Host.tmf_in_progress because it is now superfluous. Signed-off-by: Bart Van Assche bvanass...@acm.org Cc: Hannes Reinecke h...@suse.de Cc: Mike Christie micha...@cs.wisc.edu Cc: Tejun Heo t

[PATCH v10 7/9] Avoid that scsi_device_set_state() triggers a race

2013-04-07 Thread Bart Van Assche
Make concurrent invocations of scsi_device_set_state() safe. Signed-off-by: Bart Van Assche bvanass...@acm.org Acked-by: Hannes Reinecke h...@suse.de Cc: James Bottomley jbottom...@parallels.com Cc: Tejun Heo t...@kernel.org Cc: Mike Christie micha...@cs.wisc.edu --- drivers/scsi/scsi_error.c

[PATCH v10 8/9] Save and restore host_scribble during error handling

2013-04-07 Thread Bart Van Assche
A SCSI LLD may overwrite host_scribble in its queuecommand() implementation. Several drivers need that field to process requests and aborts correctly. Hence this field must be saved by scsi_eh_prep_cmnd() and must be restored by scsi_eh_restore_cmnd(). Signed-off-by: Bart Van Assche bvanass

[PATCH v10 9/9] Avoid reenabling I/O after the transport became offline

2013-04-07 Thread Bart Van Assche
in the SDEV_CANCEL state. - This patch does not affect Fibre Channel LLD drivers since these drivers invoke fc_remote_port_chkready() before submitting a SCSI request to the HBA. That prevents a timeout to occur in state SDEV_CANCEL if the transport is offline. Signed-off-by: Bart Van Assche

Re: [PATCH] scsi: Handle MLQUEUE busy response in scsi_send_eh_cmnd

2013-05-04 Thread Bart Van Assche
On 05/03/13 20:23, James Bottomley wrote: + const unsigned long stall_for = min(msecs_to_jiffies(10), 1UL); Hello James, Can you please clarify what the intention of this statement is ? Is the purpose of this statement to avoid that stall_for would be zero in case HZ 100 ? If that is

Re: [PATCH] scsi: Allow error handling timeout to be specified

2013-05-10 Thread Bart Van Assche
On 05/10/13 05:11, Martin K. Petersen wrote: Introduce eh_timeout which can be used for error handling purposes. This was previously hardcoded to 10 seconds in the SCSI error handling code. However, for some fast-fail scenarios it is necessary to be able to tune this as it can take several

[PATCH] qla2x00t: Fix a memory leak in an error path

2013-05-11 Thread Bart Van Assche
Avoid that the fcport structure gets leaked if bsg_job-request-msgcode == FC_BSG_HST_ELS_NOLOGIN, the fcport allocation succeeds and the !vha-flags.online branch is taken. Detected by Coverity. Signed-off-by: Bart Van Assche bvanass...@acm.org Cc: Chad Dupuis chad.dup...@qlogic.com Cc: Saurav

Re: [PATCH] qla2x00t: Fix a memory leak in an error path

2013-05-17 Thread Bart Van Assche
On 05/17/13 07:00, Saurav Kashyap wrote: Instead of doing this I would move this check to the top of the function, something like this -8-- diff --git a/drivers/scsi/qla2xxx/qla_bsg.c b/drivers/scsi/qla2xxx/qla_bsg.c index 371bb86..b5f84ae 100644 ---

Re: SCSI testing/USB devices are amazing

2013-05-22 Thread Bart Van Assche
On 04/28/13 18:19, ronnie sahlberg wrote: Hi List, Interested in SCSI tests? I have a reasonable SCSI (mainly SBC) testsuite at https://github.com/sahlberg/libiscsi while I am mainly interested in testing of iSCSI targets, most of my tests so far are for the SCSI protocol so they work quite

Re: [PATCH] make dev_loss_tmo checks uniform across drivers

2013-06-04 Thread Bart Van Assche
On 06/03/13 20:25, gurinder.sherg...@hp.com wrote: The dev_loss_tmo parameter controls how long the FC transport layer insulates a loss of a remote port. It is defined to be between 1 and SCSI_DEVICE_BLOCK_MAX_TIMEOUT (600). Different drivers do slightly different checks when passed this

[PATCH 01/10] qla2xxx: Clean up qla24xx_iidma()

2013-06-05 Thread Bart Van Assche
Remove dead code and simplify a pointer computation. Signed-off-by: Bart Van Assche bvanass...@acm.org Cc: Chad Dupuis chad.dup...@qlogic.com Cc: Saurav Kashyap saurav.kash...@qlogic.com --- drivers/scsi/qla2xxx/qla_bsg.c |9 + 1 file changed, 1 insertion(+), 8 deletions(-) diff

[PATCH 03/10] qla2xxx: Remove dead code in qla2x00_configure_hba()

2013-06-05 Thread Bart Van Assche
At the end of qla2x00_configure_hba() we know that rval == QLA_SUCCESS. Hence remove the code that depends on rval != QLA_SUCCESS. Signed-off-by: Bart Van Assche bvanass...@acm.org Cc: Chad Dupuis chad.dup...@qlogic.com Cc: Saurav Kashyap saurav.kash...@qlogic.com --- drivers/scsi/qla2xxx

[PATCH 02/10] qla2xxx: Clean up qla84xx_mgmt_cmd()

2013-06-05 Thread Bart Van Assche
Remove dead code, simplify a pointer computation and move the ql84_mgmt assignment to just before its first use. Signed-off-by: Bart Van Assche bvanass...@acm.org Cc: Chad Dupuis chad.dup...@qlogic.com Cc: Saurav Kashyap saurav.kash...@qlogic.com --- drivers/scsi/qla2xxx/qla_bsg.c |9

[PATCH 04/10] qla2xxx: Remove two superfluous tests

2013-06-05 Thread Bart Van Assche
Since ha-model_desc is an array comparing it against NULL is superfluous. Hence remove these tests. Signed-off-by: Bart Van Assche bvanass...@acm.org Cc: Chad Dupuis chad.dup...@qlogic.com Cc: Saurav Kashyap saurav.kash...@qlogic.com --- drivers/scsi/qla2xxx/qla_gs.c |3 +-- drivers/scsi

[PATCH 05/10] qla2xxx: Remove a dead assignment in qla24xx_build_scsi_crc_2_iocbs()

2013-06-05 Thread Bart Van Assche
Since the value of cur_seg is not used and since scsi_prot_sglist() has no side effects it is safe to remove the statement cur_seg = scsi_prot_sglist(cmd). Detected by Coverity. Signed-off-by: Bart Van Assche bvanass...@acm.org Cc: Chad Dupuis chad.dup...@qlogic.com Cc: Saurav Kashyap saurav.kash

[PATCH 06/10] qla2xxx: Remove redundant assignments

2013-06-05 Thread Bart Van Assche
The value of the pointer called nxt is not used after the nxt = qla24xx_copy_eft(ha, nxt) statement. Hence keep the function call but remove the assignment. Signed-off-by: Bart Van Assche bvanass...@acm.org Cc: Chad Dupuis chad.dup...@qlogic.com Cc: Saurav Kashyap saurav.kash...@qlogic.com

[PATCH 07/10] qla2xxx: Help Coverity with analyzing ct_sns_pkt initialization

2013-06-05 Thread Bart Van Assche
overflow by making it explicit that these three functions initializes both the request and reply structures. This patch does not change any functionality. Signed-off-by: Bart Van Assche bvanass...@acm.org Cc: Chad Dupuis chad.dup...@qlogic.com Cc: Saurav Kashyap saurav.kash...@qlogic.com --- drivers

[PATCH 08/10] qla2xxx: Fix qla2xxx_check_risc_status()

2013-06-05 Thread Bart Van Assche
Change the 'rval' variable from QLA_FUNCTION_TIMEOUT into QLA_SUCCESS before starting a loop that is only executed if rval is initialized to QLA_SUCCESS. Coverity reported that loop as dead code. Signed-off-by: Bart Van Assche bvanass...@acm.org Cc: Chad Dupuis chad.dup...@qlogic.com Cc: Saurav

[PATCH 09/10] qla2xxx: Remove an unused variable from qla2x00_remove_one()

2013-06-05 Thread Bart Van Assche
Signed-off-by: Bart Van Assche bvanass...@acm.org Cc: Chad Dupuis chad.dup...@qlogic.com Cc: Saurav Kashyap saurav.kash...@qlogic.com --- drivers/scsi/qla2xxx/qla_os.c |4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx

[PATCH 10/10] qla2xxx: Fix a memory leak in an error path of qla2x00_process_els()

2013-06-05 Thread Bart Van Assche
values. Make it easy for Coverity (and for humans) to recognize that there is no fcport leak in the error path by changing the bsg_job-request-msgcode == FC_BSG_HST_ELS_NOLOGIN test into bsg_job-request-msgcode != FC_BSG_RPT_ELS. Signed-off-by: Bart Van Assche bvanass...@acm.org Cc: Chad Dupuis

Re: [PATCH 0/10] qla2xxx: Reduce the number of Coverity warnings

2013-06-08 Thread Bart Van Assche
On 06/07/13 21:06, Saurav Kashyap wrote: Thanks for the patches. Please share the warnings reported by Coverity, so that its easy for us to review the patches. Hello Saurav, The Coverity warnings that led me to developing this patch series are as follows: - For patches 1, 2, 3, 8: Logically

Re: [PATCH 3/9] scsi: improved eh timeout handler

2013-06-12 Thread Bart Van Assche
On 06/11/13 20:57, James Bottomley wrote: Actually, I think we can dump the workqueue altogether. The only reason we need it is because the current abort handlers wait for the command and return the completion state. However, all LLDs are capable of emitting TMFs at interrupt level, so if we

[PATCH v11 0/9] More device removal fixes

2013-06-12 Thread Bart Van Assche
Fix a few issues that can be triggered by removing a SCSI device: - Fix a race between starved list processing and device removal that can trigger a kernel oops. - Avoid that a SCSI LLD callback can get invoked after scsi_remove_host() finished. - Speed up device removal by stopping error

[PATCH v11 1/9] Fix race between starved list and device removal

2013-06-12 Thread Bart Van Assche
by increasing the sdev refcount before running the sdev queue. Also, remove a SCSI device from the starved list before __scsi_remove_device() decreases the sdev refcount such that the get_device() call added in scsi_run_queue() is guaranteed to succeed. Signed-off-by: Bart Van Assche bvanass

[PATCH v11 2/9] Remove get_device() / put_device() pair from scsi_request_fn()

2013-06-12 Thread Bart Van Assche
Now that all scsi_request_fn() callers hold a reference on the SCSI device that function is invoked for and since blk_cleanup_queue() waits until scsi_request_fn() has finished it is safe to remove the get_device() / put_device() pair from scsi_request_fn(). Signed-off-by: Bart Van Assche bvanass

[PATCH v11 3/9] Avoid calling __scsi_remove_device() twice

2013-06-12 Thread Bart Van Assche
. If this happens then the SCSI device has not yet been added to sysfs (is_visible == 0). Make sure that if this happens these devices are transitioned into state SDEV_DEL. This avoids that __scsi_remove_device() gets invoked a second time by scsi_forget_host(). Signed-off-by: Bart Van Assche bvanass

[PATCH v11 4/9] Disallow changing the device state via sysfs into deleted

2013-06-12 Thread Bart Van Assche
() such that gcc does not issue a warning about an enumeration value not being handled inside a switch statement. Signed-off-by: Bart Van Assche bvanass...@acm.org Cc: Tejun Heo t...@kernel.org Cc: James Bottomley jbottom...@parallels.com Cc: Mike Christie micha...@cs.wisc.edu Cc: Hannes Reinecke h

[PATCH v11 5/9] Avoid saving/restoring interrupt state inside scsi_remove_host()

2013-06-12 Thread Bart Van Assche
Since it is not allowed to invoke scsi_remove_host() with interrupts disabled, avoid saving and restoring the interrupt state inside scsi_remove_host(). This patch does not change the functionality of the function scsi_remove_host(). Signed-off-by: Bart Van Assche bvanass...@acm.org Acked

[PATCH v11 6/9] Make scsi_remove_host() wait until error handling finished

2013-06-12 Thread Bart Van Assche
has finished. Also, do not invoke any of the eh_* functions after scsi_remove_host() has started. Remove Scsi_Host.tmf_in_progress because it is now superfluous. Signed-off-by: Bart Van Assche bvanass...@acm.org Cc: Hannes Reinecke h...@suse.de Cc: Mike Christie micha...@cs.wisc.edu Cc: Tejun Heo t

PATCH v11 7/9] Avoid that scsi_device_set_state() triggers a race

2013-06-12 Thread Bart Van Assche
Make concurrent invocations of scsi_device_set_state() safe. Signed-off-by: Bart Van Assche bvanass...@acm.org Acked-by: Hannes Reinecke h...@suse.de Cc: James Bottomley jbottom...@parallels.com Cc: Tejun Heo t...@kernel.org Cc: Mike Christie micha...@cs.wisc.edu --- drivers/scsi/scsi_error.c

[PATCH v11 9/9] Avoid reenabling I/O after the transport became offline

2013-06-12 Thread Bart Van Assche
in the SDEV_CANCEL state. - This patch does not affect Fibre Channel LLD drivers since these drivers invoke fc_remote_port_chkready() before submitting a SCSI request to the HBA. That prevents a timeout to occur in state SDEV_CANCEL if the transport is offline. Signed-off-by: Bart Van Assche

[PATCH 0/14] IB SRP initiator patches for kernel 3.11

2013-06-12 Thread Bart Van Assche
The purpose of this InfiniBand SRP initiator patch series is as follows: - Make the SRP initiator driver better suited for use in a H.A. setup. Speed up failover by reducing the IB RC retry count and by notifying multipathd faster about transport layer failures by adding fast_io_fail_tmo

[PATCH 06/14] IB/srp: Keep rport as long as the IB transport layer

2013-06-12 Thread Bart Van Assche
a queuecommand callback is racy because srp_remove_host() must be invoked before scsi_remove_host() and because the queuecommand callback may get invoked after srp_remove_host() has finished. Signed-off-by: Bart Van Assche bvanass...@acm.org Cc: Roland Dreier rol...@purestorage.com Cc: James Bottomley jbottom

[PATCH 07/14] scsi_transport_srp: Add transport layer error handling

2013-06-12 Thread Bart Van Assche
after having detected a transport layer problem and before failing I/O. - Support for implementing dev_loss_tmo, the time that should elapse after having detected a transport layer problem and before removing a remote port. Signed-off-by: Bart Van Assche bvanass...@acm.org Cc: Roland Dreier rol

Re: [PATCH 07/14] scsi_transport_srp: Add transport layer error handling

2013-06-14 Thread Bart Van Assche
On 06/13/13 21:43, Vu Pham wrote: Hello Bart, +What:/sys/class/srp_remote_ports/port-h:n/dev_loss_tmo +Date:September 1, 2013 +KernelVersion:3.11 +Contact:linux-scsi@vger.kernel.org, linux-r...@vger.kernel.org +Description:Number of seconds the SCSI layer will

Re: [PATCH 07/14] scsi_transport_srp: Add transport layer error handling

2013-06-15 Thread Bart Van Assche
On 06/14/13 19:59, Vu Pham wrote: On 06/13/13 21:43, Vu Pham wrote: +/** + * srp_tmo_valid() - check timeout combination validity + * + * If no fast I/O fail timeout has been configured then the device loss timeout + * must be below SCSI_DEVICE_BLOCK_MAX_TIMEOUT. If a fast I/O fail timeout has

Re: [PATCH 07/14] scsi_transport_srp: Add transport layer error handling

2013-06-17 Thread Bart Van Assche
On 06/17/13 08:18, Hannes Reinecke wrote: On 06/15/2013 11:52 AM, Bart Van Assche wrote: On 06/14/13 19:59, Vu Pham wrote: On 06/13/13 21:43, Vu Pham wrote: +/** + * srp_tmo_valid() - check timeout combination validity + * + * If no fast I/O fail timeout has been configured then the device

Re: [PATCH 07/14] scsi_transport_srp: Add transport layer error handling

2013-06-17 Thread Bart Van Assche
On 06/17/13 09:14, Hannes Reinecke wrote: On 06/17/2013 09:04 AM, Bart Van Assche wrote: I agree that the value of fast_io_fail_tmo should be kept small. Although as you explained changing the SCSI device state into SDEV_BLOCK doesn't help for I/O that has already been queued on a failed path

Re: [PATCH 07/14] scsi_transport_srp: Add transport layer error handling

2013-06-19 Thread Bart Van Assche
On 06/19/13 15:44, Jack Wang wrote: + /* +* It can occur that after fast_io_fail_tmo expired and before +* dev_loss_tmo expired that the SCSI error handler has +* offlined one or more devices. scsi_target_unblock() doesn't +

Re: [PATCH v3 3/6] [SCSI] Add support for scsi_target events

2013-06-19 Thread Bart Van Assche
On 06/19/13 19:42, Ewan D. Milne wrote: +static void starget_evt_emit(struct scsi_target *starget, +struct starget_event *evt) +{ + int idx = 0; + char *envp[3]; + + switch (evt-evt_type) { + case STARGET_EVT_LUN_CHANGE_REPORTED: +

Re: [PATCH v11 3/9] Avoid calling __scsi_remove_device() twice

2013-06-24 Thread Bart Van Assche
On 06/23/13 23:35, Mike Christie wrote: On 06/12/2013 07:52 AM, Bart Van Assche wrote: SCSI devices are added to the shost-__devices list from inside scsi_alloc_sdev(). If something goes wrong during LUN scanning, e.g. a transport layer failure occurs, then __scsi_remove_device() can get

Re: [PATCH v11 4/9] Disallow changing the device state via sysfs into deleted

2013-06-24 Thread Bart Van Assche
On 06/24/13 03:05, Mike Christie wrote: On 6/12/13 7:53 AM, Bart Van Assche wrote: Changing the state of a SCSI device via sysfs into cancel or deleted prevents removal of these devices by scsi_remove_host(). Hence do not allow this. Also, introduce the symbolic name INVALID_SDEV_STATE

Re: [PATCH v11 6/9] Make scsi_remove_host() wait until error handling finished

2013-06-24 Thread Bart Van Assche
On 06/24/13 03:15, Mike Christie wrote: On 6/12/13 7:55 AM, Bart Van Assche wrote: A SCSI LLD may start cleaning up host resources as soon as scsi_remove_host() returns. These host resources may be needed by the LLD in an implementation of one of the eh_* functions. So if one of the eh_

Re: [PATCH v11 2/9] Remove get_device() / put_device() pair from scsi_request_fn()

2013-06-24 Thread Bart Van Assche
On 06/24/13 04:36, James Bottomley wrote: On Wed, 2013-06-12 at 14:51 +0200, Bart Van Assche wrote: Now that all scsi_request_fn() callers hold a reference on the SCSI device that function is invoked for What makes you think that this is a true statement? The usual caller is the block layer

Re: [PATCH 07/14] scsi_transport_srp: Add transport layer error handling

2013-06-24 Thread Bart Van Assche
On 06/23/13 23:13, Mike Christie wrote: On 06/12/2013 08:28 AM, Bart Van Assche wrote: +/* + * It can occur that after fast_io_fail_tmo expired and before + * dev_loss_tmo expired that the SCSI error handler has + * offlined one or more devices

Re: [PATCH v11 6/9] Make scsi_remove_host() wait until error handling finished

2013-06-24 Thread Bart Van Assche
On 06/24/13 12:17, Jack Wang wrote: @@ -646,14 +703,20 @@ static int scsi_try_target_reset(struct scsi_cmnd *scmd) static int scsi_try_bus_device_reset(struct scsi_cmnd *scmd) { int rtn; - struct scsi_host_template *hostt = scmd-device-host-hostt; + struct Scsi_Host

Re: [PATCH v11 2/9] Remove get_device() / put_device() pair from scsi_request_fn()

2013-06-24 Thread Bart Van Assche
On 06/24/13 15:34, James Bottomley wrote: On Mon, 2013-06-24 at 09:13 +0200, Bart Van Assche wrote: On 06/24/13 04:36, James Bottomley wrote: On Wed, 2013-06-12 at 14:51 +0200, Bart Van Assche wrote: Now that all scsi_request_fn() callers hold a reference on the SCSI device that function

Re: [PATCH 07/14] scsi_transport_srp: Add transport layer error handling

2013-06-24 Thread Bart Van Assche
On 06/24/13 15:48, Jack Wang wrote: I'm not sure it's possible to avoid such a race without introducing a new mutex. How about something like the (untested) SCSI core patch below, and invoking scsi_block_eh() and scsi_unblock_eh() around any reconnect activity not initiated from the SCSI EH

Re: [PATCH v11 1/9] Fix race between starved list and device removal

2013-06-24 Thread Bart Van Assche
On 06/24/13 17:38, James Bottomley wrote: I really don't like this because it's shuffling potentially fragile lifetime rules since you now have to have the sdev deleted from the starved list before final put. That becomes an unstated assumption within the code. The theory is that the starved

Re: RFC: Allow block drivers to poll for I/O instead of sleeping

2013-06-25 Thread Bart Van Assche
On 06/25/13 05:18, Matthew Wilcox wrote: On Mon, Jun 24, 2013 at 10:07:51AM +0200, Ingo Molnar wrote: I'm wondering, how will this scheme work if the IO completion latency is a lot more than the 5 usecs in the testcase? What if it takes 20 usecs or 100 usecs or more? There's clearly a

Re: [PATCH v11 6/9] Make scsi_remove_host() wait until error handling finished

2013-06-25 Thread Bart Van Assche
On 06/25/13 00:27, James Bottomley wrote: For a variety of reasons this patch set is incredibly hard to review: Almost every patch touches pieces in the mid layer where you have to be sure in minute detail you know what's going on (and what should be going on), so usually it's a couple of hours

Re: [PATCH v11 3/9] Avoid calling __scsi_remove_device() twice

2013-06-25 Thread Bart Van Assche
On 06/25/13 15:44, James Bottomley wrote: On Tue, 2013-06-25 at 10:37 +0200, Bart Van Assche wrote: On 06/24/13 19:38, James Bottomley wrote: On Wed, 2013-06-12 at 14:52 +0200, Bart Van Assche wrote: SCSI devices are added to the shost-__devices list from inside scsi_alloc_sdev

Re: [PATCH v11 6/9] Make scsi_remove_host() wait until error handling finished

2013-06-25 Thread Bart Van Assche
On 06/25/13 15:45, James Bottomley wrote: On Tue, 2013-06-25 at 11:01 +0200, Bart Van Assche wrote: There is a difference though between moving the EH kthread_stop() call and the patch at the start of this thread: moving the EH kthread_stop() call does not prevent that an ioctl like

Re: [PATCH v11 6/9] Make scsi_remove_host() wait until error handling finished

2013-06-25 Thread Bart Van Assche
On 06/25/13 19:40, James Bottomley wrote: If I look at what we actually do: all the HBAs treat scsi_remove_host as a waited for transition. The reason this works is the loop over __scsi_remove_device() in scsi_forget_host(). By the time that loop returns, every scsi_device is gone (and so is

Re: [PATCH] scsi_prep_fn() check for empty queue

2013-06-26 Thread Bart Van Assche
On 06/26/13 11:02, Maxim Uvarov wrote: This fix: end_request: I/O error, dev sdc, sector 976576 rport-0:0-3: blocked FC remote port time out: removing target and saving binding BUG: unable to handle kernel NULL pointer dereference at 0400 IP: [812f0cc2]

[PATCH v12 0/6] SCSI device removal fixes

2013-06-27 Thread Bart Van Assche
Fix a few issues related to SCSI device removal: - Fix a race between starved list processing and device removal that can trigger a kernel oops. - Avoid that __scsi_remove_device() is called twice for the same SCSI device, which also can cause a kernel oops. - Restrict the SCSI device state

[PATCH v12 1/6] Fix race between starved list and device removal

2013-06-27 Thread Bart Van Assche
against that race condition by holding a reference on the queue while running it. Signed-off-by: James Bottomley jbottom...@parallels.com Signed-off-by: Bart Van Assche bvanass...@acm.org Reported-by: Chanho Min chanho@lge.com Reference: http://lkml.org/lkml/2012/8/2/96 Cc: Tejun Heo t

[PATCH v12 2/6] Avoid calling __scsi_remove_device() twice

2013-06-27 Thread Bart Van Assche
that in that case the transition into state SDEV_DEL occurs. This avoids that __scsi_remove_device() gets invoked a second time by scsi_forget_host(). Signed-off-by: Bart Van Assche bvanass...@acm.org Cc: James Bottomley jbottom...@parallels.com Cc: Mike Christie micha...@cs.wisc.edu Cc: Hannes

[PATCH v12 3/6] Restrict device state changes allowed via sysfs

2013-06-27 Thread Bart Van Assche
-by: Bart Van Assche bvanass...@acm.org Cc: Tejun Heo t...@kernel.org Cc: James Bottomley jbottom...@parallels.com Cc: Mike Christie micha...@cs.wisc.edu Cc: Hannes Reinecke h...@suse.de Cc: David Milburn dmilb...@redhat.com --- drivers/scsi/scsi_sysfs.c |2 +- 1 file changed, 1 insertion(+), 1

[PATCH v12 4/6] Avoid saving/restoring interrupt state inside scsi_remove_host()

2013-06-27 Thread Bart Van Assche
Since it is not allowed to invoke scsi_remove_host() with interrupts disabled, avoid saving and restoring the interrupt state inside scsi_remove_host(). This patch does not change the functionality of the function scsi_remove_host(). Signed-off-by: Bart Van Assche bvanass...@acm.org Acked

[PATCH v12 5/6] Avoid that scsi_device_set_state() triggers a race

2013-06-27 Thread Bart Van Assche
Make concurrent invocations of scsi_device_set_state() safe. Signed-off-by: Bart Van Assche bvanass...@acm.org Acked-by: Hannes Reinecke h...@suse.de Cc: James Bottomley jbottom...@parallels.com Cc: Tejun Heo t...@kernel.org Cc: Mike Christie micha...@cs.wisc.edu --- drivers/scsi/scsi_error.c

[PATCH v12 6/6] Avoid re-enabling I/O after the transport became offline

2013-06-27 Thread Bart Van Assche
in the SDEV_CANCEL state. - This patch does not affect Fibre Channel LLD drivers since these drivers invoke fc_remote_port_chkready() before submitting a SCSI request to the HBA. That prevents a timeout to occur in state SDEV_CANCEL if the transport is offline. Signed-off-by: Bart Van Assche

[PATCH v2 0/15] IB SRP initiator patches for kernel 3.11

2013-06-28 Thread Bart Van Assche
The purpose of this InfiniBand SRP initiator patch series is as follows: - Make the SRP initiator driver better suited for use in a H.A. setup. Speed up failover by reducing the IB RC retry count and by notifying multipathd faster about transport layer failures by adding fast_io_fail_tmo

[PATCH v2 06/15] IB/srp: Maintain a single connection per I_T nexus

2013-06-28 Thread Bart Van Assche
will cause the SRP initiator to reconnect, which will cause I/O over the second connection to fail. Avoid such ping-pong behavior by disabling relogins. Note: if reconnecting manually is necessary, that is possible by deleting and recreating an rport via sysfs. Signed-off-by: Bart Van Assche

[PATCH v2 07/15] IB/srp: Keep rport as long as the IB transport layer

2013-06-28 Thread Bart Van Assche
a queuecommand callback is racy because srp_remove_host() must be invoked before scsi_remove_host() and because the queuecommand callback may get invoked after srp_remove_host() has finished. Signed-off-by: Bart Van Assche bvanass...@acm.org Cc: Roland Dreier rol...@purestorage.com Cc: James Bottomley jbottom

[PATCH v2 08/15] scsi_transport_srp: Add transport layer error handling

2013-06-28 Thread Bart Van Assche
after having detected a transport layer problem and before failing I/O. - Support for implementing dev_loss_tmo, the time that should elapse after having detected a transport layer problem and before removing a remote port. Signed-off-by: Bart Van Assche bvanass...@acm.org Cc: Roland Dreier rol

[PATCH v2 12/15] IB/srp: Fail SCSI commands silently

2013-06-28 Thread Bart Van Assche
. [bvanassche: Changed patch description] Signed-off-by: Sebastian Riemer sebastian.rie...@profitbricks.com Signed-off-by: Bart Van Assche bvanass...@acm.org Cc: Roland Dreier rol...@purestorage.com Cc: David Dillow dillo...@ornl.gov Cc: Vu Pham v...@mellanox.com --- drivers/infiniband/ulp/srp/ib_srp.c

Re: [PATCH v2 08/15] scsi_transport_srp: Add transport layer error handling

2013-07-01 Thread Bart Van Assche
On 06/30/13 23:05, David Dillow wrote: On Fri, 2013-06-28 at 14:53 +0200, Bart Van Assche wrote: +int srp_tmo_valid(int fast_io_fail_tmo, int dev_loss_tmo) +{ + return (fast_io_fail_tmo 0 || dev_loss_tmo 0 || + fast_io_fail_tmo dev_loss_tmo

Re: [PATCH v12 2/6] Avoid calling __scsi_remove_device() twice

2013-07-01 Thread Bart Van Assche
On 07/01/13 09:05, James Bottomley wrote: On Thu, 2013-06-27 at 16:53 +0200, Bart Van Assche wrote: If something goes wrong during LUN scanning, e.g. a transport layer failure occurs, then __scsi_remove_device() can get invoked by the LUN scanning code for a SCSI device in state

Re: [PATCH v12 6/6] Avoid re-enabling I/O after the transport became offline

2013-07-01 Thread Bart Van Assche
On 07/01/13 10:27, Hannes Reinecke wrote: On 06/27/2013 04:57 PM, Bart Van Assche wrote: diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c index dfbaa34..666b741 100644 --- a/drivers/scsi/scsi_sysfs.c +++ b/drivers/scsi/scsi_sysfs.c @@ -959,14 +959,16 @@ void

Re: [PATCH v12 5/6] Avoid that scsi_device_set_state() triggers a race

2013-07-01 Thread Bart Van Assche
On 07/01/13 16:49, James Bottomley wrote: On Thu, 2013-06-27 at 16:56 +0200, Bart Van Assche wrote: Make concurrent invocations of scsi_device_set_state() safe. Firstly, I don't understand from this where you think the races are. Secondly, shouldn't this be the device lock? and thirdly, if we

[PATCH v13 0/4] SCSI device removal fixes

2013-07-02 Thread Bart Van Assche
Fix a few issues related to SCSI device removal: - Fix a race between starved list processing and device removal that can trigger a kernel oops. - Avoid that __scsi_remove_device() is called twice for the same SCSI device, which also can cause a kernel oops. - Restrict the SCSI device state

[PATCH 1/4] Fix race between starved list and device removal

2013-07-02 Thread Bart Van Assche
against that race condition by holding a reference on the queue while running it. Signed-off-by: James Bottomley jbottom...@parallels.com Signed-off-by: Bart Van Assche bvanass...@acm.org Reported-by: Chanho Min chanho@lge.com Reference: http://lkml.org/lkml/2012/8/2/96 Cc: Tejun Heo t

[PATCH 2/4] Avoid calling __scsi_remove_device() twice

2013-07-02 Thread Bart Van Assche
in this case the transition into state SDEV_DEL occurs. This avoids that __scsi_remove_device() can get invoked a second time by scsi_forget_host() if this last function is invoked from another thread than the thread that performs LUN scanning. Signed-off-by: Bart Van Assche bvanass...@acm.org Cc: James

[PATCH 3/4] Avoid re-enabling I/O after the transport became offline

2013-07-02 Thread Bart Van Assche
in the SDEV_CANCEL state. - This patch does not affect Fibre Channel LLD drivers since these drivers invoke fc_remote_port_chkready() before submitting a SCSI request to the HBA. That prevents a timeout to occur in state SDEV_CANCEL if the transport is offline. Signed-off-by: Bart Van Assche

[PATCH 4/4] Disallow changing the device state via sysfs into deleted

2013-07-02 Thread Bart Van Assche
Changing the state of a SCSI device via sysfs into cancel or deleted prevents removal of these devices by scsi_remove_host(). Hence do not allow this. Signed-off-by: Bart Van Assche bvanass...@acm.org Cc: Tejun Heo t...@kernel.org Cc: James Bottomley jbottom...@parallels.com Cc: Mike Christie

[PATCH v3 0/13] IB SRP initiator patches for kernel 3.11

2013-07-03 Thread Bart Van Assche
The purpose of this InfiniBand SRP initiator patch series is as follows: - Make the SRP initiator driver better suited for use in a H.A. setup. Add fast_io_fail_tmo and dev_loss_tmo parameters. These can be used either to speed up failover or to avoid device removal when e.g. using

[PATCH v3 06/13] IB/srp: Keep rport as long as the IB transport layer

2013-07-03 Thread Bart Van Assche
a queuecommand callback is racy because srp_remove_host() must be invoked before scsi_remove_host() and because the queuecommand callback may get invoked after srp_remove_host() has finished. Signed-off-by: Bart Van Assche bvanass...@acm.org Cc: Roland Dreier rol...@purestorage.com Cc: James Bottomley jbottom

[PATCH v3 07/13] scsi_transport_srp: Add transport layer error handling

2013-07-03 Thread Bart Van Assche
after having detected a transport layer problem and before failing I/O. - Support for implementing dev_loss_tmo, the time that should elapse after having detected a transport layer problem and before removing a remote port. Signed-off-by: Bart Van Assche bvanass...@acm.org Cc: Roland Dreier rol

Re: [PATCH v3 0/13] IB SRP initiator patches for kernel 3.11

2013-07-03 Thread Bart Van Assche
On 07/03/13 15:38, Or Gerlitz wrote: Some of these patches were already picked by Roland (SB), I would suggest that you post V4 and drop the ones which were accepted. One of the patches that is already in Roland's tree and that was in v1 of this series has been split into two patches in v2

Re: [PATCH v3 07/13] scsi_transport_srp: Add transport layer error handling

2013-07-03 Thread Bart Van Assche
On 07/03/13 19:27, David Dillow wrote: On Wed, 2013-07-03 at 18:00 +0200, Bart Van Assche wrote: The combination of dev_loss_tmo off and reconnect_delay 0 worked fine in my tests. An I/O failure was detected shortly after the cable to the target was pulled. I/O resumed shortly after the cable

Re: [PATCH v3 07/13] scsi_transport_srp: Add transport layer error handling

2013-07-04 Thread Bart Van Assche
On 07/03/13 20:57, David Dillow wrote: And I'm getting the strong sense that the answer to my question about fast_io_fail_tmo = 0 when dev_loss_tmo is that we should not allow that combination, even if it doesn't break the kernel. If it doesn't make sense, there is no reason to create an

Re: [PATCH v3 07/13] scsi_transport_srp: Add transport layer error handling

2013-07-04 Thread Bart Van Assche
On 07/04/13 10:01, Bart Van Assche wrote: On 07/03/13 20:57, David Dillow wrote: And I'm getting the strong sense that the answer to my question about fast_io_fail_tmo = 0 when dev_loss_tmo is that we should not allow that combination, even if it doesn't break the kernel. If it doesn't make

[PATCH v14 0/6] SCSI device removal fixes

2013-07-05 Thread Bart Van Assche
Fix a few issues related to SCSI device removal: - Fix a race between starved list processing and device removal that can trigger a kernel oops. - Avoid that __scsi_remove_device() is called twice for the same SCSI device, which also can cause a kernel oops. - Restrict the SCSI device state

[PATCH v14 2/6] Avoid calling __scsi_remove_device() twice

2013-07-05 Thread Bart Van Assche
in this case the transition into state SDEV_DEL occurs. This avoids that __scsi_remove_device() can get invoked a second time by scsi_forget_host() if this last function is invoked from another thread than the thread that performs LUN scanning. Signed-off-by: Bart Van Assche bvanass...@acm.org Cc: James

[PATCH v14 1/6] Fix race between starved list and device removal

2013-07-05 Thread Bart Van Assche
on the queue while running it. Signed-off-by: James Bottomley jbottom...@parallels.com Signed-off-by: Bart Van Assche bvanass...@acm.org Reported-by: Chanho Min chanho@lge.com Reference: http://lkml.org/lkml/2012/8/2/96 Cc: Mike Christie micha...@cs.wisc.edu Cc: Hannes Reinecke h...@suse.de Cc

[PATCH v14 3/6] Introduce scsi_device_being_removed()

2013-07-05 Thread Bart Van Assche
Introduce the helper function scsi_device_being_removed(). This patch does not change any functionality. Signed-off-by: Bart Van Assche bvanass...@acm.org Acked-by: Hannes Reinecke h...@suse.de Cc: James Bottomley jbottom...@parallels.com Cc: Mike Christie micha...@cs.wisc.edu --- drivers/scsi

[PATCH v14 4/6] Rework scsi_internal_device_unblock()

2013-07-05 Thread Bart Van Assche
instead of being rejected. Signed-off-by: Bart Van Assche bvanass...@acm.org Cc: James Bottomley jbottom...@parallels.com Cc: Mike Christie micha...@cs.wisc.edu Cc: Hannes Reinecke h...@suse.de --- drivers/scsi/scsi_lib.c | 29 +++-- 1 file changed, 11 insertions(+), 18

  1   2   3   4   5   6   7   8   9   10   >