Re: [PATCHv2 0/7] Limit overall SCSI EH runtime

2013-07-02 Thread James Bottomley
On Mon, 2013-07-01 at 16:55 -0400, Jörn Engel wrote:
> On Mon, 1 July 2013 19:23:25 +, James Bottomley wrote:
> > On Mon, 2013-07-01 at 13:44 -0400, Jörn Engel wrote:
> > > If a single device is bad, don't ever do a host
> > > reset.
> >
> > This isn't a tenable position.  Sometimes a device looks bad because the
> > host state for it has gone insane.  At that point, the only safe action
> > is a reset of the host to sane state.
> >
> > I could be persuaded that you should never do the transport equivalent
> > of a bus reset (on non-SPI transports, at least), which is actually hard
> > to do on some of the modern transports, but I don't think you can get
> > away without having a host reset in the eh arsenal.
>
> Fair enough.  Hardware being hardware and hardware bugs being hard to
> fix, I see your point.
>
> However, we shouldn't screw the poor user who has paid a premium for a
> second HBA to get some redundancy and reset both of them at the same
> time.  That would, you know, defeat the redundancy. ;)

I don't understand what you're getting at.  In a dual HBA situation,
whether the second HBA is implicated or not depends on configuration and
what the first HBA is doing. If it's just passively lost device state,
then the second HBA should continue just fine.  If the insane HBA is
injecting rogue data on the bus then, in a properly isolated
configuration, it shouldn't be able to affect the second HBA, but if
there's some leak and it does, chances are error handling will occur on
both simultaneously.  I don't see any way to avoid this other than
having the user buy better hardware and properly configure it.

James

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: Linux boot Support for 4KB sector drives ?

2013-07-02 Thread Kishore Babu Lukka
Adding Asha also.

-----Original Message-----
From: Mahesh Rajashekhara 
Sent: Monday, July 01, 2013 11:41 AM
To: jbottom...@parallels.com; linux-scsi@vger.kernel.org
Cc: Tony Ruiz; Achim Leubner; Mahesh Rajashekhara; Kishore Babu Lukka
Subject: Linux boot Support for 4KB sector drives ?

Hello,

Do any of the Linux OS flavors support booting from a 4K-sector
(Advanced Format) drive in legacy BIOS mode (MBR partitioning scheme)?

Thanks & Regards,
Mahesh




Re: Linux boot Support for 4KB sector drives ?

2013-07-02 Thread Aaron Lu
On 07/02/2013 03:11 PM, Kishore Babu Lukka wrote:
> Adding Asha also.
>
> -----Original Message-----
> From: Mahesh Rajashekhara
> Sent: Monday, July 01, 2013 11:41 AM
> To: jbottom...@parallels.com; linux-scsi@vger.kernel.org
> Cc: Tony Ruiz; Achim Leubner; Mahesh Rajashekhara; Kishore Babu Lukka
> Subject: Linux boot Support for 4KB sector drives ?
>
> Hello,
>
> Do any of the Linux OS flavors support booting from a 4K-sector
> (Advanced Format) drive in legacy BIOS mode (MBR partitioning scheme)?

I think that depends on the boot loader, not on Linux.
Linux supports 4K-sector drives, but if the boot loader
doesn't, it can't read the kernel into memory and start Linux.

Legacy GRUB relies on BIOS interrupt services and so probably can't
support 4K-sector drives; I don't know the status of GRUB 2.

Thanks,
Aaron

 
> Thanks & Regards,
> Mahesh
 
 


[PATCH v13 0/4] SCSI device removal fixes

2013-07-02 Thread Bart Van Assche

Fix a few issues related to SCSI device removal:
- Fix a race between starved list processing and device removal that
  can trigger a kernel oops.
- Prevent __scsi_remove_device() from being called twice for the same
  SCSI device, which can also cause a kernel oops.
- Restrict the SCSI device state changes allowed via sysfs.
- Avoid re-enabling I/O after the transport layer has gone offline.

Changes compared to v12:
- Clarified the description of the patch for handling a transport
  layer failure during LUN scanning: mentioned that this patch was
  developed after analyzing the cause of a kernel oops triggered by
  asynchronous LUN scanning.
- Restored the previous version of the patch for restricting sysfs
  SCSI device state changes, namely the version that only disallows
  changing the device state into cancel or deleted.
- Added a comment in patch 4/4.
- Left out a patch that was the result of source reading and also
  an intermediate patch that is no longer needed in this series.

Changes compared to v11:
- Left out a patch that was not a device removal bug fix.
- Left out the patches about which there is not yet an agreement.

Changes compared to v10:
- Rebased and retested on top of Linux kernel v3.10-rc5.

Changes compared to v9:
- Changed one WARN_ON() statement into a WARN() statement.

Changes compared to v8:
- Addressed the feedback from Joe Lawrence - dropped the patch that
  makes scsi_remove_host() wait until the last sdev user is gone.
- Eliminated Scsi_Host.tmf_in_progress since it duplicates state
  information available in Scsi_Host.eh_active.
- Added a patch to avoid reenabling I/O after the transport layer
  became offline.

Changes compared to v7:
- Addressed the review comments posted by Hannes Reinecke and Rolf Eike
  Beer.
- Modified the patch "Make scsi_remove_host() wait until error handling
  finished" so that it is also safe for SCSI timeout values below
  the maximum LLD response time, by changing scsi_send_eh_cmnd() so
  that it does not invoke any LLD code after scsi_remove_host() started.
- Added a patch to save and restore the host_scribble field.
- Refined / clarified several patch descriptions.
- Rebased and retested on top of kernel v3.8-rc6.

Changes compared to v6:
- Dropped the first six patches since Jens queued these for 3.8.
- Added a patch to prevent __scsi_remove_device() from being invoked twice.
- Restored error recovery in the SHOST_CANCEL state.

Changes compared to v5:
- Prevent block layer work from being scheduled on a dead queue.
- Do not invoke any SCSI LLD callback after scsi_remove_host() finished.
- Stop error handling as soon as scsi_remove_host() started.
- Remove the unused function bsg_goose_queue().
- Prevent scsi_device_set_state() from triggering a race condition.

Changes compared to v4:
- Moved queue_flag_set(QUEUE_FLAG_DEAD, q) from blk_drain_queue() into
  blk_cleanup_queue().
- Declared the new __blk_run_queue_uncond() function inline. Checked in
  the generated assembler code that this function is really inlined in
  __blk_run_queue().
- Elaborated several patch descriptions.
- Added sparse annotations to scsi_request_fn().
- Split several patches.

Changes compared to v3:
- Fixed a race condition by setting QUEUE_FLAG_DEAD earlier.
- Added a patch for fixing a race between starved list processing
  and device removal to this series.

Changes compared to v2:
- Split second patch into two patches.
- Refined patch descriptions.

Changes compared to v1:
- Included a patch to rename QUEUE_FLAG_DEAD.
- Refined the descriptions of the __blk_run_queue_uncond() and
  blk_cleanup_queue() functions.



[PATCH 1/4] Fix race between starved list and device removal

2013-07-02 Thread Bart Van Assche
From: James Bottomley jbottom...@parallels.com

scsi_run_queue() examines all SCSI devices that are present on
the starved list. Since scsi_run_queue() drops the SCSI host
lock, a SCSI device can get removed after it has been taken off
the starved list and before its queue is run. Protect
against that race condition by holding a reference on the
queue while running it.

Signed-off-by: James Bottomley jbottom...@parallels.com
Signed-off-by: Bart Van Assche bvanass...@acm.org
Reported-by: Chanho Min chanho@lge.com
Reference: http://lkml.org/lkml/2012/8/2/96
Cc: Tejun Heo t...@kernel.org
Cc: Mike Christie micha...@cs.wisc.edu
Cc: Hannes Reinecke h...@suse.de
Cc: sta...@vger.kernel.org
---
 drivers/scsi/scsi_lib.c |   26 +++++++++++++++++++++-----
 1 file changed, 21 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 86d5220..df8bd5a 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -434,6 +434,8 @@ static void scsi_run_queue(struct request_queue *q)
 	list_splice_init(&shost->starved_list, &starved_list);
 
 	while (!list_empty(&starved_list)) {
+		struct request_queue *slq;
+
 		/*
 		 * As long as shost is accepting commands and we have
 		 * starved queues, call blk_run_queue. scsi_request_fn
@@ -456,11 +458,25 @@ static void scsi_run_queue(struct request_queue *q)
 			continue;
 		}
 
-		spin_unlock(shost->host_lock);
-		spin_lock(sdev->request_queue->queue_lock);
-		__blk_run_queue(sdev->request_queue);
-		spin_unlock(sdev->request_queue->queue_lock);
-		spin_lock(shost->host_lock);
+		/*
+		 * Once we drop the host lock, a racing scsi_remove_device()
+		 * call may remove the sdev from the starved list and destroy
+		 * it and the queue.  Mitigate by taking a reference to the
+		 * queue and never touching the sdev again after we drop the
+		 * host lock.  Note: if __scsi_remove_device() invokes
+		 * blk_cleanup_queue() before the queue is run from this
+		 * function then blk_run_queue() will return immediately since
+		 * blk_cleanup_queue() marks the queue with QUEUE_FLAG_DYING.
+		 */
+		slq = sdev->request_queue;
+		if (!blk_get_queue(slq))
+			continue;
+		spin_unlock_irqrestore(shost->host_lock, flags);
+
+		blk_run_queue(slq);
+		blk_put_queue(slq);
+
+		spin_lock_irqsave(shost->host_lock, flags);
 	}
 	/* put any unprocessed entries back */
 	list_splice(&starved_list, &shost->starved_list);
-- 
1.7.10.4
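
The locking pattern in the patch above (take a reference on the queue while
the host lock is still held, and touch only the queue after the lock is
dropped) can be illustrated outside the kernel. The following is a minimal
user-space sketch, not kernel code: struct toy_queue and all toy_* functions
are hypothetical stand-ins for struct request_queue, blk_get_queue(),
blk_run_queue() and blk_put_queue(), and the host lock is only indicated by
comments.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct request_queue. */
struct toy_queue {
	int refcount;	/* number of holders of this queue */
	bool dying;	/* set once teardown of the queue has started */
	int runs;	/* how often the queue was actually run */
};

/* Modeled on blk_get_queue(): no new references once the queue is dying. */
static bool toy_get_queue(struct toy_queue *q)
{
	if (q->dying)
		return false;
	q->refcount++;
	return true;
}

/* Modeled on blk_put_queue(): drop a reference taken earlier. */
static void toy_put_queue(struct toy_queue *q)
{
	assert(q->refcount > 0);
	q->refcount--;
}

/* Modeled on blk_run_queue(): a dying queue is not run. */
static void toy_run_queue(struct toy_queue *q)
{
	if (!q->dying)
		q->runs++;
}

/*
 * One iteration of the starved-list loop. Returns false if the queue
 * was skipped because it is already being torn down.
 */
static bool toy_run_starved(struct toy_queue *q)
{
	if (!toy_get_queue(q))
		return false;
	/* the host lock would be dropped here */
	toy_run_queue(q);
	toy_put_queue(q);
	/* the host lock would be retaken here */
	return true;
}
```

The point of the sketch is the ordering: the reference is acquired before the
lock is released, so the object that is used afterwards cannot disappear in
between.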



[PATCH 2/4] Avoid calling __scsi_remove_device() twice

2013-07-02 Thread Bart Van Assche
If something goes wrong during LUN scanning, e.g. a transport layer
failure occurs, then __scsi_remove_device() can get invoked by the
LUN scanning code for a SCSI device in state SDEV_CREATED_BLOCK and
before the SCSI device has been added to sysfs (is_visible == 0).
Make sure that even in this case the transition into state SDEV_DEL
occurs. This prevents __scsi_remove_device() from being invoked a
second time by scsi_forget_host() if the latter is invoked from a
thread other than the one that performs LUN scanning.

Signed-off-by: Bart Van Assche bvanass...@acm.org
Cc: James Bottomley jbottom...@parallels.com
Cc: Mike Christie micha...@cs.wisc.edu
Cc: Hannes Reinecke h...@suse.de
Cc: Tejun Heo t...@kernel.org
---
 drivers/scsi/scsi_lib.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index df8bd5a..124392f 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2193,6 +2193,7 @@ scsi_device_set_state(struct scsi_device *sdev, enum scsi_device_state state)
 		case SDEV_OFFLINE:
 		case SDEV_TRANSPORT_OFFLINE:
 		case SDEV_CANCEL:
+		case SDEV_CREATED_BLOCK:
 			break;
 		default:
 			goto illegal;
-- 
1.7.10.4



[PATCH 3/4] Avoid re-enabling I/O after the transport became offline

2013-07-02 Thread Bart Van Assche
Disallow the SDEV_TRANSPORT_OFFLINE to SDEV_CANCEL transition such
that no I/O is sent to devices for which the transport is offline.
Notes:
- Functions like sd_shutdown() use scsi_execute_req() and hence
  set the REQ_PREEMPT flag. Such requests are passed to the LLD
  queuecommand callback in the SDEV_CANCEL state.
- This patch does not affect Fibre Channel LLD drivers since these
  drivers invoke fc_remote_port_chkready() before submitting a SCSI
  request to the HBA. That prevents a timeout from occurring in
  state SDEV_CANCEL if the transport is offline.

Signed-off-by: Bart Van Assche bvanass...@acm.org
Cc: Mike Christie micha...@cs.wisc.edu
Cc: James Bottomley jbottom...@parallels.com
Cc: Hannes Reinecke h...@suse.de
Cc: Tejun Heo t...@kernel.org
---
 drivers/scsi/scsi_lib.c   |    1 -
 drivers/scsi/scsi_sysfs.c |    9 ++++++++-
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 124392f..a0fb56b 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2178,7 +2178,6 @@ scsi_device_set_state(struct scsi_device *sdev, enum scsi_device_state state)
 		case SDEV_RUNNING:
 		case SDEV_QUIESCE:
 		case SDEV_OFFLINE:
-		case SDEV_TRANSPORT_OFFLINE:
 		case SDEV_BLOCK:
 			break;
 		default:
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 931a7d9..1711617 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -955,7 +955,14 @@ void __scsi_remove_device(struct scsi_device *sdev)
 	struct device *dev = &sdev->sdev_gendev;
 
 	if (sdev->is_visible) {
-		if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0)
+		/*
+		 * The transition from SDEV_TRANSPORT_OFFLINE into
+		 * SDEV_CANCEL is not allowed since this transition would
+		 * reenable I/O. However, if the device state was already
+		 * SDEV_TRANSPORT_OFFLINE, proceed with device removal.
+		 */
+		if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0 &&
+		    sdev->sdev_state != SDEV_TRANSPORT_OFFLINE)
 			return;
 
 		bsg_unregister_queue(sdev->request_queue);
-- 
1.7.10.4



[PATCH 4/4] Disallow changing the device state via sysfs into deleted

2013-07-02 Thread Bart Van Assche
Changing the state of a SCSI device via sysfs into cancel or
deleted prevents removal of these devices by scsi_remove_host().
Hence do not allow this.

Signed-off-by: Bart Van Assche bvanass...@acm.org
Cc: Tejun Heo t...@kernel.org
Cc: James Bottomley jbottom...@parallels.com
Cc: Mike Christie micha...@cs.wisc.edu
Cc: Hannes Reinecke h...@suse.de
Cc: David Milburn dmilb...@redhat.com
---
 drivers/scsi/scsi_sysfs.c |    6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 1711617..292df85 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -605,10 +605,8 @@ store_state_field(struct device *dev, struct device_attribute *attr,
 			break;
 		}
 	}
-	if (!state)
-		return -EINVAL;
-
-	if (scsi_device_set_state(sdev, state))
+	if (state == 0 || state == SDEV_CANCEL || state == SDEV_DEL ||
+	    scsi_device_set_state(sdev, state))
 		return -EINVAL;
 	return count;
 }
-- 
1.7.10.4
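
The check added by the patch above can be tried out in user space. What
follows is a rough sketch under stated assumptions, not the kernel code: the
enum toy_state values, the toy_states[] name table and both functions are
hypothetical, and the state names are only meant to resemble what a sysfs
"state" attribute might accept. It shows the one idea of the patch: a name
that parses to "cancel" or "deleted" is rejected in the same way as an
unknown name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical subset of device states; 0 means "no match". */
enum toy_state {
	TOY_NONE = 0,
	TOY_RUNNING,
	TOY_CANCEL,
	TOY_DEL,
	TOY_OFFLINE,
};

/* Illustrative name table, assumed for this sketch only. */
static const struct {
	enum toy_state value;
	const char *name;
} toy_states[] = {
	{ TOY_RUNNING, "running" },
	{ TOY_CANCEL,  "cancel"  },
	{ TOY_DEL,     "deleted" },
	{ TOY_OFFLINE, "offline" },
};

/* Map a written string (optionally newline-terminated) to a state. */
static enum toy_state toy_parse_state(const char *buf)
{
	size_t i;

	assert(buf != NULL);
	for (i = 0; i < sizeof(toy_states) / sizeof(toy_states[0]); i++) {
		size_t len = strlen(toy_states[i].name);

		if (strncmp(toy_states[i].name, buf, len) == 0 &&
		    (buf[len] == '\n' || buf[len] == '\0'))
			return toy_states[i].value;
	}
	return TOY_NONE;
}

/* Returns 0 if the requested state may be set from user space, -1 if not. */
static int toy_store_state(const char *buf)
{
	enum toy_state state = toy_parse_state(buf);

	if (state == TOY_NONE || state == TOY_CANCEL || state == TOY_DEL)
		return -1;
	return 0;
}
```

In this sketch toy_store_state("running\n") is accepted while
toy_store_state("deleted\n") is refused, which mirrors the intent stated in
the patch description.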



Re: [PATCH 3/4] Avoid re-enabling I/O after the transport became offline

2013-07-02 Thread James Bottomley
On Tue, 2013-07-02 at 15:07 +0200, Bart Van Assche wrote:
> Disallow the SDEV_TRANSPORT_OFFLINE to SDEV_CANCEL transition such
> that no I/O is sent to devices for which the transport is offline.
> Notes:
> - Functions like sd_shutdown() use scsi_execute_req() and hence
>   set the REQ_PREEMPT flag. Such requests are passed to the LLD
>   queuecommand callback in the SDEV_CANCEL state.
> - This patch does not affect Fibre Channel LLD drivers since these
>   drivers invoke fc_remote_port_chkready() before submitting a SCSI
>   request to the HBA. That prevents a timeout from occurring in
>   state SDEV_CANCEL if the transport is offline.
>
> Signed-off-by: Bart Van Assche bvanass...@acm.org
> Cc: Mike Christie micha...@cs.wisc.edu
> Cc: James Bottomley jbottom...@parallels.com
> Cc: Hannes Reinecke h...@suse.de
> Cc: Tejun Heo t...@kernel.org
> ---
>  drivers/scsi/scsi_lib.c   |    1 -
>  drivers/scsi/scsi_sysfs.c |    9 ++++++++-
>  2 files changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 124392f..a0fb56b 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -2178,7 +2178,6 @@ scsi_device_set_state(struct scsi_device *sdev, enum scsi_device_state state)
>  		case SDEV_RUNNING:
>  		case SDEV_QUIESCE:
>  		case SDEV_OFFLINE:
> -		case SDEV_TRANSPORT_OFFLINE:
>  		case SDEV_BLOCK:
>  			break;
>  		default:
> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
> index 931a7d9..1711617 100644
> --- a/drivers/scsi/scsi_sysfs.c
> +++ b/drivers/scsi/scsi_sysfs.c
> @@ -955,7 +955,14 @@ void __scsi_remove_device(struct scsi_device *sdev)
>  	struct device *dev = &sdev->sdev_gendev;
>
>  	if (sdev->is_visible) {
> -		if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0)
> +		/*
> +		 * The transition from SDEV_TRANSPORT_OFFLINE into
> +		 * SDEV_CANCEL is not allowed since this transition would
> +		 * reenable I/O. However, if the device state was already
> +		 * SDEV_TRANSPORT_OFFLINE, proceed with device removal.
> +		 */
> +		if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0 &&
> +		    sdev->sdev_state != SDEV_TRANSPORT_OFFLINE)

This isn't the right way to do this, because it's adding uncharted state
to the state model.  What should happen is that this should be reflected
in the actual state model.  It sounds like we need a CANCEL_OFFLINE
state to which TRANSPORT_OFFLINE (and possibly OFFLINE) can transition.

The comment on the transition should state that CANCEL_OFFLINE won't
allow any I/O.

James
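
To make the suggestion above concrete, here is a small user-space sketch of a
state model with an explicit CANCEL_OFFLINE state. It is only an illustration
under assumptions: the toy_* names, the reduced set of states and the exact
set of allowed edges are hypothetical and not taken from any posted patch. In
particular, whether OFFLINE should also lead to CANCEL_OFFLINE is left open
in the message above; the sketch keeps OFFLINE -> CANCEL.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced set of device states. */
enum toy_sdev_state {
	TOY_RUNNING,
	TOY_OFFLINE,
	TOY_TRANSPORT_OFFLINE,
	TOY_CANCEL,
	TOY_CANCEL_OFFLINE,	/* removal in progress, no I/O at all */
	TOY_DEL,
};

/* Is "from -> to" an edge of this toy state model? */
static bool toy_transition_ok(enum toy_sdev_state from, enum toy_sdev_state to)
{
	switch (to) {
	case TOY_RUNNING:
		return from == TOY_OFFLINE || from == TOY_TRANSPORT_OFFLINE;
	case TOY_OFFLINE:
	case TOY_TRANSPORT_OFFLINE:
		return from == TOY_RUNNING;
	case TOY_CANCEL:
		/* never entered from TRANSPORT_OFFLINE */
		return from == TOY_RUNNING || from == TOY_OFFLINE;
	case TOY_CANCEL_OFFLINE:
		return from == TOY_TRANSPORT_OFFLINE;
	case TOY_DEL:
		return from == TOY_CANCEL || from == TOY_CANCEL_OFFLINE;
	}
	return false;
}

/* May preempt-style requests still reach the driver in this state? */
static bool toy_preempt_io_allowed(enum toy_sdev_state s)
{
	return s == TOY_RUNNING || s == TOY_CANCEL;
}

/*
 * Pick the first removal state. The caller must pass RUNNING, OFFLINE
 * or TRANSPORT_OFFLINE; anything else trips the assertion.
 */
static enum toy_sdev_state toy_begin_removal(enum toy_sdev_state cur)
{
	enum toy_sdev_state next =
		cur == TOY_TRANSPORT_OFFLINE ? TOY_CANCEL_OFFLINE : TOY_CANCEL;

	assert(toy_transition_ok(cur, next));
	return next;
}
```

With the extra state, the "transport is offline, do not send I/O" case is
part of the transition table itself instead of being a special case in the
removal path.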



[GIT PULL] First round of SCSI updates for the 3.10+ merge window

2013-07-02 Thread James Bottomley
The patch set is mostly driver updates (ufs, zfcp, lpfc, mpt2sas,
megaraid_sas, bfa, ipr) and a few bug fixes.  Also of note is that the
BusLogic driver has been rewritten to a better coding style and 64-bit
support added.  We also removed the libsas limitation of 16 bytes on
the command size (currently no drivers make use of this).

The patch is available here:

git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git scsi-misc

The short changelog is:

Akinobu Mita (3):
  ufshcd-pltfrm: remove unnecessary dma_set_coherent_mask() call
  ufs: fix register address in UIC error interrupt handling
  ufshcd-pltfrm: add missing empty slot in ufs_of_match[]

Ben Hutchings (1):
  sd: Fix parsing of 'temporary ' cache mode prefix

Dan Carpenter (2):
  fnic: potential dead lock in fnic_is_abts_pending()
  pm80xx: remove unneeded NULL check

Daniel Hansel (1):
  zfcp: fix adapter (re)open recovery while link to SAN is down

Eddie Wai (1):
  libiscsi: Added new boot entries in the session sysfs

Geert Uytterhoeven (1):
  ufs: SCSI_UFSHCD should depend on SCSI_DMA

Hannes Reinecke (1):
  sd: avoid deadlocks when running under multipath

Jakob Normark (1):
  bfa: Fixes for 0-terminated strncpy and possible null pointer dereference

James Bottomley (1):
  libsas: implement > 16 byte CDB support

James Georgas (1):
  megaraid: minor cut and paste error fixed.

James Smart (17):
  lpfc 8.3.40: Update lpfc version to driver version 8.3.40
  lpfc 8.3.40: Update Copyrights to 2013 for 8.3.38, 8.3.39, and 8.3.40 modifications
  lpfc 8.3.40: Fixed a race condition between SLI host and port failed FCF rediscovery
  lpfc 8.3.40: Fixed issue mailbox wait routine failed to issue dump memory mbox command
  lpfc 8.3.40: Fixed system panic due to unsafe walking and deleting linked list
  lpfc 8.3.40: Fixed FCoE connection list vlan identifier and add FCF list debug
  lpfc 8.3.40: Clarified the behavior of the lpfc_max_luns module parameter
  lpfc 8.3.40: Fix to allow OCM to report FEC status
  lpfc 8.3.40: Fixed a missing return code in a logging message
  lpfc 8.3.40: Fixed some logging message fields
  lpfc 8.3.40: Fixed list corruption when lpfc_drain_tx runs.
  lpfc 8.3.40: Fix starting reference tag when calculating BG error
  lpfc 8.3.40: Fix inconsistent list removal causes crash.
  lpfc 8.3.40: Fixed system panic during handling unsolicited receive buffer error condition
  lpfc 8.3.40: Fix BlockGuard error checking
  lpfc 8.3.40: Fixed crash during FCoE failover testing.
  lpfc 8.3.40: Fix lpfc_used_cpu to be more dynamic

K. Y. Srinivasan (2):
  storvsc: Update the storage protocol to win8 level
  storvsc: Increase the value of scsi timeout for storvsc devices

Karen Xie (1):
  cxgb4i: add support for T5 adapter

Khalid Aziz (3):
  MAINTAINERS: Add myself as the maintainer for BusLogic SCSI driver
  BusLogic: Port driver to 64-bit.
  BusLogic: Fix style issues

Mahesh Rajashekhara (1):
  aacraid: Fix for arrays are going offline in the system. System hangs

Martin K. Petersen (4):
  sd: Update WRITE SAME heuristics
  3w-: Create sense buffer for unsupported commands
  Workaround for disks that report bad optimal transfer length
  Allow error handling timeout to be specified

Martin Peschke (1):
  zfcp: remove access control tables interface

Naresh Kumar Inna (1):
  csiostor: Retain default adapter configuration in absence of config file.

Reddy, Sreekanth (2):
  mpt2sas: fix for unused variable 'event_data' warning
  mpt2sas: Fix for issue Missing delay not getting set during system bootup

Sachin Kamat (1):
  ufs: Remove redundant platform_set_drvdata()

Sebastian Ott (4):
  zfcp: remove unused device_unregister wrapper
  zfcp: cleanup unit sysfs attribute usage
  zfcp: cleanup port sysfs attribute usage
  zfcp: cfdc fops add owner

Sergei Shtylyov (1):
  ipr: qc_fill_rtf() method should not store alternate status register

Seungwon Jeon (5):
  ufs: use devres functions for ufshcd
  ufs: rework link start-up process
  ufs: remove version check before IS reg clear
  ufs: amend interrupt configuration
  ufs: wrap the i/o access operations

Sreekanth Reddy (6):
  mpt2sas: Bump driver vesion to v15.100.00.00
  mpt2sas: Calulate the Reply post queue depth calculation as per the MPI spec
  mpt2sas: fix firmware failure with wrong task attribute
  mpt2sas: Fix for device scan following host reset could get stuck in a infinite loop
  mpt2sas: Update the timing requirements for issuing a Hard Reset
  mpt2sas: MPI2 Rev W (2.00.15) specification

Steffen Maier (3):
  zfcp: status read buffers on first adapter open with link down
  zfcp: module parameter dbflevel for early debugging
  zfcp: block queue limits with data router

Sujit Reddy Thumma (2):
  ufs: 

Re: [PATCH 1/4] scsi: ufs: Fix broken task management command implementation

2013-07-02 Thread Santosh Y
On Fri, Jun 28, 2013 at 5:02 PM, Sujit Reddy Thumma
sthu...@codeaurora.org wrote:
> On 6/27/2013 4:49 PM, Santosh Y wrote:
>>>
>>> +   spin_lock_irqsave(host->host_lock, flags);
>>>     task_req_descp = hba->utmrdl_base_addr;
>>>     task_req_descp += free_slot;
>>>
>>> @@ -2353,38 +2387,39 @@ ufshcd_issue_tm_cmd(struct ufs_hba *hba,
>>>     (struct utp_upiu_task_req *)
>>>     task_req_descp->task_req_upiu;
>>>     task_req_upiup->header.dword_0 =
>>>     UPIU_HEADER_DWORD(UPIU_TRANSACTION_TASK_REQ, 0,
>>> -   lrbp->lun, lrbp->task_tag);
>>> +   lun_id, free_slot);
>>
>> Actually it still doesn't fix the problem. The *task tag* used here
>> should be unique across the SCSI/Query and Task Management UPIUs.
>
> I am sorry, I didn't get that. Why should it be unique across the
> SCSI/Query? For example, if a machine supports 32 request slots and 8 task
> management slots, then the task management command tag can be anything out
> of 8 slots.



The spec (UFS 1.1) states this requirement under '10.5.2 Basic Header
Format' -> 'Task Tag'.
A couple of devices I came across had similar behavior. The tracking of
UPIUs, even those belonging to a separate group, seemed to be based on
the 'task tag' value alone rather than on 'type of UPIU' -> 'task tag'.

-- 
~Santosh
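
One way to read the uniqueness requirement discussed above is that the tag
ranges of the two request groups must not overlap. The sketch below shows one
possible numbering under that reading; it is an assumption for illustration,
not code from the driver or from the patch under review. struct toy_hba and
its fields are hypothetical, and the slot counts 32 and 8 are simply the
numbers used in the quoted example.

```c
#include <assert.h>

/* Hypothetical description of a host controller's slot counts. */
struct toy_hba {
	int nutrs;	/* number of transfer (SCSI/Query) request slots */
	int nutmrs;	/* number of task management request slots */
};

/* Transfer requests simply use their slot index as the task tag. */
static int toy_transfer_tag(const struct toy_hba *hba, int slot)
{
	assert(slot >= 0 && slot < hba->nutrs);
	return slot;
}

/*
 * Task management requests are numbered after all transfer slots, so a
 * tag never collides with one that is in use by the other group.
 */
static int toy_tm_tag(const struct toy_hba *hba, int free_slot)
{
	assert(free_slot >= 0 && free_slot < hba->nutmrs);
	return hba->nutrs + free_slot;
}
```

With 32 transfer slots and 8 task management slots, transfer requests use
tags 0..31 and task management requests use tags 32..39, so a device that
tracks UPIUs by tag value alone cannot confuse the two groups.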


Re: [PATCHv2 0/7] Limit overall SCSI EH runtime

2013-07-02 Thread Jörn Engel
On Tue, 2 July 2013 06:37:05 +, James Bottomley wrote:
 
> I don't understand what you're getting at.  In a dual HBA situation,
> whether the second HBA is implicated or not depends on configuration and
> what the first HBA is doing. If it's just passively lost device state,
> then the second HBA should continue just fine.  If the insane HBA is

If the problem is an insane drive instead of an insane HBA, both HBAs
will be in roughly the same state at roughly the same time - assuming
they both send commands to the insane drive.  If they now go into
error handling and effectively shut off all the sane drives at roughly
the same time, the user is ed.

And we shouldn't require the user to buy better hardware.  The whole
point of a redundant setup is that your plane doesn't crash to the
ground when one of your two engines fails.  If regulations required
perfect engines, you wouldn't be flying to conferences.  They require
decent engines and enough redundancy that any one can fail at any
moment.

Computer systems are no different.  We can construct a robust system
from individually less robust components.  Requiring perfect
components would be ludicrous.  Having a system design where one
faulty component will reliably bring the system down is equally
ludicrous.  Sadly that is also the state of today's scsi stack.

This is not a theoretical problem, btw.  We currently carry some
patches to solve it for us.  They are not applicable for mainline in
their current state - we support a lot less hardware diversity.  But
trust me, we didn't create them on a whim. ;)

Jörn

--
If you're willing to restrict the flexibility of your approach,
you can almost always do something better.
-- John Carmack


Re: [PATCHv2 0/7] Limit overall SCSI EH runtime

2013-07-02 Thread James Bottomley
On Tue, 2013-07-02 at 10:58 -0400, Jörn Engel wrote:
> On Tue, 2 July 2013 06:37:05 +, James Bottomley wrote:
> >
> > I don't understand what you're getting at.  In a dual HBA situation,
> > whether the second HBA is implicated or not depends on configuration and
> > what the first HBA is doing. If it's just passively lost device state,
> > then the second HBA should continue just fine.  If the insane HBA is
>
> If the problem is an insane drive instead of an insane HBA, both HBAs
> will be in roughly the same state at roughly the same time - assuming
> they both send commands to the insane drive.  If they now go into
> error handling and effectively shut off all the sane drives at roughly
> the same time, the user is ed.

That's handled in device reset, so I don't understand your point.

James

> And we shouldn't require the user to buy better hardware.  The whole
> point of a redundant setup is that your plane doesn't crash to the
> ground when one of your two engines fails.  If regulations required
> perfect engines, you wouldn't be flying to conferences.  They require
> decent engines and enough redundancy that any one can fail at any
> moment.
>
> Computer systems are no different.  We can construct a robust system
> from individually less robust components.  Requiring perfect
> components would be ludicrous.  Having a system design where one
> faulty component will reliably bring the system down is equally
> ludicrous.  Sadly that is also the state of today's scsi stack.
>
> This is not a theoretical problem, btw.  We currently carry some
> patches to solve it for us.  They are not applicable for mainline in
> their current state - we support a lot less hardware diversity.  But
> trust me, we didn't create them on a whim. ;)




Re: [PATCHv2 0/7] Limit overall SCSI EH runtime

2013-07-02 Thread Jörn Engel
On Tue, 2 July 2013 16:33:40 +, James Bottomley wrote:
> On Tue, 2013-07-02 at 10:58 -0400, Jörn Engel wrote:
> > On Tue, 2 July 2013 06:37:05 +, James Bottomley wrote:
> > >
> > > I don't understand what you're getting at.  In a dual HBA situation,
> > > whether the second HBA is implicated or not depends on configuration and
> > > what the first HBA is doing. If it's just passively lost device state,
> > > then the second HBA should continue just fine.  If the insane HBA is
> >
> > If the problem is an insane drive instead of an insane HBA, both HBAs
> > will be in roughly the same state at roughly the same time - assuming
> > they both send commands to the insane drive.  If they now go into
> > error handling and effectively shut off all the sane drives at roughly
> > the same time, the user is ed.
>
> That's handled in device reset, so I don't understand your point.

Doesn't a device reset require all I/O to the entire HBA to stop?
Hannes' patches fixed that for aborts, but not yet for device reset,
afaics.

Jörn

--
In America you can have either a flimsy box banged together out of two
by fours and drywall, or a McMansion -- a flimsy box banged together
out of two by fours and drywall, but larger, more dramatic-looking,
and full of expensive fittings.
-- Paul Graham


[Bug 59601] commit 97dec564fd4948e0e560869c80b76e166ca2a83e breaks communication with XYRATEX disk shelves

2013-07-02 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=59601





--- Comment #7 from Jack Hill jackh...@jackhill.us  2013-07-02 22:13:28 ---
Created an attachment (id=106661)
 --> (https://bugzilla.kernel.org/attachment.cgi?id=106661)
dmesg output with packet dumps

I have attached the dmesg output after applying the patch you provided.

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are watching the assignee of the bug.


[Bug 59601] commit 97dec564fd4948e0e560869c80b76e166ca2a83e breaks communication with XYRATEX disk shelves

2013-07-02 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=59601





--- Comment #8 from Jack Hill jackh...@jackhill.us  2013-07-02 22:15:42 ---
Also, I think the commit that I claimed introduced the problem after my bisect
run was the wrong one; it appears to be the last good commit. I think the one
that introduces the bug is ff2fc42e74e43721310bff710416230aae6ce0b9.

Sorry about that,
Jack



[Bug 59301] mpt2sas0 fails after short time of work

2013-07-02 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=59301


Marian Marinov m...@yuhu.biz changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |m...@yuhu.biz




--- Comment #2 from Marian Marinov <m...@yuhu.biz>  2013-07-03 00:35:41 ---
I have a similar problem with CentOS 6.4:

Jul  3 01:03:14 BlackPearl kernel: mpt2sas0: _base_fault_reset_work : SAS host
is non-operational 
Jul  3 01:03:15 BlackPearl kernel: mpt2sas0: _base_fault_reset_work : SAS host
is non-operational 
Jul  3 01:03:16 BlackPearl kernel: mpt2sas0: _base_fault_reset_work : SAS host
is non-operational 
Jul  3 01:03:17 BlackPearl kernel: mpt2sas0: _base_fault_reset_work : SAS host
is non-operational 
Jul  3 01:03:18 BlackPearl kernel: mpt2sas0: _base_fault_reset_work : SAS host
is non-operational 
Jul  3 01:03:19 BlackPearl kernel: mpt2sas0: _base_fault_reset_work : SAS host
is non-operational 
Jul  3 01:03:19 BlackPearl kernel: mpt2sas0: _base_fault_reset_work: Running
mpt2sas_dead_ioc thread success 
Jul  3 01:03:19 BlackPearl kernel: mpt2sas0: IR shutdown (sending)
Jul  3 01:03:19 BlackPearl kernel: sd 0:0:1:0: [sdc] Unhandled error code
Jul  3 01:03:19 BlackPearl kernel: sd 0:0:1:0: [sdc] Result:
hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Jul  3 01:03:19 BlackPearl kernel: sd 0:0:1:0: [sdc] CDB: Write(10): 2a 00 07
55 40 28 00 00 08 00
Jul  3 01:03:19 BlackPearl kernel: Buffer I/O error on device dm-2, logical
block 9089541
Jul  3 01:03:19 BlackPearl kernel: lost page write due to I/O error on dm-2
Jul  3 01:03:19 BlackPearl kernel: sd 0:0:1:0: [sdc] Unhandled error code
Jul  3 01:03:19 BlackPearl kernel: sd 0:0:1:0: [sdc] Result:
hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Jul  3 01:03:19 BlackPearl kernel: sd 0:0:1:0: [sdc] CDB: Write(10): 2a 00 03
01 1f 48 00 00 08 00
Jul  3 01:03:19 BlackPearl kernel: Buffer I/O error on device dm-2, logical
block 11753
Jul  3 01:03:19 BlackPearl kernel: lost page write due to I/O error on dm-2
Jul  3 01:03:19 BlackPearl kernel: sd 0:0:1:0: [sdc] Unhandled error code
Jul  3 01:03:19 BlackPearl kernel: sd 0:0:1:0: [sdc] Result:
hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Jul  3 01:03:19 BlackPearl kernel: sd 0:0:1:0: [sdc] CDB: Write(10): 2a 00 06
04 06 98 00 00 18 00
Jul  3 01:03:19 BlackPearl kernel: JBD2: Detected IO errors while flushing file
data on dm-2-8
Jul  3 01:03:19 BlackPearl kernel: Aborting journal on device dm-2-8.
Jul  3 01:03:19 BlackPearl kernel: sd 0:0:1:0: [sdc] Unhandled error code
Jul  3 01:03:19 BlackPearl kernel: sd 0:0:1:0: [sdc] Result:
hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Jul  3 01:03:19 BlackPearl kernel: sd 0:0:1:0: [sdc] CDB: Write(10): 2a 00 06
03 b0 00 00 00 08 00
Jul  3 01:03:19 BlackPearl kernel: Buffer I/O error on device dm-2, logical
block 6324224
Jul  3 01:03:19 BlackPearl kernel: lost page write due to I/O error on dm-2
Jul  3 01:03:19 BlackPearl kernel: JBD2: I/O error detected when updating
journal superblock for dm-2-8.
Jul  3 01:03:19 BlackPearl kernel: sd 0:0:1:0: [sdc] Unhandled error code
Jul  3 01:03:19 BlackPearl kernel: sd 0:0:1:0: [sdc] Result:
hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Jul  3 01:03:19 BlackPearl kernel: sd 0:0:1:0: [sdc] CDB: Write(10): 2a 00 07
55 40 28 00 00 08 00
Jul  3 01:03:19 BlackPearl kernel: Buffer I/O error on device dm-2, logical
block 9089541
Jul  3 01:03:19 BlackPearl kernel: lost page write due to I/O error on dm-2
Jul  3 01:03:19 BlackPearl kernel: JBD2: Detected IO errors while flushing file
data on dm-2-8
Jul  3 01:03:19 BlackPearl kernel: EXT4-fs error (device dm-2):
ext4_journal_start_sb: Detected aborted journal
Jul  3 01:03:19 BlackPearl kernel: EXT4-fs (dm-2): Remounting filesystem
read-only
Jul  3 01:03:19 BlackPearl kernel: sd 0:0:0:0: [sdb] Unhandled error code
Jul  3 01:03:19 BlackPearl kernel: sd 0:0:0:0: [sdb] Result:
hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Jul  3 01:03:19 BlackPearl kernel: sd 0:0:0:0: [sdb] CDB: Write(10): 2a 00 02
d2 bc d8 00 00 08 00
Jul  3 01:03:19 BlackPearl kernel: Buffer I/O error on device sdb2, logical
block 5792147
Jul  3 01:03:19 BlackPearl kernel: lost page write due to I/O error on sdb2
Jul  3 01:03:19 BlackPearl kernel: sd 0:0:0:0: [sdb] Unhandled error code
Jul  3 01:03:19 BlackPearl kernel: sd 0:0:0:0: [sdb] Result:
hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Jul  3 01:03:19 BlackPearl kernel: sd 0:0:0:0: [sdb] CDB: Write(10): 2a 00 02
d2 bc b8 00 00 08 00
Jul  3 01:03:19 BlackPearl kernel: Buffer I/O error on device sdb2, logical
block 5792143
Jul  3 01:03:19 BlackPearl kernel: lost page write due to I/O error on sdb2
Jul  3 01:03:19 BlackPearl kernel: sd 0:0:0:0: [sdb] Unhandled error code
Jul  3 01:03:19 BlackPearl kernel: sd 0:0:0:0: [sdb] Result:
hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Jul  3 01:03:19 BlackPearl kernel: sd 0:0:0:0: [sdb] CDB: Write(10): 2a 00 02
d2 bc 90 00 00 08 00
Jul  3 

[Bug 59301] mpt2sas0 fails after short time of work

2013-07-02 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=59301





--- Comment #3 from Marian Marinov <m...@yuhu.biz>  2013-07-03 00:36:39 ---
[root@BlackPearl ~]# uname -a
Linux BlackPearl.yuhu.biz 2.6.32-358.11.1.el6.i686 #1 SMP Wed Jun 12 01:01:27
UTC 2013 i686 i686 i386 GNU/Linux
[root@BlackPearl ~]# cat /etc/redhat-release 
CentOS release 6.4 (Final)



Re: [PATCH V2 3/4] scsi: ufs: Fix device and host reset methods

2013-07-02 Thread Santosh Y
 +
 +/**
 + * ufshcd_eh_device_reset_handler - device reset handler registered to
 + *scsi layer.
 + * @cmd - SCSI command pointer
 + *
 + * Returns SUCCESS/FAILED
 + */
 +static int ufshcd_eh_device_reset_handler(struct scsi_cmnd *cmd)
 +{
 +   struct ufs_hba *hba;
 +   int err;
 +   unsigned long flags;
 +
 +   hba = shost_priv(cmd->device->host);
 +
 +   spin_lock_irqsave(hba->host->host_lock, flags);
 +   if (hba->ufshcd_state == UFSHCD_STATE_RESET) {
 +           dev_warn(hba->dev, "%s: reset in progress\n", __func__);
 +           err = SUCCESS;
 +           spin_unlock_irqrestore(hba->host->host_lock, flags);
 +           goto out;

It is better to wait here until the state changes to 'operational' or
'error' before returning success.

 +   }
 +
 +   hba->ufshcd_state = UFSHCD_STATE_RESET;
 +   ufshcd_set_device_reset_pending(hba);
 +   spin_unlock_irqrestore(hba->host->host_lock, flags);
 +
 +   err = ufshcd_reset_and_restore(hba);
 +
 +   spin_lock_irqsave(hba->host->host_lock, flags);
 +   if (!err) {
 +           err = SUCCESS;
 +           hba->ufshcd_state = UFSHCD_STATE_OPERATIONAL;
 +   } else {
 +           err = FAILED;
 +           hba->ufshcd_state = UFSHCD_STATE_ERROR;
 +   }
 +   spin_unlock_irqrestore(hba->host->host_lock, flags);
 +out:
 +   return err;
 +}
 +
 +/**
 + * ufshcd_eh_host_reset_handler - host reset handler registered to scsi layer
 + * @cmd - SCSI command pointer
 + *
 + * Returns SUCCESS/FAILED
 + */
 +static int ufshcd_eh_host_reset_handler(struct scsi_cmnd *cmd)
 +{
 +   struct ufs_hba *hba;
 +   int err;
 +   unsigned long flags;
 +
 +   hba = shost_priv(cmd->device->host);
 +
 +   spin_lock_irqsave(hba->host->host_lock, flags);
 +   if (hba->ufshcd_state == UFSHCD_STATE_RESET) {
 +           dev_warn(hba->dev, "%s: reset in progress\n", __func__);
 +           err = SUCCESS;
 +           spin_unlock_irqrestore(hba->host->host_lock, flags);
 +           goto out;

The same applies here: wait for the state change before returning success.

 +   }
 +
 +   hba->ufshcd_state = UFSHCD_STATE_RESET;
 +   ufshcd_set_host_reset_pending(hba);
 +   spin_unlock_irqrestore(hba->host->host_lock, flags);
 +
 +   err = ufshcd_reset_and_restore(hba);
 +
 +   spin_lock_irqsave(hba->host->host_lock, flags);
 +   if (!err) {
 +           err = SUCCESS;
 +           hba->ufshcd_state = UFSHCD_STATE_OPERATIONAL;
 +   } else {
 +           err = FAILED;
 +           hba->ufshcd_state = UFSHCD_STATE_ERROR;
 +   }
 +   spin_unlock_irqrestore(hba->host->host_lock, flags);
 +out:
 +   return err;
 +}
 +
 +/**

-- 
~Santosh