[Kernel-packages] [Bug 1977508] Re: [UBUNTU 20.04] zfcp: fix failed recovery on gone remote port, non-NPIV FCP dev

Frank Heimes Wed, 08 Jun 2022 23:50:56 -0700

I'm glad to see that commit
8c9db6679be4348b8aae108e11d4be2f83976e30 "scsi: zfcp: Fix failed
recovery on gone remote port with non-NPIV FCP devices"
got tagged for stable update:
"Cc: <[email protected]> #2.6.32+"
With that, such commits are automatically picked up by the Ubuntu kernel
team's "<Focal> update: vx.x.xxx upstream stable release" process.


This has already happened for this commit. It landed as:
c1f8d188be1a in focal, included in "Ubuntu-5.4.0-108.122" and newer kernels,
58d489451e99 in impish, included in "Ubuntu-5.13.0-40.45" and newer kernels,
a9bcb8cc29d2 in jammy, included in "Ubuntu-5.15.0-20.20" and newer kernels.

Since the commit was accepted upstream in 5.17, it will
automatically be part of the planned kinetic kernel (5.19).

With this I'm closing this bug as Fix Released.

** Also affects: linux (Ubuntu Jammy)
   Importance: Undecided
       Status: New

** Also affects: linux (Ubuntu Impish)
   Importance: Undecided
       Status: New

** Also affects: linux (Ubuntu Focal)
   Importance: Undecided
       Status: New

** Changed in: linux (Ubuntu)
       Status: New => Invalid

** Changed in: linux (Ubuntu Jammy)
       Status: New => Fix Released

** Changed in: linux (Ubuntu Impish)
       Status: New => Fix Released

** Changed in: linux (Ubuntu Focal)
       Status: New => Fix Released

** Changed in: ubuntu-z-systems
       Status: New => Fix Released

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1977508

Title:
  [UBUNTU 20.04] zfcp: fix failed recovery on gone remote port, non-NPIV
  FCP dev

Status in Ubuntu on IBM z Systems:
  Fix Released
Status in linux package in Ubuntu:
  Invalid
Status in linux source package in Focal:
  Fix Released
Status in linux source package in Impish:
  Fix Released
Status in linux source package in Jammy:
  Fix Released

Bug description:
  Description:   zfcp: fix failed recovery on gone remote port, non-NPIV
  FCP dev

  Symptom:       With non-NPIV FCP devices, failed recovery on gone remote port.
                 As follow-on error, failed paths after storage target NPIV
                 failover with IBM FCP storage based on Spectrum Virtualize such
                 as FlashSystem 9200, V7000, SAN Volume Controller, etc.

  Problem:       Suppose we have an environment with a number of non-NPIV FCP
                 devices (virtual HBAs / FCP devices / zfcp "adapter"s) sharing
                 the same physical FCP channel (HBA port) and its I_T nexus. Plus
                 a number of storage target ports zoned to such shared channel.
                 Now one target port logs out of the fabric causing an RSCN. Zfcp
                 reacts with an ADISC ELS and subsequent port recovery depending
                 on the ADISC result. This happens on all such FCP devices (in
                 different Linux images) concurrently as they all receive a copy
                 of this RSCN. In the following we look at one of those FCP
                 devices.
                 Requests other than FSF_QTCB_FCP_CMND can be slow until they get
                 a response.
                 Depending on which requests are affected by slow responses,
                 there are different recovery outcomes.

  Solution:      Here we want to fix failed recoveries on port or adapter level
                 by avoiding recovery requests that can be slow.

                 We need the cached N_Port_ID for the remote port "link" test
                 with ADISC. Just before sending the ADISC, we now intentionally
                 forget the old cached N_Port_ID. The idea is that on receiving
                 an RSCN for a port, we have to assume that any cached
                 information about this port is stale. This forces a fresh new
                 GID_PN [FC-GS] nameserver lookup on any subsequent recovery for
                 the same port. Since we typically can still communicate with the
                 nameserver efficiently, we now reach steady state quicker:
                 Either the nameserver still does not know about the port so we
                 stop recovery, or the nameserver already knows the port
                 potentially with a new N_Port_ID and we can successfully and
                 quickly perform open port recovery. For the one case, where
                 ADISC returns successfully, we re-initialize port->d_id because
                 that case does not involve any port recovery.

                 This also solves a problem if the storage WWPN quickly logs into
                 the fabric again but with a different N_Port_ID. Such as on
                 virtual WWPN takeover during target NPIV failover.
                 [https://www.redbooks.ibm.com/abstracts/redp5477.html] In that
                 case the RSCN from the storage FDISC was ignored by zfcp and we
                 could not successfully recover the failover. On some later
                 failback on the storage, we could have been lucky if the virtual
                 WWPN got the same old N_Port_ID from the SAN switch as we still
                 had cached. Then the related RSCN triggered a successful port
                 reopen recovery. However, there is no guarantee to get the same
                 N_Port_ID on NPIV FDISC.

                 Even though NPIV-enabled FCP devices are not affected by this
                 problem, this code change optimizes recovery time for gone
                 remote ports as a side effect. The timely drop of cached
                 N_Port_IDs prevents unnecessary slow open port attempts.
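
The control flow described in the Solution can be illustrated with a small
toy model. This is only a sketch under stated assumptions, not the zfcp
code: the names RemotePort, handle_rscn, adisc_ok and the dict standing in
for the GID_PN nameserver lookup are all hypothetical, and the real driver
works asynchronously on kernel data structures. It shows only the ordering
idea: drop the cached N_Port_ID before the ADISC, restore it if the ADISC
succeeds, and otherwise force a fresh lookup.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class RemotePort:
    """Hypothetical stand-in for a zfcp remote port."""
    wwpn: int
    d_id: Optional[int]  # cached N_Port_ID; None means "forgotten"
    is_open: bool = True


def handle_rscn(port: RemotePort, adisc_ok: bool,
                nameserver: Dict[int, int]) -> str:
    """Toy model of RSCN handling for one remote port.

    adisc_ok   -- assumed result of the ADISC link test to the old N_Port_ID
    nameserver -- maps WWPN to current N_Port_ID (stands in for GID_PN)
    """
    old_d_id = port.d_id
    # On RSCN, assume cached information about this port is stale.
    port.d_id = None

    if old_d_id is not None and adisc_ok:
        # ADISC succeeded: no port recovery happens, so re-initialize d_id.
        port.d_id = old_d_id
        return "link-ok"

    # Port recovery: with no cached d_id, a nameserver lookup is required.
    port.is_open = False
    new_d_id = nameserver.get(port.wwpn)
    if new_d_id is None:
        # Nameserver does not know the port: stop recovery quickly.
        return "recovery-stopped"
    port.d_id = new_d_id  # possibly a different N_Port_ID than before
    port.is_open = True
    return "reopened"


# Example: the WWPN logged back in with a different N_Port_ID
# (as in a target NPIV failover); all values here are made up.
example = RemotePort(wwpn=0x1, d_id=0x010200)
outcome = handle_rscn(example, adisc_ok=False, nameserver={0x1: 0x010300})
```

In this model, a port that came back with a new N_Port_ID is reopened with
the fresh value instead of retrying the stale cached one.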

  Reproduction:  With a sufficiently shared FCP channel with non-NPIV FCP devs,
                 perform SAN switch port disable on the storage side,
                 or trigger storage target NPIV failover such as with
                 Spectrum Virtualize CLI command "stopsystem -node 2".

  Upstream-ID:   8c9db6679be4348b8aae108e11d4be2f83976e30

  Master-BZ-ID:  196197

  Distros:       Ubuntu 20.04
                 Ubuntu 21.10
                 Ubuntu 22.04

  Author:        <[email protected]>
  Component:     kernel

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-z-systems/+bug/1977508/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
