------- Comment From bbl...@de.ibm.com 2020-07-14 07:36 EDT-------
I did the same test coverage for this (lp1887124 - 5.4.0-42.46) as before on my 
own test kernel; so I/O (+Verification) with DIV, with DIV+DIX, without both; 
Regression-Test including ports in Switched and P-t-P topologie, with Local- 
and Remote-Port Cable-Pulls, Injects on CHPID- and Linux-Level; all with and 
without I/O (+Verification). Results have been the same - no errors I can see.

>From that standpoint it looks good.

I have not however done any long-term workloads, or system-integration
tests, or such. That would be up to @thorsten.di...@de.ibm.com and our
Distribution/System Test to decide whether this is necessary/wanted.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1887124

Title:
  [UBUNTU 20.04] DIF and DIX support in zfcp (s390x) is broken and the
  kernel crashes unconditionally

Status in Ubuntu on IBM z Systems:
  Triaged
Status in linux package in Ubuntu:
  New

Bug description:
  So, after this is now all upstream in mainline and I found some time
  to test this properly on Ubuntu, here is the backport request.

  Some time ago we noticed - Fedor Loshakov did - that our DIF and DIX support 
in
  zfcp broke at some point (by "broke" I mean, the kernel will unconditionally
  crash if one activates either DIF, or DIX, and attaches *any* LU). I tracked
  that down to a commit made for v5.4 (737eb78e82d5), but we didn't notice it
  back than, because our CI doesn't currently run with either DIF, nor DIX
  enabled (time allowing this is something we want to improve so we catch stuff
  like this earlier). It also turned out that the commit in v5.4 was not really
  the root-cause, and was only making the problem visible more easy.

  In short: zfcp used to allocate/add the shost object for a HBA before
  knowing all the HBA's capabilities, and we later patched the shost
  object to make more of the capabilities known - including the protection
  capabilities. Back when we still had the old blk queue, this worked
  fine; after scsi_mod switched to blk-mq and because requests are now
  all allocated during allocation time of the blk-mq tag-set, this doesn't
  work anymore. Changes we make later to the protection capabilities don't
  get reflected into the tag-set's requests, and they are missing parts.
  When we then try to send I/O, scsi_mod tries to access the protection
  payload data, who are not there, and it crashes the kernel.

  So instead, I now want to allocate/add the shost object for a HBA
  after we know all of its base capabilities. This solves the bug.

  Because we had this modus operandi for a very long time, I had to touch
  many places that assume the shost object was already allocated -
  explaining the rather big patchset for a 'fix'. And because this also
  involves/depends on code that went upstream in v5.5 and v5.7 we now have
  a rather complicated situation for backports of the fix. Nothing "just"
  applies.

  The easiest and most straight forward way to deal with that is to basically
  backport most everything that is involved - which is most of the stuff
  that went upstream since v5.5 for our driver.

  I complied a list of the upstream commits that would have to be picked
  in order to be merge-conflict free:

      92953c6e0aa77 scsi: zfcp: signal incomplete or error for sync exchange 
config/port data
      7e418833e6894 scsi: zfcp: diagnostics buffer caching and use for exchange 
port data
      088210233e6fc scsi: zfcp: add diagnostics buffer for exchange config data
      a10a61e807b0a scsi: zfcp: support retrieval of SFP Data via Exchange Port 
Data
      6028f7c4cd87c scsi: zfcp: introduce sysfs interface for diagnostics of 
local SFP transceiver
      8155eb0785279 scsi: zfcp: implicitly refresh port-data diagnostics when 
reading sysfs
      5a2876f0d1ef2 scsi: zfcp: introduce sysfs interface to read the local 
B2B-Credit
      8a72db70b5ca3 scsi: zfcp: implicitly refresh config-data diagnostics when 
reading sysfs
      48910f8c35cfd scsi: zfcp: move maximum age of diagnostic buffers into a 
per-adapter variable
      e76acc5194264 scsi: zfcp: proper indentation to reduce confusion in 
zfcp_erp_required_act
      a3fd4bfe85fbb scsi: zfcp: fix wrong data and display format of SFP+ 
temperature
      e05a10a055098 scsi: zfcp: expose fabric name as common fc_host sysfs 
attribute
      538c6e910baea scsi: zfcp: wire previously driver-specific sysfs 
attributes also to fc_host
      7e0e4e0958ef7 scsi: zfcp: fix fc_host attributes that should be unknown 
on local link down
      185f2d2d595c2 scsi: zfcp: auto variables for dereferenced structs in open 
port handler
      a17c78460093a scsi: zfcp: report FC Endpoint Security in sysfs
      f0d26ae847489 scsi: zfcp: log FC Endpoint Security of connections
      616da39e0060f scsi: zfcp: trace FC Endpoint Security of FCP devices and 
connections
      e53d92856e9f1 scsi: zfcp: enhance handling of FC Endpoint Security errors
      42cabdaf103be scsi: zfcp: log FC Endpoint Security errors
      cec9cbac5244b scsi: zfcp: use fallthrough;
      978857c7e367d scsi: zfcp: Move shost modification after QDIO (re-)open 
into fenced function
      bd1684817d7d8 scsi: zfcp: Move shost updates during xconfig data handling 
into fenced function
      52e61fde5ec95 scsi: zfcp: Move fc_host updates during xport data handling 
into fenced function
      990486f3a8508 scsi: zfcp: Fence fc_host updates during link-down handling
      ac007adc4d2d9 scsi: zfcp: Move p-t-p port allocation to after xport data
      971f2abb4ca40 scsi: zfcp: Fence adapter status propagation for common 
statuses
      71159b6ecb067 scsi: zfcp: Fence early sysfs interfaces for accesses of 
shost objects
      d0dff2ac98dd4 scsi: zfcp: Move allocation of the shost object to after 
xconf- and xport-data

  I test this with the kernel you provide @
  git://kernel.ubuntu.com/ubuntu/ubuntu-focal.git, added Linus' tree as
  secondary remote (@
  git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git) and
  ran:

      git cherry-pick 92953c6e0aa7~1..e76acc519426
      git cherry-pick a3fd4bfe85fb
      git cherry-pick e05a10a05509~1..42cabdaf103b
      git cherry-pick cec9cbac5244
      git cherry-pick 978857c7e367~1..d0dff2ac98dd

  That worked without a hitch on top of tag "Ubuntu-5.4.0-41.45"

  I tested this then by building an Ubuntu distribution kernel
  (unsigned) on level 5.4.0-41.45 with the patches from above applied. I
  ran our regression-suite with DIF/DIX/NONE; with I/O, and local/remote
  cable pulls, switched and p-t-p. Everything worked fine for me.

  So I'm positive that this should work just fine.

  If you don't want to pull all these commits it'll get complicated; I
  gave it a look some time ago if there was a smaller changeset possible
  without/with minimal changes, but found that we would have to
  touch/change several patches to make them apply properly and not have
  any regressions. So I would prefer this. It would also make future
  stable backports easier.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-z-systems/+bug/1887124/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp

Reply via email to