You have been subscribed to a public bug:
Now that this is all upstream in mainline and I have found some time to
test it properly on Ubuntu, here is the backport request.
Some time ago we noticed - Fedor Loshakov did - that our DIF and DIX support in
zfcp broke at some point (by "broke" I mean that the kernel will unconditionally
crash if one activates either DIF or DIX and attaches *any* LU). I tracked
that down to a commit made for v5.4 (737eb78e82d5), but we didn't notice it
back then, because our CI doesn't currently run with either DIF or DIX
enabled (time allowing, this is something we want to improve so we catch stuff
like this earlier). It also turned out that the commit in v5.4 was not really
the root cause; it only made the problem more easily visible.
In short: zfcp used to allocate/add the shost object for a HBA before
knowing all the HBA's capabilities, and we later patched the shost
object to make more of the capabilities known - including the protection
capabilities. Back when we still had the old blk queue, this worked
fine; since scsi_mod switched to blk-mq, requests are all allocated
when the blk-mq tag-set is allocated, so this doesn't
work anymore. Changes we make later to the protection capabilities are not
reflected in the tag-set's requests, so those requests are missing parts.
When we then try to send I/O, scsi_mod tries to access the protection
payload data, which is not there, and the kernel crashes.
So instead, I now want to allocate/add the shost object for a HBA
after we know all of its base capabilities. This solves the bug.
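
For anyone who wants to look at what a shost object currently advertises, here
is a minimal sketch. It assumes the zfcp "dif" and "dix" module parameters from
the bug title are exported under /sys/module/zfcp/parameters and that the SCSI
midlayer exposes a prot_capabilities attribute per host under
/sys/class/scsi_host. The function name show_prot_caps is made up for
illustration. It only shows what the host advertises; it cannot show whether
the preallocated requests match.

```shell
#!/bin/sh
# Sketch: print the zfcp DIF/DIX module parameters and the protection
# capabilities advertised by each SCSI host.
# The sysfs root is a parameter so the function can be pointed at a test tree.
show_prot_caps() {
    sysfs_root=${1:-/sys}

    # Assumed location of the zfcp module parameters (may need root to read).
    for param in dif dix; do
        file="$sysfs_root/module/zfcp/parameters/$param"
        if [ -r "$file" ]; then
            printf 'zfcp.%s=%s\n' "$param" "$(cat "$file")"
        fi
    done

    # prot_capabilities is a bitmask; 0 means no DIF/DIX is advertised.
    for host in "$sysfs_root"/class/scsi_host/host*; do
        [ -r "$host/prot_capabilities" ] || continue
        printf '%s prot_capabilities=%s\n' "${host##*/}" \
            "$(cat "$host/prot_capabilities")"
    done
}

show_prot_caps
```

On a system without zfcp or without SCSI hosts the function prints nothing.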
Because we had this modus operandi for a very long time, I had to touch
many places that assume the shost object was already allocated -
explaining the rather big patchset for a 'fix'. And because this also
involves/depends on code that went upstream in v5.5 and v5.7 we now have
a rather complicated situation for backports of the fix. Nothing "just"
applies.
The easiest and most straightforward way to deal with that is to
backport almost everything that is involved - which is most of the stuff
that went upstream since v5.5 for our driver.
I compiled a list of the upstream commits that would have to be picked
in order to be merge-conflict free:
92953c6e0aa77 scsi: zfcp: signal incomplete or error for sync exchange config/port data
7e418833e6894 scsi: zfcp: diagnostics buffer caching and use for exchange port data
088210233e6fc scsi: zfcp: add diagnostics buffer for exchange config data
a10a61e807b0a scsi: zfcp: support retrieval of SFP Data via Exchange Port Data
6028f7c4cd87c scsi: zfcp: introduce sysfs interface for diagnostics of local SFP transceiver
8155eb0785279 scsi: zfcp: implicitly refresh port-data diagnostics when reading sysfs
5a2876f0d1ef2 scsi: zfcp: introduce sysfs interface to read the local B2B-Credit
8a72db70b5ca3 scsi: zfcp: implicitly refresh config-data diagnostics when reading sysfs
48910f8c35cfd scsi: zfcp: move maximum age of diagnostic buffers into a per-adapter variable
e76acc5194264 scsi: zfcp: proper indentation to reduce confusion in zfcp_erp_required_act
a3fd4bfe85fbb scsi: zfcp: fix wrong data and display format of SFP+ temperature
e05a10a055098 scsi: zfcp: expose fabric name as common fc_host sysfs attribute
538c6e910baea scsi: zfcp: wire previously driver-specific sysfs attributes also to fc_host
7e0e4e0958ef7 scsi: zfcp: fix fc_host attributes that should be unknown on local link down
185f2d2d595c2 scsi: zfcp: auto variables for dereferenced structs in open port handler
a17c78460093a scsi: zfcp: report FC Endpoint Security in sysfs
f0d26ae847489 scsi: zfcp: log FC Endpoint Security of connections
616da39e0060f scsi: zfcp: trace FC Endpoint Security of FCP devices and connections
e53d92856e9f1 scsi: zfcp: enhance handling of FC Endpoint Security errors
42cabdaf103be scsi: zfcp: log FC Endpoint Security errors
cec9cbac5244b scsi: zfcp: use fallthrough;
978857c7e367d scsi: zfcp: Move shost modification after QDIO (re-)open into fenced function
bd1684817d7d8 scsi: zfcp: Move shost updates during xconfig data handling into fenced function
52e61fde5ec95 scsi: zfcp: Move fc_host updates during xport data handling into fenced function
990486f3a8508 scsi: zfcp: Fence fc_host updates during link-down handling
ac007adc4d2d9 scsi: zfcp: Move p-t-p port allocation to after xport data
971f2abb4ca40 scsi: zfcp: Fence adapter status propagation for common statuses
71159b6ecb067 scsi: zfcp: Fence early sysfs interfaces for accesses of shost objects
d0dff2ac98dd4 scsi: zfcp: Move allocation of the shost object to after xconf- and xport-data
I tested this with the kernel you provide @
git://kernel.ubuntu.com/ubuntu/ubuntu-focal.git, added Linus' tree as
secondary remote (@
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git) and
ran:
git cherry-pick 92953c6e0aa7~1..e76acc519426
git cherry-pick a3fd4bfe85fb
git cherry-pick e05a10a05509~1..42cabdaf103b
git cherry-pick cec9cbac5244
git cherry-pick 978857c7e367~1..d0dff2ac98dd
That worked without a hitch on top of tag "Ubuntu-5.4.0-41.45"
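
A note on the range syntax used in the commands above: "A~1..B" is written so
that commit A itself is part of the range, whereas "A..B" would skip A. The
following self-contained sketch shows the difference in a throwaway repository;
the commit names (base, A, between, B) are made up for the demonstration and
have nothing to do with the zfcp commits.

```shell
#!/bin/sh
# Sketch: show that `A~1..B` includes commit A while `A..B` does not.
set -eu

demo_repo=$(mktemp -d)
trap 'rm -rf "$demo_repo"' EXIT
cd "$demo_repo"
git init -q .

# Create an empty commit with a throwaway identity.
commit() {
    git -c user.name=demo -c user.email=demo@example.invalid \
        -c commit.gpgsign=false \
        commit -q --allow-empty -m "$1"
}

commit base
commit A
commit between
commit B

first=$(git rev-parse HEAD~2)   # commit "A"
last=$(git rev-parse HEAD)      # commit "B"

# Count the commits each range notation selects.
inclusive=$(git rev-list --count "${first}~1..${last}")
exclusive=$(git rev-list --count "${first}..${last}")

echo "A~1..B selects $inclusive commits; A..B selects $exclusive"
```

Here the first range selects A, between and B, while the second selects only
between and B.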
I then tested this by building an Ubuntu distribution kernel (unsigned)
on level 5.4.0-41.45 with the patches from above applied. I ran our
regression suite with DIF/DIX/NONE, with I/O and local/remote cable
pulls, switched and p-t-p. Everything worked fine for me.
So I'm positive that this should work just fine.
If you don't want to pull all these commits, it'll get complicated. Some
time ago I looked into whether a smaller changeset was possible
with no or minimal changes, but found that we would have to
touch/change several patches to make them apply properly and not introduce
any regressions. So I would prefer the full set. It would also make future
stable backports easier.
** Affects: linux (Ubuntu)
Importance: Undecided
Assignee: Skipper Bug Screeners (skipper-screen-team)
Status: New
** Tags: architecture-s39064 bugnameltc-186041 severity-critical
targetmilestone-inin2004
--
[UBUNTU 20.04] kernel panic with zfcp.dif=1 and zfcp.dix=1 - crash in
scsi_queue_rq
https://bugs.launchpad.net/bugs/1887124