You have been subscribed to a public bug:
Description: zfcp: fix double free of FSF request when qdio send fails
Symptom: When doing maintenance actions on FCP devices that turn off an
FCP device while I/O is still running on it in Linux - for
example turning off the channel path of the FCP device - the
Linux kernel crashes.
Problem: We used to use the wrong type of integer in
'zfcp_fsf_req_send()' to cache the FSF request ID when sending a
new FSF request. This is used in case the sending fails and we
need to remove the request from our internal hash table again
(so we don't keep an invalid reference and use it when we free
the request again).
In 'zfcp_fsf_req_send()' we used to cache the ID as 'int'
(signed and 32 bit wide), but the rest of the zfcp code (and the
firmware specification) handles the ID as 'unsigned long'/'u64'
(unsigned and 64 bit wide [s390x ELF ABI]).
First, this has the obvious problem that once the ID grows
past 32 bits (this can happen reasonably fast), it is truncated to
32 bits when stored in the cache variable and so no longer
matches the original ID.
The second, less obvious problem is that even when the
original ID has not yet grown past 32 bits, as soon as the 32nd
bit is set in the original ID (0x80000000 = 2'147'483'648) we
get a mismatch when we cast it back to 'unsigned long',
because casting the signed type 'int' to the wider type
'unsigned long' uses a sign-extending instruction, and so
flips all leading zeros to ones.
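Both effects can be reproduced outside the kernel. The following is a
minimal sketch, not zfcp code: the helper names are hypothetical, it
assumes a 32-bit 'int', and it relies on the compiler keeping the low
32 bits when narrowing (implementation-defined before C23; GCC and
Clang behave this way).

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>
#include <stdint.h>

/* Assumption: 'int' is 32 bits wide, as on s390x and other common ABIs. */
static_assert(sizeof(int) * CHAR_BIT == 32, "example assumes 32-bit int");

/*
 * Hypothetical helper, not a zfcp function: store a 64-bit request ID
 * in an 'int' and widen it again, as the old cache variable did.
 */
uint64_t roundtrip_through_int(uint64_t req_id)
{
	int cached = (int)req_id;   /* keeps only the low 32 bits */
	return (uint64_t)cached;    /* sign-extends if bit 31 is set */
}

/* Same round trip with a 64-bit unsigned cache: nothing is lost. */
uint64_t roundtrip_through_u64(uint64_t req_id)
{
	uint64_t cached = req_id;
	return cached;
}
```

Under these assumptions, roundtrip_through_int(0x100000001) yields 0x1
(truncation) and roundtrip_through_int(0x80000000) yields
0xFFFFFFFF80000000 (sign extension), while roundtrip_through_u64()
returns its argument unchanged.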
If we can't successfully remove the request from the hash table
again after 'zfcp_qdio_send()' fails (this happens regularly
when zfcp cannot notify the adapter about new work because the
adapter is already gone during e.g. a ChpID toggle) we will end
up with a double free.
We unconditionally free the request in the calling function
when 'zfcp_fsf_req_send()' fails, but because the request is
still in the hash table we end up with a stale memory reference,
and once the zfcp adapter is either reset during recovery or
shut down, we end up freeing the same memory twice.
Solution: To fix this, simply change the type of the cache variable to
'unsigned long', like the rest of zfcp and also the argument for
'zfcp_reqlist_find_rm()'. This prevents truncation and wrong
sign extension, so the request can be removed successfully from
the hash table.
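The effect on the cleanup path can be illustrated with a toy request
list. This is only a sketch under stated assumptions: every 'toy_'
name is hypothetical, the list is a flat array rather than the zfcp
hash table, and the narrowing to 'int' is assumed to keep the low
32 bits (as GCC and Clang do).

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 'int' is 32 bits wide. */
static_assert(sizeof(int) * CHAR_BIT == 32, "example assumes 32-bit int");

#define TOY_SLOTS 8

/* Toy stand-in for the request list: IDs stored in a flat array. */
struct toy_reqlist {
	uint64_t ids[TOY_SLOTS];
	bool used[TOY_SLOTS];
};

/* Add a request ID; returns false if the list is full. */
bool toy_add(struct toy_reqlist *rl, uint64_t id)
{
	for (size_t i = 0; i < TOY_SLOTS; i++) {
		if (!rl->used[i]) {
			rl->ids[i] = id;
			rl->used[i] = true;
			return true;
		}
	}
	return false;
}

/* Find an ID and remove it; returns false if it is not in the list. */
bool toy_find_rm(struct toy_reqlist *rl, uint64_t id)
{
	for (size_t i = 0; i < TOY_SLOTS; i++) {
		if (rl->used[i] && rl->ids[i] == id) {
			rl->used[i] = false;
			return true;
		}
	}
	return false;
}

/* Old behaviour: the ID is cached as 'int' before the lookup. */
bool toy_cleanup_int(struct toy_reqlist *rl, uint64_t req_id)
{
	int cached = (int)req_id;
	return toy_find_rm(rl, (uint64_t)cached);
}

/* Fixed behaviour: the cache has the same width as the ID. */
bool toy_cleanup_u64(struct toy_reqlist *rl, uint64_t req_id)
{
	uint64_t cached = req_id;
	return toy_find_rm(rl, cached);
}
```

With an entry 0x80000000 in the list, toy_cleanup_int() looks up
0xFFFFFFFF80000000, finds nothing, and leaves the stale entry behind;
toy_cleanup_u64() removes it.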
Reproduction: Run I/O on an FCP device until it has sent
2'147'483'648 requests. The current request number cannot be
read directly from user space, but can be read indirectly by
using 'zfcp_ping' and 'zfcpdbf' (use the correct device-bus-ID):
sudo sh -c 'zfcp_ping -a "${0}" 0xFFFFFFFFFFFFFFFF \
2>/dev/null 1>&2; zfcpdbf "${0}" -x all -i SAN 2>/dev/null \
| grep -E -e "^(Timestamp|Request ID)[[:blank:]]+:" | tail \
-n2' 0.0.1700
After having reached 0x80000000 requests, stop all I/O on the
FCP device and start only a single process doing single-threaded
synchronous, direct I/O on the FCP device (always only one
outstanding I/O operation).
While this I/O process is running, turn off the channel path
(ChpID) that is used for the FCP device/subchannel. This will
not always trigger the bug, but occasionally it will.
Proof that it hit the correct code-path in zfcp can be found
by using 'zfcpdbf' again (use the correct device-bus-ID):
zfcpdbf 0.0.1700 -x all -i REC 2>/dev/null | grep 'fsrs__1'
In case you hit the correct code-path this will print some lines
starting with 'Tag'.
Upstream-ID: 0954256e970ecf371b03a6c9af2cf91b9c4085ff
Preventive: yes
Author: Benjamin Block <[email protected]>
Component: kernel
Link:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0954256e970e
** Affects: linux (Ubuntu)
Importance: Undecided
Assignee: Skipper Bug Screeners (skipper-screen-team)
Status: New
** Tags: architecture-s39064 bugnameltc-200970 severity-high
targetmilestone-inin---
--
[UBUNTU 20.04] zfcp: fix double free of FSF request when qdio send fails
https://bugs.launchpad.net/bugs/2002984
You received this bug notification because you are a member of Kernel Packages,
which is subscribed to linux in Ubuntu.