This bug was fixed in the package linux - 5.4.0-128.144

---------------
linux (5.4.0-128.144) focal; urgency=medium

  * focal/linux: 5.4.0-128.144 -proposed tracker (LP: #1990152)

  * CVE-2022-3176
    - io_uring: disable polling pollfree files

  * ip/nexthop: fix default address selection for connected nexthop
    (LP: #1988809)
    - selftests/net: test nexthop without gw

  * ip/nexthop: fix default address selection for connected nexthop
    (LP: #1988809) // icmp_redirect.sh in ubuntu_kernel_selftests failed on
    Jammy 5.15.0-49.55 (LP: #1990124)
    - ip: fix triggering of 'icmp redirect'

linux (5.4.0-127.143) focal; urgency=medium

  * focal/linux: 5.4.0-127.143 -proposed tracker (LP: #1989892)

  * Packaging resync (LP: #1786013)
    - debian/dkms-versions -- update from kernel-versions (main/2022.09.19)

  * [UBUNTU 20.04] mlx5 driver crashes on accessing device attributes during
    recovery (LP: #1987287)
    - net/mlx5: Avoid processing commands before cmdif is ready

  * Focal update: v5.4.210 upstream stable release (LP: #1989230)
    - thermal: Fix NULL pointer dereferences in of_thermal_ functions
    - ACPI: video: Force backlight native for some TongFang devices
    - ACPI: video: Shortening quirk list by identifying Clevo by board_name only
    - ACPI: APEI: Better fix to avoid spamming the console with old error logs
    - bpf: Verifer, adjust_scalar_min_max_vals to always call
      update_reg_bounds()
    - selftests/bpf: Extend verifier and bpf_sock tests for dst_port loads
    - bpf: Test_verifier, #70 error message updates for 32-bit right shift
    - KVM: Don't null dereference ops->destroy
    - selftests: KVM: Handle compiler optimizations in ucall
    - media: v4l2-mem2mem: Apply DST_QUEUE_OFF_BASE on MMAP buffers across
      ioctls
    - macintosh/adb: fix oob read in do_adb_query() function
    - x86/speculation: Add RSB VM Exit protections
    - x86/speculation: Add LFENCE to RSB fill sequence
    - Linux 5.4.210

  * Focal update: v5.4.209 upstream stable release (LP: #1989228)
    - Bluetooth: L2CAP: Fix use-after-free caused by l2cap_chan_put
    - ntfs: fix use-after-free in ntfs_ucsncmp()
    - s390/archrandom: prevent CPACF trng invocations in interrupt context
    - tcp: Fix data-races around sysctl_tcp_dsack.
    - tcp: Fix a data-race around sysctl_tcp_app_win.
    - tcp: Fix a data-race around sysctl_tcp_adv_win_scale.
    - tcp: Fix a data-race around sysctl_tcp_frto.
    - tcp: Fix a data-race around sysctl_tcp_nometrics_save.
    - ice: check (DD | EOF) bits on Rx descriptor rather than (EOP | RS)
    - ice: do not setup vlan for loopback VSI
    - scsi: ufs: host: Hold reference returned by of_parse_phandle()
    - tcp: Fix a data-race around sysctl_tcp_limit_output_bytes.
    - tcp: Fix a data-race around sysctl_tcp_challenge_ack_limit.
    - net: ping6: Fix memleak in ipv6_renew_options().
    - ipv6/addrconf: fix a null-ptr-deref bug for ip6_ptr
    - igmp: Fix data-races around sysctl_igmp_qrv.
    - net: sungem_phy: Add of_node_put() for reference returned by
      of_get_parent()
    - tcp: Fix a data-race around sysctl_tcp_min_tso_segs.
    - tcp: Fix a data-race around sysctl_tcp_min_rtt_wlen.
    - tcp: Fix a data-race around sysctl_tcp_autocorking.
    - tcp: Fix a data-race around sysctl_tcp_invalid_ratelimit.
    - Documentation: fix sctp_wmem in ip-sysctl.rst
    - tcp: Fix a data-race around sysctl_tcp_comp_sack_delay_ns.
    - tcp: Fix a data-race around sysctl_tcp_comp_sack_nr.
    - i40e: Fix interface init with MSI interrupts (no MSI-X)
    - sctp: fix sleep in atomic context bug in timer handlers
    - virtio-net: fix the race between refill work and close
    - perf symbol: Correct address for bss symbols
    - sfc: disable softirqs for ptp TX
    - sctp: leave the err path free in sctp_stream_init to sctp_stream_free
    - ARM: crypto: comment out gcc warning that breaks clang builds
    - mt7601u: add USB device ID for some versions of XiaoDu WiFi Dongle.
    - scsi: core: Fix race between handling STS_RESOURCE and completion
    - Linux 5.4.209

  * Focal update: v5.4.208 upstream stable release (LP: #1988225)
    - pinctrl: stm32: fix optional IRQ support to gpios
    - riscv: add as-options for modules with assembly compontents
    - mlxsw: spectrum_router: Fix IPv4 nexthop gateway indication
    - lockdown: Fix kexec lockdown bypass with ima policy
    - xen/gntdev: Ignore failure to unmap INVALID_GRANT_HANDLE
    - PCI: hv: Fix multi-MSI to allow more than one MSI vector
    - PCI: hv: Fix hv_arch_irq_unmask() for multi-MSI
    - PCI: hv: Reuse existing IRTE allocation in compose_msi_msg()
    - PCI: hv: Fix interrupt mapping for multi-MSI
    - serial: mvebu-uart: correctly report configured baudrate value
    - xfrm: xfrm_policy: fix a possible double xfrm_pols_put() in
      xfrm_bundle_lookup()
    - power/reset: arm-versatile: Fix refcount leak in versatile_reboot_probe
    - pinctrl: ralink: Check for null return of devm_kcalloc
    - perf/core: Fix data race between perf_event_set_output() and
      perf_mmap_close()
    - igc: Reinstate IGC_REMOVED logic and implement it properly
    - ip: Fix data-races around sysctl_ip_no_pmtu_disc.
    - ip: Fix data-races around sysctl_ip_fwd_use_pmtu.
    - ip: Fix data-races around sysctl_ip_nonlocal_bind.
    - ip: Fix a data-race around sysctl_fwmark_reflect.
    - tcp/dccp: Fix a data-race around sysctl_tcp_fwmark_accept.
    - tcp: Fix data-races around sysctl_tcp_mtu_probing.
    - tcp: Fix data-races around sysctl_tcp_base_mss.
    - tcp: Fix data-races around sysctl_tcp_min_snd_mss.
    - tcp: Fix a data-race around sysctl_tcp_mtu_probe_floor.
    - tcp: Fix a data-race around sysctl_tcp_probe_threshold.
    - tcp: Fix a data-race around sysctl_tcp_probe_interval.
    - i2c: cadence: Change large transfer count reset logic to be unconditional
    - net: stmmac: fix dma queue left shift overflow issue
    - net/tls: Fix race in TLS device down flow
    - igmp: Fix data-races around sysctl_igmp_llm_reports.
    - igmp: Fix a data-race around sysctl_igmp_max_memberships.
    - tcp: Fix data-races around sysctl_tcp_syncookies.
    - tcp: Fix data-races around sysctl_tcp_reordering.
    - tcp: Fix data-races around some timeout sysctl knobs.
    - tcp: Fix a data-race around sysctl_tcp_notsent_lowat.
    - tcp: Fix a data-race around sysctl_tcp_tw_reuse.
    - tcp: Fix data-races around sysctl_max_syn_backlog.
    - tcp: Fix data-races around sysctl_tcp_fastopen.
    - iavf: Fix handling of dummy receive descriptors
    - i40e: Fix erroneous adapter reinitialization during recovery process
    - ixgbe: Add locking to prevent panic when setting sriov_numvfs to zero
    - gpio: pca953x: only use single read/write for No AI mode
    - be2net: Fix buffer overflow in be_get_module_eeprom
    - ipv4: Fix a data-race around sysctl_fib_multipath_use_neigh.
    - udp: Fix a data-race around sysctl_udp_l3mdev_accept.
    - tcp: Fix data-races around sysctl knobs related to SYN option.
    - tcp: Fix a data-race around sysctl_tcp_early_retrans.
    - tcp: Fix data-races around sysctl_tcp_recovery.
    - tcp: Fix a data-race around sysctl_tcp_thin_linear_timeouts.
    - tcp: Fix data-races around sysctl_tcp_slow_start_after_idle.
    - tcp: Fix a data-race around sysctl_tcp_retrans_collapse.
    - tcp: Fix a data-race around sysctl_tcp_stdurg.
    - tcp: Fix a data-race around sysctl_tcp_rfc1337.
    - tcp: Fix data-races around sysctl_tcp_max_reordering.
    - spi: bcm2835: bcm2835_spi_handle_err(): fix NULL pointer deref for non DMA
      transfers
    - mm/mempolicy: fix uninit-value in mpol_rebind_policy()
    - bpf: Make sure mac_header was set before using it
    - dlm: fix pending remove if msg allocation fails
    - ima: remove the IMA_TEMPLATE Kconfig option
    - [Config] updateconfigs for IMA_TEMPLATE
    - locking/refcount: Define constants for saturation and max refcount values
    - locking/refcount: Ensure integer operands are treated as signed
    - locking/refcount: Remove unused refcount_*_checked() variants
    - locking/refcount: Move the bulk of the REFCOUNT_FULL implementation into
      the <linux/refcount.h> header
    - locking/refcount: Improve performance of generic REFCOUNT_FULL code
    - locking/refcount: Move saturation warnings out of line
    - locking/refcount: Consolidate REFCOUNT_{MAX,SATURATED} definitions
    - locking/refcount: Consolidate implementations of refcount_t
    - [Config] updateconfigs for REFCOUNT_FULL
    - x86: get rid of small constant size cases in raw_copy_{to,from}_user()
    - x86/uaccess: Implement macros for CMPXCHG on user addresses
    - mmap locking API: initial implementation as rwsem wrappers
    - x86/mce: Deduplicate exception handling
    - bitfield.h: Fix "type of reg too small for mask" test
    - ALSA: memalloc: Align buffer allocations in page size
    - Bluetooth: Add bt_skb_sendmsg helper
    - Bluetooth: Add bt_skb_sendmmsg helper
    - Bluetooth: SCO: Replace use of memcpy_from_msg with bt_skb_sendmsg
    - Bluetooth: RFCOMM: Replace use of memcpy_from_msg with bt_skb_sendmmsg
    - Bluetooth: Fix passing NULL to PTR_ERR
    - Bluetooth: SCO: Fix sco_send_frame returning skb->len
    - Bluetooth: Fix bt_skb_sendmmsg not allocating partial chunks
    - tty: drivers/tty/, stop using tty_schedule_flip()
    - tty: the rest, stop using tty_schedule_flip()
    - tty: drop tty_schedule_flip()
    - tty: extract tty_flip_buffer_commit() from tty_flip_buffer_push()
    - tty: use new tty_insert_flip_string_and_push_buffer() in pty_write()
    - x86: drop bogus "cc" clobber from __try_cmpxchg_user_asm()
    - Linux 5.4.208

  * Focal update: v5.4.207 upstream stable release (LP: #1988219)
    - ALSA: hda - Add fixup for Dell Latitidue E5430
    - ALSA: hda/conexant: Apply quirk for another HP ProDesk 600 G3 model
    - ALSA: hda/realtek - Fix headset mic problem for a HP machine with alc671
    - ALSA: hda/realtek - Fix headset mic problem for a HP machine with alc221
    - ALSA: hda/realtek - Enable the headset-mic on a Xiaomi's laptop
    - xen/netback: avoid entering xenvif_rx_next_skb() with an empty rx queue
    - tracing/histograms: Fix memory leak problem
    - net: sock: tracing: Fix sock_exceed_buf_limit not to dereference stale
      pointer
    - ip: fix dflt addr selection for connected nexthop
    - ARM: 9213/1: Print message about disabled Spectre workarounds only once
    - ARM: 9214/1: alignment: advance IT state after emulating Thumb instruction
    - wifi: mac80211: fix queue selection for mesh/OCB interfaces
    - cgroup: Use separate src/dst nodes when preloading css_sets for migration
    - drm/panfrost: Fix shrinker list corruption by madvise IOCTL
    - nilfs2: fix incorrect masking of permission flags for symlinks
    - Revert "evm: Fix memleak in init_desc"
    - sched/rt: Disable RT_RUNTIME_SHARE by default
    - ext4: fix race condition between ext4_write and ext4_convert_inline_data
    - ARM: dts: imx6qdl-ts7970: Fix ngpio typo and count
    - ARM: 9209/1: Spectre-BHB: avoid pr_info() every time a CPU comes out of
      idle
    - ARM: 9210/1: Mark the FDT_FIXED sections as shareable
    - drm/i915: fix a possible refcount leak in intel_dp_add_mst_connector()
    - ima: Fix a potential integer overflow in ima_appraise_measurement
    - ASoC: sgtl5000: Fix noise on shutdown/remove
    - net: stmmac: dwc-qos: Disable split header for Tegra194
    - inetpeer: Fix data-races around sysctl.
    - net: Fix data-races around sysctl_mem.
    - cipso: Fix data-races around sysctl.
    - icmp: Fix data-races around sysctl.
    - ipv4: Fix a data-race around sysctl_fib_sync_mem.
    - ARM: dts: at91: sama5d2: Fix typo in i2s1 node
    - ARM: dts: sunxi: Fix SPI NOR campatible on Orange Pi Zero
    - drm/i915/gt: Serialize TLB invalidates with GT resets
    - icmp: Fix a data-race around sysctl_icmp_ratelimit.
    - icmp: Fix a data-race around sysctl_icmp_ratemask.
    - raw: Fix a data-race around sysctl_raw_l3mdev_accept.
    - ipv4: Fix data-races around sysctl_ip_dynaddr.
    - net: ftgmac100: Hold reference returned by of_get_child_by_name()
    - sfc: fix use after free when disabling sriov
    - seg6: fix skb checksum evaluation in SRH encapsulation/insertion
    - seg6: fix skb checksum in SRv6 End.B6 and End.B6.Encaps behaviors
    - seg6: bpf: fix skb checksum in bpf_push_seg6_encap()
    - sfc: fix kernel panic when creating VF
    - mm: sysctl: fix missing numa_stat when !CONFIG_HUGETLB_PAGE
    - virtio_mmio: Add missing PM calls to freeze/restore
    - virtio_mmio: Restore guest page size on resume
    - netfilter: br_netfilter: do not skip all hooks with 0 priority
    - cpufreq: pmac32-cpufreq: Fix refcount leak bug
    - platform/x86: hp-wmi: Ignore Sanitization Mode event
    - net: tipc: fix possible refcount leak in tipc_sk_create()
    - NFC: nxp-nci: don't print header length mismatch on i2c error
    - nvme: fix regression when disconnect a recovering ctrl
    - net: sfp: fix memory leak in sfp_probe()
    - ASoC: ops: Fix off by one in range control validation
    - ASoC: wm5110: Fix DRE control
    - ASoC: cs47l15: Fix event generation for low power mux control
    - ASoC: madera: Fix event generation for OUT1 demux
    - ASoC: madera: Fix event generation for rate controls
    - irqchip: or1k-pic: Undefine mask_ack for level triggered hardware
    - x86: Clear .brk area at early boot
    - soc: ixp4xx/npe: Fix unused match warning
    - ARM: dts: stm32: use the correct clock source for CEC on stm32mp151
    - signal handling: don't use BUG_ON() for debugging
    - USB: serial: ftdi_sio: add Belimo device ids
    - usb: typec: add missing uevent when partner support PD
    - usb: dwc3: gadget: Fix event pending check
    - tty: serial: samsung_tty: set dma burst_size to 1
    - serial: 8250: fix return error code in serial8250_request_std_resource()
    - serial: stm32: Clear prev values before setting RTS delays
    - serial: pl011: UPSTAT_AUTORTS requires .throttle/unthrottle
    - can: m_can: m_can_tx_handler(): fix use after free of skb
    - Linux 5.4.207

  * Focal update: v5.4.206 upstream stable release (LP: #1988215)
    - Linux 5.4.206

  * Focal update: v5.4.205 upstream stable release (LP: #1988214)
    - esp: limit skb_page_frag_refill use to a single page
    - mm/slub: add missing TID updates on slab deactivation
    - can: bcm: use call_rcu() instead of costly synchronize_rcu()
    - can: grcan: grcan_probe(): remove extra of_node_get()
    - can: gs_usb: gs_usb_open/close(): fix memory leak
    - usbnet: fix memory leak in error case
    - net: rose: fix UAF bug caused by rose_t0timer_expiry
    - iommu/vt-d: Fix PCI bus rescan device hot add
    - fbdev: fbmem: Fix logo center image dx issue
    - video: of_display_timing.h: include errno.h
    - powerpc/powernv: delay rng platform device creation until later in boot
    - can: kvaser_usb: replace run-time checks with struct
      kvaser_usb_driver_info
    - can: kvaser_usb: kvaser_usb_leaf: fix CAN clock frequency regression
    - can: kvaser_usb: kvaser_usb_leaf: fix bittiming limits
    - xfs: remove incorrect ASSERT in xfs_rename
    - ARM: meson: Fix refcount leak in meson_smp_prepare_cpus
    - pinctrl: sunxi: a83t: Fix NAND function name for some pins
    - pinctrl: sunxi: sunxi_pconf_set: use correct offset
    - ARM: at91: pm: use proper compatible for sama5d2's rtc
    - ARM: at91: pm: use proper compatibles for sam9x60's rtc and rtt
    - ibmvnic: Properly dispose of all skbs during a failover.
    - selftests: forwarding: fix flood_unicast_test when h2 supports
      IFF_UNICAST_FLT
    - selftests: forwarding: fix learning_test when h1 supports IFF_UNICAST_FLT
    - selftests: forwarding: fix error message in learning_test
    - i2c: cadence: Unregister the clk notifier in error path
    - dmaengine: imx-sdma: Allow imx8m for imx7 FW revs
    - misc: rtsx_usb: fix use of dma mapped buffer for usb bulk transfer
    - misc: rtsx_usb: use separate command and response buffers
    - misc: rtsx_usb: set return value in rsp_buf alloc err path
    - dt-bindings: dma: allwinner,sun50i-a64-dma: Fix min/max typo
    - ida: don't use BUG_ON() for debugging
    - dmaengine: pl330: Fix lockdep warning about non-static key
    - dmaengine: at_xdma: handle errors of at_xdmac_alloc_desc() correctly
    - dmaengine: ti: Fix refcount leak in ti_dra7_xbar_route_allocate
    - dmaengine: ti: Add missing put_device in ti_dra7_xbar_route_allocate
    - Linux 5.4.205

  * Focal update: v5.4.204 upstream stable release (LP: #1988212)
    - ipv6: take care of disable_policy when restoring routes
    - nvdimm: Fix badblocks clear off-by-one error
    - powerpc/prom_init: Fix kernel config grep
    - powerpc/bpf: Fix use of user_pt_regs in uapi
    - dm raid: fix accesses beyond end of raid member array
    - dm raid: fix KASAN warning in raid5_add_disks
    - s390/archrandom: simplify back to earlier design and initialize earlier
    - SUNRPC: Fix READ_PLUS crasher
    - net: rose: fix UAF bugs caused by timer handler
    - net: usb: ax88179_178a: Fix packet receiving
    - virtio-net: fix race between ndo_open() and virtio_device_ready()
    - selftests/net: pass ipv6_args to udpgso_bench's IPv6 TCP test
    - net: tun: unlink NAPI from device on destruction
    - net: tun: stop NAPI when detaching queues
    - RDMA/qedr: Fix reporting QP timeout attribute
    - linux/dim: Fix divide by 0 in RDMA DIM
    - usbnet: fix memory allocation in helpers
    - net: ipv6: unexport __init-annotated seg6_hmac_net_init()
    - caif_virtio: fix race between virtio_device_ready() and ndo_open()
    - PM / devfreq: exynos-ppmu: Fix refcount leak in of_get_devfreq_events
    - s390: remove unneeded 'select BUILD_BIN2C'
    - netfilter: nft_dynset: restore set element counter when failing to update
    - net/sched: act_api: Notify user space if any actions were flushed before
      error
    - net: bonding: fix possible NULL deref in rlb code
    - net: bonding: fix use-after-free after 802.3ad slave unbind
    - nfc: nfcmrvl: Fix irq_of_parse_and_map() return value
    - NFC: nxp-nci: Don't issue a zero length i2c_master_read()
    - net: tun: avoid disabling NAPI twice
    - xen/gntdev: Avoid blocking in unmap_grant_pages()
    - hwmon: (ibmaem) don't call platform_device_del() if platform_device_add()
      fails
    - net: dsa: bcm_sf2: force pause link settings
    - sit: use min
    - ipv6/sit: fix ipip6_tunnel_get_prl return value
    - rseq/selftests,x86_64: Add rseq_offset_deref_addv()
    - selftests/rseq: remove ARRAY_SIZE define from individual tests
    - selftests/rseq: introduce own copy of rseq uapi header
    - selftests/rseq: Remove useless assignment to cpu variable
    - selftests/rseq: Remove volatile from __rseq_abi
    - selftests/rseq: Introduce rseq_get_abi() helper
    - selftests/rseq: Introduce thread pointer getters
    - selftests/rseq: Uplift rseq selftests for compatibility with glibc-2.35
    - selftests/rseq: Fix ppc32: wrong rseq_cs 32-bit field pointer on big
      endian
    - selftests/rseq: Fix ppc32 missing instruction selection "u" and "x" for
      load/store
    - selftests/rseq: Fix ppc32 offsets by using long rather than off_t
    - selftests/rseq: Fix warnings about #if checks of undefined tokens
    - selftests/rseq: Remove arm/mips asm goto compiler work-around
    - selftests/rseq: Fix: work-around asm goto compiler bugs
    - selftests/rseq: x86-64: use %fs segment selector for accessing rseq thread
      area
    - selftests/rseq: x86-32: use %gs segment selector for accessing rseq thread
      area
    - selftests/rseq: Change type of rseq_offset to ptrdiff_t
    - xen/blkfront: fix leaking data in shared pages
    - xen/netfront: fix leaking data in shared pages
    - xen/netfront: force data bouncing when backend is untrusted
    - xen/blkfront: force data bouncing when backend is untrusted
    - xen/arm: Fix race in RB-tree based P2M accounting
    - net: usb: qmi_wwan: add Telit 0x1060 composition
    - net: usb: qmi_wwan: add Telit 0x1070 composition
    - clocksource/drivers/ixp4xx: remove EXPORT_SYMBOL_GPL from
      ixp4xx_timer_setup()
    - Linux 5.4.204

 -- Stefan Bader <stefan.ba...@canonical.com>  Tue, 20 Sep 2022 11:19:18 +0200

** Changed in: linux (Ubuntu Focal)
       Status: Fix Committed => Fix Released

** CVE added: https://cve.mitre.org/cgi-bin/cvename.cgi?name=2022-3176

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1987287

Title:
  [UBUNTU 20.04] mlx5 driver crashes on accessing device attributes
  during recovery

Status in Ubuntu on IBM z Systems:
  Fix Committed
Status in linux package in Ubuntu:
  Invalid
Status in linux source package in Focal:
  Fix Released

Bug description:
  SRU Justification:
  ------------------

  [Impact]

   * If the mlx5 driver is reloading during the recovery flow and receives
     new commands before the command interface is up again, it can
     dereference a null pointer while accessing uninitialized command
     structures.

   * Commands must therefore not be processed until the command interface
     is up again.

   * This is accomplished by a new cmdif state that prevents commands from
     being processed while cmdif is not ready.

  [Fix]

   * backport of f7936ddd35d8b849daf0372770c7c9dbe7910fca
     "net/mlx5: Avoid processing commands before cmdif is ready"

  [Test Plan]

   * An Ubuntu Server for s390x 18.04 or 20.04 LPAR or z/VM installation
     is needed that has Mellanox cards (RoCE Express 2.1) assigned,
     configured and enabled and that runs a 5.4 kernel (on bionic hwe-5.4).

   * Now trigger a recovery (presumably that can be done at the Support
     Element) and reload the driver at the same time.

   * Make sure the module/driver mlx5 is loaded and in use
     (otherwise it can't be removed/unloaded).

   * Now remove/unload the module with:
     sudo modprobe -r mlx5
     and (re-)load it again with:
     sudo modprobe mlx5

   * Due to the lack of RoCE Express 2.1 hardware,
     IBM needs to do the verification.
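
  Before running the test plan it may help to confirm that the kernel
  package under test is at least the fixed version. The following is only a
  minimal sketch: the helper name version_ge is hypothetical (it is not part
  of any Ubuntu tooling), it assumes purely numeric version fields, and the
  version strings are the ones quoted in this report.

```bash
#!/usr/bin/env bash

# Hypothetical helper: succeed (exit status 0) if version $1 >= version $2.
# Assumes every field is a plain decimal number separated by '.' or '-',
# as in 5.4.0-128.144; it is not a full Debian version comparator.
version_ge() {
        local a=$1 b=$2 x y
        while [ -n "$a" ] || [ -n "$b" ]; do
                # Take the leading field of each version; missing fields count as 0.
                x=${a%%[.-]*}; y=${b%%[.-]*}
                x=${x:-0}; y=${y:-0}
                if [ "$x" -gt "$y" ]; then return 0; fi
                if [ "$x" -lt "$y" ]; then return 1; fi
                # Drop the compared field and its separator.
                case $a in *[.-]*) a=${a#*[.-]} ;; *) a= ;; esac
                case $b in *[.-]*) b=${b#*[.-]} ;; *) b= ;; esac
        done
        return 0
}

# 5.4.0-124.140 is the crashing kernel from this report,
# 5.4.0-128.144 the package in which the bug was fixed.
if version_ge "5.4.0-124.140" "5.4.0-128.144"; then
        echo "fix expected to be present"
else
        echo "fix not yet included"
fi
```

  On a Debian or Ubuntu system, dpkg --compare-versions <a> ge <b> performs
  the same kind of check with full Debian version semantics and is the
  better choice where dpkg is available.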

  [Where problems could occur]

   * If there is an issue with 'cmdif', it might not have the correct
     interface state, which:
     - either might mean that commands are not properly blocked
       and the situation is similar to before
     - or the commands may always get blocked,
       which renders the hardware useless
     - or commands might be blocked in the wrong situation,
       which will cause unexpected issues and broken behavior.

   * Since the patch was accepted upstream with v5.7-rc7, it's
     not new to the kernel, was already part of groovy (and above)
     and is therefore already in use by newer Ubuntu releases.

  [Other Info]

   * Since the patch has been upstream since v5.7-rc7,
     it's already included in jammy and kinetic.

   * Since the upstream patch includes the line:
     Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox
     Connect-IB adapters") it looks to me as if marking the
     patch for upstream stable updates was forgotten.

   * Such SRUs for focal's 5.4 will automatically land in bionic's
     hwe-5.4, too. But since this was especially requested for
     bionic's hwe-5.4, I wanted to mention this here.
  __________

  We recently got a bug report for systems running Ubuntu 20.04 that were
  crashing with backtraces pointing at the mlx5 driver's handling of
  mlx5e_ethtool_get_link_ksettings() when this is called through sysfs
  (going through ethtool might have different checks). I managed to find a
  reliable way to reproduce the issue that I believe isn't tied to IBM Z at
  all.

  The procedure to reproduce is as follows. I created a script to read
  the sysfs attributes for the link's speed and duplex mode in a loop:

  #!/usr/bin/env bash

  if [ $# -lt 1 ]; then
          echo "Usage: $0 <netif>"
          exit 1
  fi

  # Read the link's duplex mode and speed through sysfs in a tight loop.
  while true; do
          cat "/sys/class/net/$1/duplex" > /dev/null
          cat "/sys/class/net/$1/speed" > /dev/null
  done

  Executed with:

  # ./script.sh enP10p0s0

  I ran this in one bash session and then in another one I triggered a PCI
  reset with the following command, where one needs to replace <dev> with the
  PCI address of the NIC:

  echo 1 > /sys/bus/pci/devices/<dev>/reset
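
  The PCI address to substitute for <dev> can usually be derived from the
  interface name. The following is only a sketch: the helper name
  pci_addr_of_netif is hypothetical, and it assumes that the interface's
  "device" entry in sysfs is a symlink to its PCI device, which is the usual
  layout for PCI NICs but does not hold for virtual interfaces.

```bash
#!/usr/bin/env bash

# Hypothetical helper: print the PCI address of the device behind network
# interface $1. $2 optionally overrides the sysfs root (default: /sys),
# which allows a dry run against a fake directory tree.
pci_addr_of_netif() {
        local netif=$1 sysfs=${2:-/sys}
        local dev_link="$sysfs/class/net/$netif/device"

        # The "device" entry is assumed to be a symlink to the PCI device.
        [ -e "$dev_link" ] || return 1
        basename "$(readlink -f "$dev_link")"
}

# Usage sketch (as root, with the reader loop running in another session):
#   dev=$(pci_addr_of_netif enP16p0s0) && echo 1 > "/sys/bus/pci/devices/$dev/reset"
```

  The reset itself is left as a comment so that sourcing the helper never
  triggers a reset by accident.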

  Then first I got a lot of the following messages:

   mlx5_core 0010:00:00.0 enP16p0s0: mlx5e_ethtool_get_link_ksettings: query port ptys failed: -5

  And then, as the mlx5 driver's recovery kicks in, the oops below:

  [  659.103947] mlx5_core 0010:00:00.0: wait vital counter value 0x7b399f after 1 iterations
  [  659.103947] mlx5_core 0010:00:00.0: mlx5_pci_resume was called
  [  659.103966] mlx5_core 0010:00:00.0: firmware version: 14.32.1010
  [  659.104169] Unable to handle kernel pointer dereference in virtual kernel address space
  [  659.104171] Failing address: 0000000000000000 TEID: 0000000000000483
  [  659.104172] Fault in home space mode while using kernel ASCE.
  [  659.104173] AS:000000003d29c007 R3:00000000fffd0007 S:00000000fffd5800 P:000000000000003d
  [  659.104200] Oops: 0004 ilc:2 [#1] SMP
  [  659.104202] Modules linked in: s390_trng ism smc pnet chsc_sch eadm_sch vfio_ccw vfio_mdev mdev vfio_iommu_type1 vfio sch_fq_codel drm drm_panel_orientation_quirks i2c_core ip_tables x_tables btrfs zstd_compress zlib_deflate raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 linear mlx5_ib dm_service_time pkey zcrypt crc32_vx_s390 ib_uverbs ghash_s390 ib_core qeth_l2 prng aes_s390 des_s390 nvme libdes sha3_512_s390 sha3_256_s390 sha512_s390 sha256_s390 sha1_s390 sha_common mlx5_core tls mlxfw ptp nvme_core pps_core dasd_eckd_mod dasd_mod zfcp scsi_transport_fc qeth qdio ccwgroup scsi_dh_emc scsi_dh_rdac scsi_dh_alua dm_multipath
  [  659.104232] CPU: 6 PID: 438216 Comm: cat Not tainted 5.4.0-124-generic #140-Ubuntu
  [  659.104233] Hardware name: IBM 3931 XYZ XXXX (LPAR)
  [  659.104234] Krnl PSW : 0404c00180000000 000000003bfa661e (__queue_work+0xfe/0x520)
  [  659.104241]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
  [  659.104242] Krnl GPRS: 000000003c291570 0000000000000000 0000000000000007 000000007fffffff
  [  659.104243]            00000000e2fe46e0 0000000fffffffe0 0000000000000006 000000003d039588
  [  659.104244]            0000000000000000 0000000000000000 00000000e2fe46e0 00000000bfb3e000
  [  659.104245]            00000000e194c400 000003e007d6fb78 000000003bfa6602 000003e007d6f860
  [  659.104251] Krnl Code: 000000003bfa6612: a77400e5            brc     7,000000003bfa67dc
                            000000003bfa6616: 582003ac            l       %r2,940
                           #000000003bfa661a: a7180000            lhi     %r1,0
                           >000000003bfa661e: ba129000            cs      %r1,%r2,0(%r9)
                            000000003bfa6622: a77401a7            brc     7,000000003bfa6970
                            000000003bfa6626: e310b0180012        lt      %r1,24(%r11)
                            000000003bfa662c: a78400ff            brc     8,000000003bfa682a
                            000000003bfa6630: c0040000004b        brcl    0,000000003bfa66c6
  [  659.104261] Call Trace:
  [  659.104263] ([<0000000000000000>] 0x0)
  [  659.104265]  [<000000003bfa6aa2>] queue_work_on+0x62/0x70
  [  659.104329]  [<000003ff80a2920a>] cmd_exec+0x4ea/0x840 [mlx5_core]
  [  659.104349]  [<000003ff80a29680>] mlx5_cmd_exec+0x40/0x70 [mlx5_core]
  [  659.104369]  [<000003ff80a334a8>] mlx5_core_access_reg+0x108/0x150 [mlx5_core]
  [  659.104387]  [<000003ff80a3354e>] mlx5_query_port_ptys+0x5e/0x70 [mlx5_core]
  [  659.104407]  [<000003ff80a5b928>] mlx5e_ethtool_get_link_ksettings+0x58/0x460 [mlx5_core]
  [  659.104410]  [<000000003c662a68>] duplex_show+0x78/0xe0
  [  659.104414]  [<000000003c538ecc>] dev_attr_show+0x2c/0x70
  [  659.104417]  [<000000003c293386>] sysfs_kf_seq_show+0xa6/0x150
  [  659.104420]  [<000000003c207470>] seq_read+0xe0/0x4f0
  [  659.104422]  [<000000003c1d4cf4>] vfs_read+0x94/0x160
  [  659.104423]  [<000000003c1d4ea8>] ksys_read+0x68/0x100
  [  659.104426]  [<000000003c7f2034>] system_call+0xd8/0x2c8
  [  659.104427] Last Breaking-Event-Address:
  [  659.104428]  [<000000003bfa65d6>] __queue_work+0xb6/0x520
  [  659.104430] ---[ end trace 9fc1a6358b456876 ]---

  Digging into the code and git history I found the following upstream commit,
  added in v5.7, which although it is part of the 5.6.x stable patches somehow
  didn't make it into the 5.4.x stable queue nor Ubuntu 20.04, possibly because
  there is a (trivial) merge conflict:

  commit f7936ddd35d8b849daf0372770c7c9dbe7910fca
  Author: Eran Ben Elisha <era...@mellanox.com>
  Date:   Thu Mar 19 21:43:13 2020 +0200

      net/mlx5: Avoid processing commands before cmdif is ready

      When driver is reloading during recovery flow, it can't get new commands
      till command interface is up again. Otherwise we may get to null pointer
      trying to access non initialized command structures.

      Add cmdif state to avoid processing commands while cmdif is not
      ready.

      Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters")
      Signed-off-by: Eran Ben Elisha <era...@mellanox.com>
      Signed-off-by: Moshe Shemesh <mo...@mellanox.com>
      Signed-off-by: Saeed Mahameed <sae...@mellanox.com>

  With a quick backport onto 5.4.0-124.140 (patch attached below) the issue is
  gone and the system no longer crashes but instead recovers successfully. I
  believe the crash we saw is then exactly the null pointer access mentioned in
  the commit description.

  == Comment: #4 - Niklas Schnelle - 2022-08-22 06:41:41 ==
  There was only a trivial merge conflict where the context
  of the struct declaration changed.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-z-systems/+bug/1987287/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
