[Kernel-packages] [Bug 2071655] Re: PR for: "IB/mlx5: Use __iowrite64_copy() for write combining stores"

Ubuntu Kernel Bot Tue, 30 Jul 2024 06:43:15 -0700

This bug is awaiting verification that the linux-nvidia/6.8.0-1011.11
kernel in -proposed solves the problem. Please test the kernel and
update this bug with the results. If the problem is solved, change the
tag 'verification-needed-noble-linux-nvidia' to 'verification-done-
noble-linux-nvidia'. If the problem still exists, change the tag
'verification-needed-noble-linux-nvidia' to 'verification-failed-noble-
linux-nvidia'.



If verification is not done by 5 working days from today, this fix will
be dropped from the source code, and this bug will be closed.


See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on
how to enable and use -proposed. Thank you!


** Tags added: kernel-spammed-noble-linux-nvidia-v2 
verification-needed-noble-linux-nvidia

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-nvidia in Ubuntu.
https://bugs.launchpad.net/bugs/2071655

Title:
  PR for:  "IB/mlx5: Use __iowrite64_copy() for write combining stores"

Status in linux-nvidia package in Ubuntu:
  New

Bug description:
      mlx5 has a built in self-test at driver startup to evaluate if the
      platform supports write combining to generate a 64 byte PCIe TLP or
      not. This has proven necessary because a lot of common scenarios end up
      with broken write combining (especially inside virtual machines) and there
      is no other way to learn this information.

      This self test has been consistently failing on new ARM64 CPU
      designs (specifically with NVIDIA Grace's implementation of Neoverse
      V2). The C loop around writeq() generates some pretty terrible ARM64
      assembly, but historically this has worked on a lot of existing ARM64 CPUs
      till now.

      We see it succeed about 1 time in 10,000 on the worst affected
      systems. The CPU architects speculate that the load instructions
      interspersed with the stores make the WC buffers statistically flush too
      often and thus the generation of large TLPs becomes infrequent. This makes
      the boot up test unreliable in that it indicates no write-combining,
      however userspace would be fine since it uses an ST4 instruction.

      Further, S390 has similar issues where only the special zpci_memcpy_toio()
      will actually generate large TLPs, and the open coded loop does not
      trigger it at all.

      Fix both ARM64 and S390 by switching to __iowrite64_copy() which now
      provides architecture specific variants that have a high chance of
      generating a large TLP with write combining. x86 continues to use a
      similar writeq loop in the generic __iowrite64_copy().
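
The difference between the two copy shapes described above can be
sketched in plain userspace C. This is only an illustration under stated
assumptions: copy_open_coded(), copy_batched() and WC_WORDS are
hypothetical names, the destination is ordinary memory rather than a
write-combining MMIO mapping, and nothing here shows what the compiler
emits or how the kernel's architecture-specific __iowrite64_copy()
variants are implemented. It only contrasts a loop that alternates loads
and stores with one that finishes all loads before issuing eight
back-to-back 64-bit stores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Eight 64-bit words make up one 64-byte burst, the size the self-test
 * hopes to see as a single PCIe TLP. WC_WORDS is a hypothetical name. */
#define WC_WORDS 8

/* Open-coded shape: every iteration loads one source word and then
 * stores it, so loads and stores alternate at the source level. */
static void copy_open_coded(volatile uint64_t *to, const uint64_t *from)
{
    assert(to != NULL && from != NULL);
    for (size_t i = 0; i < WC_WORDS; i++)
        to[i] = from[i];
}

/* Batched shape: read all eight words into locals first, then issue
 * eight consecutive stores with no loads written in between. */
static void copy_batched(volatile uint64_t *to, const uint64_t *from)
{
    uint64_t w[WC_WORDS];

    assert(to != NULL && from != NULL);
    for (size_t i = 0; i < WC_WORDS; i++)
        w[i] = from[i];
    for (size_t i = 0; i < WC_WORDS; i++)
        to[i] = w[i];
}
```

Both functions leave the same bytes at the destination; whether a real
device sees one large TLP or several small ones depends on the CPU, the
mapping attributes and the instructions actually generated, none of
which this sketch can demonstrate.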

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-nvidia/+bug/2071655/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
