Public bug reported:

[Impact]
    
    To save the vgic LPI pending state with GICv4.1, the VPEs must all be 
unmapped from the ITSs so that the sGIC caches can be flushed. The opposite is 
done once the state is saved.

    This is all done by using the activate/deactivate irqdomain
callbacks directly from the vgic code. Crucially, this is done without
holding the irqdesc lock for the interrupts that represent the VPE. And
these callbacks are changing the state of the irqdesc. What could
possibly go wrong?

    If a doorbell fires while we are messing with the irqdesc state, it
will acquire the lock and change the interrupt state concurrently. Since
we don't hold the lock, corruption occurs in the interrupt state. Oh
well.

    While acquiring the lock would fix this (and this was Shanker's
initial approach), this is still a layering violation we could do
without. A better approach is actually to free the VPE interrupt, do
what we have to do, and re-request it.

    It is more work, but this usually happens only once in the lifetime
of the VM and we don't really care about this sort of overhead.

    The upstream maintainer acknowledged the bug and fixed the issue, and
the fix will be available in v6.2.

[Fixes]
    - single patch to address the race condition on VPE activation/deactivation

** Affects: linux-nvidia-5.19 (Ubuntu)
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-nvidia-5.19 in Ubuntu.
https://bugs.launchpad.net/bugs/2003640

Title:
  Integrate NVIDIA Grace kernel fixes for vGIC

Status in linux-nvidia-5.19 package in Ubuntu:
  New



