When posted interrupts are in use, KVM fully bypasses the eventfd and
delivers events directly to the appropriate vCPU. Without posted
interrupts, KVM still uses the eventfd, but nothing stops userspace
from receiving the same events too. Userspace then has to take care
not to see those events itself and inject duplicate interrupts into
the guest.

Fix this by adding a 'priority' mode for exclusive waiters, which puts
them at the head of the list, where they can consume events before the
non-exclusive waiters are woken.
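
To illustrate the ordering described above, here is a minimal userspace
model of the idea. It is a sketch, not the code from these patches: the
types, flag names and helpers (struct waiter, WQ_EXCLUSIVE, WQ_PRIORITY,
add_waiter(), add_waiter_priority(), wake_up_model()) are all
hypothetical stand-ins, and it assumes that a wake-up with
nr_exclusive == 1 stops walking the list once an exclusive waiter
reports that it handled the event.

```c
#include <stddef.h>

/* Userspace model only; these are not the kernel's wait queue types. */
#define WQ_EXCLUSIVE 0x1u
#define WQ_PRIORITY  0x2u

struct waiter {
	unsigned int flags;
	int (*func)(struct waiter *w);	/* nonzero: handled the event */
	int woken;			/* how many times func was called */
	struct waiter *next;
};

struct wait_queue {
	struct waiter *head;
};

/* Ordinary waiter: queued behind any priority waiters already present. */
static void add_waiter(struct wait_queue *q, struct waiter *w)
{
	struct waiter **pos = &q->head;

	while (*pos && ((*pos)->flags & WQ_PRIORITY))
		pos = &(*pos)->next;
	w->next = *pos;
	*pos = w;
}

/* Priority waiter: exclusive, and placed at the very head of the list. */
static void add_waiter_priority(struct wait_queue *q, struct waiter *w)
{
	w->flags |= WQ_EXCLUSIVE | WQ_PRIORITY;
	w->next = q->head;
	q->head = w;
}

/*
 * Walk the list from the head. Stop once nr_exclusive exclusive waiters
 * have handled the event; nr_exclusive == 0 means "wake everyone".
 */
static void wake_up_model(struct wait_queue *q, int nr_exclusive)
{
	struct waiter *w;

	for (w = q->head; w; w = w->next) {
		int ret = w->func(w);

		if (ret < 0)
			break;
		if (ret && (w->flags & WQ_EXCLUSIVE) && --nr_exclusive == 0)
			break;
	}
}

/* Callback shared by both waiters: record the wake-up and claim it. */
static int note_wakeup(struct waiter *w)
{
	w->woken++;
	return 1;
}

/* Returns how many times the "userspace" waiter saw a single event. */
int demo_userspace_wakeups(int with_priority_waiter)
{
	struct wait_queue q = { NULL };
	struct waiter poller = { 0, note_wakeup, 0, NULL };
	struct waiter consumer = { 0, note_wakeup, 0, NULL };

	add_waiter(&q, &poller);
	if (with_priority_waiter)
		add_waiter_priority(&q, &consumer);
	wake_up_model(&q, 1);
	return poller.woken;
}
```

In this model, demo_userspace_wakeups(0) reports one wake-up for the
userspace waiter, while demo_userspace_wakeups(1) reports none, because
the priority waiter at the head consumes the event and ends the walk.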

v2: 
 • Drop [RFC]. This seems to be working nicely, and userspace is a lot
   cleaner without having to mess around with adding/removing the eventfd
   to its poll set. And nobody yelled at me. Yet.
 • Reword commit comments, update comment above __wake_up_common()
 • Rebase to be applied after the (only vaguely related) fix to make
   irqfd actually consume the eventfd counter too.

David Woodhouse (2):
      sched/wait: Add add_wait_queue_priority()
      kvm/eventfd: Use priority waitqueue to catch events before userspace

 include/linux/wait.h | 12 +++++++++++-
 kernel/sched/wait.c  | 17 ++++++++++++++++-
 virt/kvm/eventfd.c   |  6 ++++--


