On Monday 11 October 2010 18:01:00 Michael S. Tsirkin wrote:
> On Mon, Oct 11, 2010 at 05:28:30PM +0800, Sheng Yang wrote:
> > On Sunday 03 October 2010 19:12:47 Michael S. Tsirkin wrote:
> > > On Tue, Sep 28, 2010 at 05:44:10PM +0800, Sheng Yang wrote:
> > > > This patch enable per-vector mask for assigned devices using MSI-X.
> > > > 
> > > > Signed-off-by: Sheng Yang <sh...@linux.intel.com>
> > > 
> > > I think I see an issue here, noted below.  Some general comments:
> > > - The mask bit seems broken for injecting interrupts from
> > > 
> > >   userspace (with interrupts/ioctls).
> > >   I think we must test it on injection path.
> > 
> > I am not quite understand how it related to userspace interrupt injection
> > here... This patch only cover assigned devices for now.
> 
> Well, this is a kernel/userspace interface, if it's broken for userspace
> injection now we'll have to go through pain to fix it in a compatible
> way later when we want to use it for userspace injection.
> You might want to ask why we want the kernel to support making
> userspace-injected interrupts when userspace can just avoid injecting
> them, and the answer would be that with irqfd the injection might be
> handled in a separate process.

OK, I've understood how it related to userspace interrupt injection. But I 
still 
can't see why the interface is broken...
> 
> We currently handle this by destroying irqfd when irq is masked,
> an ioctl instead would be much faster.
> 
> > > - We'll need a way to support the pending bit.
> > > 
> > >   Disabling interrupts will not let us do it.
> > 
> > We may need a way to support pending bit, though I don't know which guest
> > has used it... And we can still know if there is interrupt pending by
> > check the real hardware's pending bit if it's necessary.
> 
> That's what I'm saying: since instead of masking the vector in hardware
> you disable irq in the APIC, the pending bit that we read from hardware
> will not have the correct value.

Are you sure? This disable_irq() has nothing to do with APIC. The disable 
callback 
in msi_chip didn't do anything but mark the IRQ status as IRQ_DISABLED, and the 
follow interrupt(if there are any) would be acked and masked, using mask 
callback 
in msi_chip. 

irq_to_desc() need to be exported for my initial version, in order to use mask 
callback. But later I think it would be clear and better if we use general IRQ 
function to do it. And I don't think the current solution would prevent us from 
reading hardware pending bits.

--
regards
Yang, Sheng

> 
> If we fix this, pending bit handling can be done by userspace.
> 
> > (And we haven't seen any problem by
> > leaving the bit 0 so far, and it's not in this patch's scope.)
> 
> I don't know about anyone using this, either, but the PCI spec does
> require support of polling mode where the pending bit is polled instead
> of interrupts. So yes, not a high priority to implement, but let's give
> the way we intend to support this in the future some thought.
> 
> > > > ---
> > > > 
> > > >  arch/x86/kvm/x86.c       |    1 +
> > > >  include/linux/kvm.h      |    9 ++++++++-
> > > >  include/linux/kvm_host.h |    1 +
> > > >  virt/kvm/assigned-dev.c  |   39
> > 
> > +++++++++++++++++++++++++++++++++++++++
> > 
> > > >  4 files changed, 49 insertions(+), 1 deletions(-)
> > > > 
> > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > > index 8412c91..e6933e6 100644
> > > > --- a/arch/x86/kvm/x86.c
> > > > +++ b/arch/x86/kvm/x86.c
> > > > @@ -1927,6 +1927,7 @@ int kvm_dev_ioctl_check_extension(long ext)
> > > > 
> > > >         case KVM_CAP_DEBUGREGS:
> > > >         case KVM_CAP_X86_ROBUST_SINGLESTEP:
> > > > 
> > > >         case KVM_CAP_XSAVE:
> > > > +       case KVM_CAP_DEVICE_MSIX_MASK:
> > > >                 r = 1;
> > > >                 break;
> > > >         
> > > >         case KVM_CAP_COALESCED_MMIO:
> > > > diff --git a/include/linux/kvm.h b/include/linux/kvm.h
> > > > index 919ae53..f2b7cdc 100644
> > > > --- a/include/linux/kvm.h
> > > > +++ b/include/linux/kvm.h
> > > > @@ -540,6 +540,9 @@ struct kvm_ppc_pvinfo {
> > > > 
> > > >  #endif
> > > >  #define KVM_CAP_PPC_GET_PVINFO 57
> > > >  #define KVM_CAP_PPC_IRQ_LEVEL 58
> > > > 
> > > > +#ifdef __KVM_HAVE_MSIX
> > > > +#define KVM_CAP_DEVICE_MSIX_MASK 59
> > > > +#endif
> > > > 
> > > >  #ifdef KVM_CAP_IRQ_ROUTING
> > > > 
> > > > @@ -787,11 +790,15 @@ struct kvm_assigned_msix_nr {
> > > > 
> > > >  };
> > > >  
> > > >  #define KVM_MAX_MSIX_PER_DEV           256
> > > > 
> > > > +
> > > > +#define KVM_MSIX_FLAG_MASK     1
> > > > +
> > > > 
> > > >  struct kvm_assigned_msix_entry {
> > > >  
> > > >         __u32 assigned_dev_id;
> > > >         __u32 gsi;
> > > >         __u16 entry; /* The index of entry in the MSI-X table */
> > > > 
> > > > -       __u16 padding[3];
> > > > +       __u16 flags;
> > > > +       __u16 padding[2];
> > > > 
> > > >  };
> > > >  
> > > >  #endif /* __LINUX_KVM_H */
> > > > 
> > > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > > > index 0b89d00..a43405c 100644
> > > > --- a/include/linux/kvm_host.h
> > > > +++ b/include/linux/kvm_host.h
> > > > @@ -415,6 +415,7 @@ struct kvm_irq_ack_notifier {
> > > > 
> > > >  };
> > > >  
> > > >  #define KVM_ASSIGNED_MSIX_PENDING              0x1
> > > > 
> > > > +#define KVM_ASSIGNED_MSIX_MASK                 0x2
> > > > 
> > > >  struct kvm_guest_msix_entry {
> > > >  
> > > >         u32 vector;
> > > >         u16 entry;
> > > > 
> > > > diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
> > > > index 7c98928..15b8c32 100644
> > > > --- a/virt/kvm/assigned-dev.c
> > > > +++ b/virt/kvm/assigned-dev.c
> > > > @@ -17,6 +17,8 @@
> > > > 
> > > >  #include <linux/pci.h>
> > > >  #include <linux/interrupt.h>
> > > >  #include <linux/slab.h>
> > > > 
> > > > +#include <linux/irqnr.h>
> > > > +
> > > > 
> > > >  #include "irq.h"
> > > >  
> > > >  static struct kvm_assigned_dev_kernel *kvm_find_assigned_dev(struct
> > > >  list_head *head,
> > > > 
> > > > @@ -666,6 +668,30 @@ msix_nr_out:
> > > >         return r;
> > > >  
> > > >  }
> > > > 
> > > > +static void update_msix_mask(struct kvm_assigned_dev_kernel
> > > > *assigned_dev, +                             int index)
> > > > +{
> > > > +       int irq;
> > > > +
> > > > +       if (!assigned_dev->dev->msix_enabled ||
> > > > +           !(assigned_dev->irq_requested_type & KVM_DEV_IRQ_HOST_MSIX))
> > > > +               return;
> > > > +
> > > > +       irq = assigned_dev->host_msix_entries[index].vector;
> > > > +
> > > > +       ASSERT(irq != 0);
> > > > +
> > > > +       if (assigned_dev->guest_msix_entries[index].flags &
> > > > +                       KVM_ASSIGNED_MSIX_MASK)
> > > > +               disable_irq(irq);
> > > > +       else {
> > > > +               enable_irq(irq);
> > > > +               if (assigned_dev->guest_msix_entries[index].flags &
> > > > +                               KVM_ASSIGNED_MSIX_PENDING)
> > > > +                       schedule_work(&assigned_dev->interrupt_work);
> > > > +       }
> > > > +}
> > > > +
> > > 
> > > What happens if guest masks an entry and then we hot-unplug the
> > > device and remove it from guest? It looks like interrupt
> > > will stay disabled?
> > 
> > I don't think so. pci_disable_msix() which was called in hot-unplug path
> > would recycle all IRQs used by the device. It should be the same as VM
> > shutdown.
> > 
> > Also before the IRQ was recycled, I believe the same dynamic IRQ wouldn't
> > be used by other devices.
> > 
> > --
> > regards
> > Yang, Sheng
> > 
> > > >  static int kvm_vm_ioctl_set_msix_entry(struct kvm *kvm,
> > > >  
> > > >                                        struct kvm_assigned_msix_entry 
> > > > *entry)
> > > >  
> > > >  {
> > > > 
> > > > @@ -688,6 +714,19 @@ static int kvm_vm_ioctl_set_msix_entry(struct
> > > > kvm *kvm,
> > > > 
> > > >                         adev->guest_msix_entries[i].entry = 
> > > > entry->entry;
> > > >                         adev->guest_msix_entries[i].vector = entry->gsi;
> > > >                         adev->host_msix_entries[i].entry = entry->entry;
> > > > 
> > > > +                       if ((entry->flags & KVM_MSIX_FLAG_MASK) &&
> > > > +                                       
> > > > !(adev->guest_msix_entries[i].flags &
> > > > +                                       KVM_ASSIGNED_MSIX_MASK)) {
> > > > +                               adev->guest_msix_entries[i].flags |=
> > > > +                                       KVM_ASSIGNED_MSIX_MASK;
> > > > +                               update_msix_mask(adev, i);
> > > > +                       } else if (!(entry->flags & KVM_MSIX_FLAG_MASK) 
> > > > &&
> > > > +                                       
> > > > (adev->guest_msix_entries[i].flags &
> > > > +                                       KVM_ASSIGNED_MSIX_MASK)) {
> > > > +                               adev->guest_msix_entries[i].flags &=
> > > > +                                       ~KVM_ASSIGNED_MSIX_MASK;
> > > > +                               update_msix_mask(adev, i);
> > > > +                       }
> > > > 
> > > >                         break;
> > > >                 
> > > >                 }
> > > >         
> > > >         if (i == adev->entries_nr) {
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to