On 2012-02-28 00:15, Alex Williamson wrote: > On Mon, 2012-02-27 at 23:07 +0100, Jan Kiszka wrote: >> On 2012-02-27 22:05, Alex Williamson wrote: >>> On Fri, 2012-02-10 at 19:17 +0100, Jan Kiszka wrote: >>>> PCI 2.3 allows to generically disable IRQ sources at device level. This >>>> enables us to share legacy IRQs of such devices with other host devices >>>> when passing them to a guest. >>>> >>>> The new IRQ sharing feature introduced here is optional, user space has >>>> to request it explicitly. Moreover, user space can inform us about its >>>> view of PCI_COMMAND_INTX_DISABLE so that we can avoid unmasking the >>>> interrupt and signaling it if the guest masked it via the virtualized >>>> PCI config space. >>>> >>>> Signed-off-by: Jan Kiszka <[email protected]> >>>> --- >>>> >>>> Changes in v3: >>>> - rebased over current kvm.git (no code conflict, just api.txt) >>>> >>>> Documentation/virtual/kvm/api.txt | 31 ++++++ >>>> arch/x86/kvm/x86.c | 1 + >>>> include/linux/kvm.h | 6 + >>>> include/linux/kvm_host.h | 2 + >>>> virt/kvm/assigned-dev.c | 208 >>>> +++++++++++++++++++++++++++++++----- >>>> 5 files changed, 219 insertions(+), 29 deletions(-) >>>> >>>> diff --git a/Documentation/virtual/kvm/api.txt >>>> b/Documentation/virtual/kvm/api.txt >>>> index 59a3826..5ce0e29 100644 >>>> --- a/Documentation/virtual/kvm/api.txt >>>> +++ b/Documentation/virtual/kvm/api.txt >>>> @@ -1169,6 +1169,14 @@ following flags are specified: >>>> >>>> /* Depends on KVM_CAP_IOMMU */ >>>> #define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0) >>>> +/* The following two depend on KVM_CAP_PCI_2_3 */ >>>> +#define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1) >>>> +#define KVM_DEV_ASSIGN_MASK_INTX (1 << 2) >>>> + >>>> +If KVM_DEV_ASSIGN_PCI_2_3 is set, the kernel will manage legacy INTx >>>> interrupts >>>> +via the PCI-2.3-compliant device-level mask, thus enable IRQ sharing with >>>> other >>>> +assigned devices or host devices. KVM_DEV_ASSIGN_MASK_INTX specifies the >>>> +guest's view on the INTx mask, see KVM_ASSIGN_SET_INTX_MASK for details. >>>> >>>> The KVM_DEV_ASSIGN_ENABLE_IOMMU flag is a mandatory option to ensure >>>> isolation of the device. Usages not specifying this flag are deprecated. >>>> @@ -1441,6 +1449,29 @@ The "num_dirty" field is a performance hint for KVM >>>> to determine whether it >>>> should skip processing the bitmap and just invalidate everything. It must >>>> be set to the number of set bits in the bitmap. >>>> >>>> +4.60 KVM_ASSIGN_SET_INTX_MASK >>>> + >>>> +Capability: KVM_CAP_PCI_2_3 >>>> +Architectures: x86 >>>> +Type: vm ioctl >>>> +Parameters: struct kvm_assigned_pci_dev (in) >>>> +Returns: 0 on success, -1 on error >>>> + >>>> +Informs the kernel about the guest's view on the INTx mask. As long as the >>>> +guest masks the legacy INTx, the kernel will refrain from unmasking it at >>>> +hardware level and will not assert the guest's IRQ line. User space is >>>> still >>>> +responsible for applying this state to the assigned device's real config >>>> space >>>> +by setting or clearing the Interrupt Disable bit 10 in the Command >>>> register. >>>> + >>>> +To avoid that the kernel overwrites the state user space wants to set, >>>> +KVM_ASSIGN_SET_INTX_MASK has to be called prior to updating the config >>>> space. >>>> +Moreover, user space has to write back its own view on the Interrupt >>>> Disable >>>> +bit whenever modifying the Command word. >>> >>> This is very confusing. I think we need to work on the wording, but >>> perhaps it's not worth hold up the patch. It seems the simplest, >> >> As I need another round anyway (see below), I'm open for better wording >> suggestions. > > Now that I know what it does, I'll probably write something just as > confusing, but here's a shot: > > Allows userspace to mask PCI INTx interrupts from the assigned > device. The kernel will not deliver INTx interrupts to the > guest between setting and clearing of KVM_ASSIGN_SET_INTX_MASK > via this interface. This enables use of and emulation of PCI > 2.3 INTx disable command register behavior. > > This may be used for both PCI 2.3 devices supporting INTx > disable natively and older devices lacking this support. > Userspace is responsible for emulating the read value of the > INTx disable bit in the guest visible PCI command register. > When modifying the INTx disable state, userspace should precede > updating the physical device command register by calling this > ioctl to inform the kernel of the new intended INTx mask state. > > Note that the kernel uses the device INTx disable bit to > internally manage the device interrupt state for PCI 2.3 > devices. Reads of this register may therefore not match the > expected value. Writes should always use the guest intended > INTx disable value rather than attempting to read-copy-update > the current physical device state. Races between user and > kernel updates to the INTx disable bit are handled lazily in the > kernel. It's possible the device may generate unintended > interrupts, but they will not be injected into the guest.
Sounds good, will pick it up.
...
>>>> @@ -759,6 +851,56 @@ msix_entry_out:
>>>> }
>>>> #endif
>>>>
>>>> +static int kvm_vm_ioctl_set_pci_irq_mask(struct kvm *kvm,
>>>> + struct kvm_assigned_pci_dev *assigned_dev)
>>>> +{
>>>> + int r = 0;
>>>> + struct kvm_assigned_dev_kernel *match;
>>>> +
>>>> + mutex_lock(&kvm->lock);
>>>> +
>>>> + match = kvm_find_assigned_dev(&kvm->arch.assigned_dev_head,
>>>> + assigned_dev->assigned_dev_id);
>>>> + if (!match) {
>>>> + r = -ENODEV;
>>>> + goto out;
>>>> + }
>>>> +
>>>> + mutex_lock(&match->intx_mask_lock);
>>>> +
>>>> + match->flags &= ~KVM_DEV_ASSIGN_MASK_INTX;
>>>> + match->flags |= assigned_dev->flags & KVM_DEV_ASSIGN_MASK_INTX;
>>>> +
>>>> + if (match->irq_requested_type & KVM_DEV_IRQ_GUEST_INTX) {
>>>> + if (assigned_dev->flags & KVM_DEV_ASSIGN_MASK_INTX) {
>>>> + kvm_set_irq(match->kvm, match->irq_source_id,
>>>> + match->guest_irq, 0);
>>>> + /*
>>>> + * Masking at hardware-level is performed on demand,
>>>> + * i.e. when an IRQ actually arrives at the host.
>>>> + */
>>>> + } else {
>>>> + /*
>>>> + * Unmask the IRQ line. It may have been masked
>>>> + * meanwhile if we aren't using PCI 2.3 INTx masking
>>>> + * on the host side.
>>>> + */
>>>> + spin_lock_irq(&match->intx_lock);
>>>> + if (match->host_irq_disabled) {
>>>> + enable_irq(match->host_irq);
>>>
>>> How do we not get an unbalanced enable here for PCI 2.3 devices?
>>
>> By performing both the disable and the host_irq_disabled update under
>> intx_lock? Or which scenario do you see?
>
> Do we ever do disable_irq() on a PCI 2.3 device? Seems like we only use
> the intx API for them, and disable/enable_irq exclusively on non-2.3, so
> if we can get here on a 2.3 device we have unbalanced enables. Am I
> missing some nuance of host_irq_disabled that prevents 2.3 from getting
> here? Thanks,
OK, now I got it. Yes, that's indeed a bug. I dug in an older version of
this patch, and there I had multiple state values to encode if the line
or the device was disabled. Will limit to pre-2.3 devices.
Thanks,
Jan
signature.asc
Description: OpenPGP digital signature
