On 25.04.2013, at 21:03, Scott Wood wrote:

> On 04/25/2013 09:49:23 AM, Alexander Graf wrote:
>> On 25.04.2013, at 13:30, Alexander Graf wrote:
>> >
>> > On 19.04.2013, at 20:51, Scott Wood wrote:
>> >
>> >> On 04/19/2013 09:06:27 AM, Alexander Graf wrote:
>> >>> Now that all pieces are in place for reusing generic irq infrastructure,
>> >>> we can copy x86's implementation of KVM_IRQ_LINE irq injection and simply
>> >>> reuse it for PPC, as it will work there just as well.
>> >>> Signed-off-by: Alexander Graf <ag...@suse.de>
>> >>> ---
>> >>> arch/powerpc/include/uapi/asm/kvm.h |    1 +
>> >>> arch/powerpc/kvm/powerpc.c          |   13 +++++++++++++
>> >>> 2 files changed, 14 insertions(+), 0 deletions(-)
>> >>> diff --git a/arch/powerpc/include/uapi/asm/kvm.h 
>> >>> b/arch/powerpc/include/uapi/asm/kvm.h
>> >>> index 3537bf3..dbb2ac2 100644
>> >>> --- a/arch/powerpc/include/uapi/asm/kvm.h
>> >>> +++ b/arch/powerpc/include/uapi/asm/kvm.h
>> >>> @@ -26,6 +26,7 @@
>> >>> #define __KVM_HAVE_SPAPR_TCE
>> >>> #define __KVM_HAVE_PPC_SMT
>> >>> #define __KVM_HAVE_IRQCHIP
>> >>> +#define __KVM_HAVE_IRQ_LINE
>> >>> struct kvm_regs {
>> >>>  __u64 pc;
>> >>> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
>> >>> index c431fea..874c106 100644
>> >>> --- a/arch/powerpc/kvm/powerpc.c
>> >>> +++ b/arch/powerpc/kvm/powerpc.c
>> >>> @@ -33,6 +33,7 @@
>> >>> #include <asm/cputhreads.h>
>> >>> #include <asm/irqflags.h>
>> >>> #include "timing.h"
>> >>> +#include "irq.h"
>> >>> #include "../mm/mmu_decl.h"
>> >>> #define CREATE_TRACE_POINTS
>> >>> @@ -945,6 +946,18 @@ static int kvm_vm_ioctl_get_pvinfo(struct 
>> >>> kvm_ppc_pvinfo *pvinfo)
>> >>>  return 0;
>> >>> }
>> >>> +int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level 
>> >>> *irq_event,
>> >>> +                          bool line_status)
>> >>> +{
>> >>> +        if (!irqchip_in_kernel(kvm))
>> >>> +                return -ENXIO;
>> >>> +
>> >>> +        irq_event->status = kvm_set_irq(kvm, 
>> >>> KVM_USERSPACE_IRQ_SOURCE_ID,
>> >>> +                                        irq_event->irq, 
>> >>> irq_event->level,
>> >>> +                                        line_status);
>> >>> +        return 0;
>> >>> +}
>> >>
>> >> As Paul noted in the XICS patchset, this could reference an MPIC that has 
>> >> gone away if the user never attached any vcpus and then closed the MPIC 
>> >> fd.  It's not a reasonable use case, but it could be used malicously to 
>> >> get the kernel to access a bad pointer.  The irqchip_in_kernel check 
>> >> helps somewhat, but it's meant for ensuring that the creation has 
>> >> happened -- it's racy if used for ensuring that destruction hasn't 
>> >> happened.
>> >>
>> >> The problem is rooted in the awkwardness of performing an operation that 
>> >> logically should be on the MPIC fd, but is instead being done on the vm 
>> >> fd.
>> >>
>> >> I think these three steps would fix it (the first two seem like things we 
>> >> should be doing anyway):
>> >> - During MPIC destruction, make sure MPIC deregisters all routes that 
>> >> reference it.
>> >> - In kvm_set_irq(), do not release the RCU read lock until after the 
>> >> set() function has been called.
>> >> - Do not hook up kvm_send_userspace_msi() to MPIC or other new irqchips, 
>> >> as that bypasses the RCU lock.  It could be supported as a device fd 
>> >> ioctl if desired, or it could be reworked to operate on an RCU-managed 
>> >> list of MSI handlers, though MPIC really doesn't need this at all.
>> >
>> > Can't we just add an RCU lock in the send_userspace_msi case? I don't 
>> > think we should handle MSIs any differently from normal IRQs.
> 
> Well, you can't *just* add the RCU lock -- you need to add data to be managed 
> via RCU (e.g. a list of MSI callbacks, or at least a boolean indicating 
> whether calling the MSI code is OK).

Well, we'd just access a random pin routing :).

> 
>> In fact I'm having a hard time verifying that we're always accessing things 
>> with proper locks held. I'm pretty sure we're missing a few cases.
> 
> Any path in particular?

I'm already getting confused on whether normal MMIO accesses are always safe.

> 
>> So how about we delay mpic destruction to vm destruction? We simply add one 
>> user too many when we spawn the mpic and put it on vm_destruct. That way 
>> users _can_ destroy mpics, but they will only be really free'd once the vm 
>> is also gone.
> 
> That's what we originally had before the fd conversion.  If we want it again, 
> we'll need to go back to maintaining a list of devices in KVM (though it 
> could be a linked list now that we don't need to use it for lookups), or have 
> some hardcoded MPIC hack.

Well, we could have an anonymous linked list of device pointers with a simple 
registration function. That way it's generic enough for any device to be kept 
alive until vm destruction if it wants that.

> IIRC I said back then that converting to fd would make destruction ordering 
> more of a pain...

I usually like to pick the raisins from everything I can. So while I like the 
fd approach for its universally understandable scheme, simplicity of use, 
extensibility of ioctls etc, I don't really like the headaches that come with 
destroying a device while a VM is running. So having a device keep itself alive 
until the VM is gone is the best of all worlds :).


Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to