On Thu, 7 Jan 2010, Michael S. Tsirkin wrote:

> Sure, I was trying to be as brief as possible, here's a detailed summary.
> 
> Description of the system (MSI emulation in KVM):
> 
> KVM supports an ioctl to assign/deassign an eventfd file to an interrupt
> message in the guest OS.  When this eventfd is signalled, the interrupt
> message is sent.  This assignment is done from the qemu system emulator.
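For reference, here is a minimal userspace sketch of that assignment. It uses the KVM_IRQFD ioctl and `struct kvm_irqfd` from <linux/kvm.h>. It assumes `vm_fd` was obtained with KVM_CREATE_VM and that `gsi` is already routed to the desired MSI message. The helper name `bind_irqfd` is hypothetical.

```c
#include <linux/kvm.h>
#include <stdint.h>
#include <string.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Hypothetical helper: create an eventfd and bind it to guest interrupt
 * `gsi` via KVM_IRQFD.  Assumes vm_fd came from KVM_CREATE_VM and that
 * gsi is already routed to the desired MSI message.
 * Returns the eventfd on success, -1 on failure.
 */
static int bind_irqfd(int vm_fd, uint32_t gsi)
{
    struct kvm_irqfd req;
    int efd = eventfd(0, EFD_CLOEXEC);

    if (efd < 0)
        return -1;

    memset(&req, 0, sizeof(req));   /* zero flags and padding */
    req.fd = (uint32_t)efd;
    req.gsi = gsi;
    req.flags = 0;                  /* 0 = assign */

    if (ioctl(vm_fd, KVM_IRQFD, &req) < 0) {
        close(efd);
        return -1;
    }
    return efd;
}
```

After this, sending the message amounts to `uint64_t one = 1; write(efd, &one, sizeof(one));` from whichever thread does the device emulation.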
> 
> The eventfd is signalled by device emulation, running either in another
> userspace thread or in the kernel, which talks to the guest OS through
> another eventfd and shared memory (an out-of-process implementation was
> discussed but has not been implemented yet).
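Here is a small, self-contained illustration of the signalling side. A plain pthread stands in for the device-emulation thread, and no KVM is involved. The function names are hypothetical. One thread adds 1 to the eventfd counter, and the consumer's blocking read() wakes with the accumulated count.

```c
#include <pthread.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical "device emulation" thread: signals the eventfd once. */
static void *device_thread(void *arg)
{
    int efd = *(int *)arg;
    uint64_t one = 1;

    /* Adding 1 to the eventfd counter wakes any reader or poller. */
    if (write(efd, &one, sizeof(one)) != (ssize_t)sizeof(one))
        return (void *)1;
    return NULL;
}

/* Returns the counter value seen by the consumer, or 0 on error. */
static uint64_t signal_from_thread(void)
{
    int efd = eventfd(0, 0);
    pthread_t tid;
    uint64_t cnt = 0;

    if (efd < 0)
        return 0;
    if (pthread_create(&tid, NULL, device_thread, &efd) != 0) {
        close(efd);
        return 0;
    }
    /* Blocking read: returns once the other thread has signalled. */
    if (read(efd, &cnt, sizeof(cnt)) != (ssize_t)sizeof(cnt))
        cnt = 0;
    pthread_join(tid, NULL);
    close(efd);
    return cnt;
}
```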
> 
> Note: it's okay to delay messages from a correctness point of view, but
> this is generally a latency-sensitive path.  If multiple identical messages
> are requested, it's okay to send only the last one, but missing a message
> altogether causes deadlocks.  Sending a message when none was requested
> might in theory cause crashes; in practice it causes performance
> degradation.
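The coalescing that makes "send only the last one" acceptable comes from eventfd's counter semantics. In the default (non-semaphore) mode, each write() adds to a 64-bit counter, and a read() returns the total and resets it to zero. The sketch below uses hypothetical helper names.

```c
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical helper: one drain of the counter; 0 means "nothing pending". */
static uint64_t drain_events(int efd)
{
    uint64_t cnt;

    /* Non-blocking read: fails with EAGAIN when the counter is zero. */
    if (read(efd, &cnt, sizeof(cnt)) != (ssize_t)sizeof(cnt))
        return 0;
    return cnt;                   /* the read reset the counter to zero */
}

/* Three signals, then two drains; results are stored in first/second. */
static int coalesce_demo(uint64_t *first, uint64_t *second)
{
    int efd = eventfd(0, EFD_NONBLOCK);
    uint64_t one = 1;
    int i;

    if (efd < 0)
        return -1;
    for (i = 0; i < 3; i++) {
        if (write(efd, &one, sizeof(one)) != (ssize_t)sizeof(one)) {
            close(efd);
            return -1;
        }
    }
    *first = drain_events(efd);   /* three requests, one message */
    *second = drain_events(efd);  /* nothing pending, no message */
    close(efd);
    return 0;
}
```

Three requests produce a single drain returning 3. The second drain finds the counter empty, so no spurious message is sent.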
> 
> Another KVM feature is interrupt masking: the guest OS requests that we
> stop sending some interrupt message, possibly modifies its mapping, and
> then re-enables the message.  This needs to be done without involving
> the device, which might keep requesting events: while masked, the message
> is marked "pending", and the guest might test the pending status.
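To make the required semantics concrete, here is a toy model of one vector's mask and pending state. The struct and function names are hypothetical, and this is not how KVM stores the state. It only shows that requests arriving while masked are latched rather than lost, and are delivered once on unmask.

```c
#include <stdbool.h>

/* Hypothetical per-vector state; names are illustrative only. */
struct msi_vector {
    bool masked;
    bool pending;
    unsigned delivered;   /* messages actually sent to the guest */
};

/* Device requests an interrupt: deliver now, or latch it while masked. */
static void vector_raise(struct msi_vector *v)
{
    if (v->masked)
        v->pending = true;   /* the guest may test this bit */
    else
        v->delivered++;
}

static void vector_mask(struct msi_vector *v)
{
    v->masked = true;
}

/* Unmask: any request latched while masked is delivered exactly once. */
static void vector_unmask(struct msi_vector *v)
{
    v->masked = false;
    if (v->pending) {
        v->pending = false;
        v->delivered++;
    }
}
```

Two raises while masked collapse into one delivery on unmask. This matches the rule that identical messages may be merged but never dropped.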
> 
> We can implement masking in the system emulator in userspace by using the
> assign/deassign ioctls: when a message is masked, we simply deassign all
> its eventfds, and when it is unmasked, we assign them back.
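Here is a sketch of that userspace masking scheme, with hypothetical helper names. It assumes `vm_fd`, `efd` and `gsi` are the same values used for the original assignment, since the deassign request identifies the binding by both fd and gsi. It also assumes nothing else reads the eventfd. Under that assumption, the counter keeps accumulating while no irqfd is attached, so a zero-timeout poll() can answer the guest's pending query.

```c
#include <linux/kvm.h>
#include <poll.h>
#include <stdint.h>
#include <string.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical helper: mask by deassigning, unmask by assigning again. */
static int irqfd_set_masked(int vm_fd, int efd, uint32_t gsi, int masked)
{
    struct kvm_irqfd req;

    memset(&req, 0, sizeof(req));
    req.fd = (uint32_t)efd;
    req.gsi = gsi;               /* must match the original assignment */
    req.flags = masked ? KVM_IRQFD_FLAG_DEASSIGN : 0;
    return ioctl(vm_fd, KVM_IRQFD, &req);
}

/*
 * Hypothetical helper: report the "pending" bit while masked.  Assuming
 * nothing drains the eventfd while it is deassigned, a non-zero counter
 * (POLLIN) means the device requested the message in the meantime.
 */
static int irqfd_pending(int efd)
{
    struct pollfd p = { .fd = efd, .events = POLLIN };

    return poll(&p, 1, 0) > 0 && (p.revents & POLLIN);
}
```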
> 
> Here's some code to illustrate how this all works.  The assign/deassign
> code in the kernel looks like the following.
> 
> This is called to unmask an interrupt:
> 
> static int
> kvm_irqfd_assign(struct kvm *kvm, int fd, int gsi)
> {
>       struct _irqfd *irqfd, *tmp;
>       struct file *file = NULL;
>       struct eventfd_ctx *eventfd = NULL;
>       int ret;
>       unsigned int events;
> 
>       irqfd = kzalloc(sizeof(*irqfd), GFP_KERNEL);
> 
> ...
> 
>       file = eventfd_fget(fd);
>       if (IS_ERR(file)) {
>               ret = PTR_ERR(file);
>               goto fail;
>       }
> 
>       eventfd = eventfd_ctx_fileget(file);
>       if (IS_ERR(eventfd)) {
>               ret = PTR_ERR(eventfd);
>               goto fail;
>       }
> 
>       irqfd->eventfd = eventfd;
> 
>       /*
>        * Install our own custom wake-up handling so we are notified via
>        * a callback whenever someone signals the underlying eventfd
>        */
>       init_waitqueue_func_entry(&irqfd->wait, irqfd_wakeup);
>       init_poll_funcptr(&irqfd->pt, irqfd_ptable_queue_proc);
> 
>       spin_lock_irq(&kvm->irqfds.lock);
> 
>       events = file->f_op->poll(file, &irqfd->pt);
> 
>       list_add_tail(&irqfd->list, &kvm->irqfds.items);
>       spin_unlock_irq(&kvm->irqfds.lock);
> 
> A.
>       /*
>        * Check if there was an event already pending on the eventfd
>        * before we registered, and trigger it as if we didn't miss it.
>        */
>       if (events & POLLIN)
>               schedule_work(&irqfd->inject);
> 
>       /*
>        * do not drop the file until the irqfd is fully initialized, otherwise
>        * we might race against the POLLHUP
>        */
>       fput(file);
> 
>       return 0;
> 
> fail:
>       ...
> }
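The check at A relies on registration also sampling the current state: f_op->poll() both hooks the wait queue and returns the events already pending. The following userspace analogue uses epoll and is an illustration only; the helper name is hypothetical. It shows the same pattern catching a signal that arrived before anyone was listening.

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Hypothetical helper: signal first, register second, then ask what is
 * ready.  Returns 1 if the early signal was seen, 0 if not, -1 on error.
 */
static int early_signal_seen(void)
{
    int efd = eventfd(0, EFD_NONBLOCK);
    int ep = epoll_create1(0);
    uint64_t one = 1;
    struct epoll_event ev = { .events = EPOLLIN };
    struct epoll_event ready;
    int ret = -1;

    if (efd < 0 || ep < 0)
        goto done;
    /* The "device" signals before anyone is listening. */
    if (write(efd, &one, sizeof(one)) != (ssize_t)sizeof(one))
        goto done;
    ev.data.fd = efd;
    /* Registration hooks future wakeups; readiness is sampled too. */
    if (epoll_ctl(ep, EPOLL_CTL_ADD, efd, &ev) < 0)
        goto done;
    /* Zero timeout: report only what is already pending. */
    ret = epoll_wait(ep, &ready, 1, 0) == 1 && (ready.events & EPOLLIN);
done:
    if (efd >= 0)
        close(efd);
    if (ep >= 0)
        close(ep);
    return ret;
}
```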

What if you do (under proper irqfd locking) something like:

        eventfd_ctx_read(ctx, 1, &cnt);
        if (irqfd->cnt != cnt) {
                irqfd->cnt = cnt;
                schedule_work(&irqfd->inject);
        }

> And deactivation deep down does this (from the irqfd_cleanup_wq workqueue,
> so it is not under the spinlock):
> 
>         /*
>          * Synchronize with the wait-queue and unhook ourselves to
>          * prevent further events.
>          */
> B.
>         remove_wait_queue(irqfd->wqh, &irqfd->wait);
> 
>       ....
> 
>         /*
>          * It is now safe to release the object's resources
>          */
>         eventfd_ctx_put(irqfd->eventfd);
>         kfree(irqfd);

And:

        eventfd_ctx_read(ctx, 1, &irqfd->cnt);
        remove_wait_queue(irqfd->wqh, &irqfd->wait);

- Davide