On Thu, 7 Jan 2010, Michael S. Tsirkin wrote:
> Sure, I was trying to be as brief as possible, here's a detailed summary.
>
> Description of the system (MSI emulation in KVM):
>
> KVM supports an ioctl to assign/deassign an eventfd file to an interrupt
> message in the guest OS. When this eventfd is signalled, the interrupt
> message is sent. This assignment is done from the qemu system emulator.
>
> The eventfd is signalled from device emulation, which runs in another
> userspace thread or in the kernel and talks to the guest OS through
> another eventfd and shared memory (running it out of process was
> discussed but has not been implemented yet).
>
> Note: it's okay to delay messages from a correctness point of view, but
> generally this is a latency-sensitive path. If multiple identical messages
> are requested, it's okay to send only a single, final message, but missing
> a message altogether causes deadlocks. Sending a message when none was
> requested might in theory cause crashes; in practice it causes
> performance degradation.
>
> Another KVM feature is interrupt masking: the guest OS requests that we
> stop sending some interrupt message, possibly modifies its mapping,
> and then re-enables the message. This needs to be done without
> involving the device, which might keep requesting events:
> while masked, the message is marked "pending", and the guest might test
> the pending status.
>
> We can implement masking in the system emulator in userspace by using the
> assign/deassign ioctls: when a message is masked, we simply deassign all
> of its eventfds, and when it is unmasked, we assign them back.
>
> Here's some code to illustrate how this all works. The assign/deassign
> code in the kernel looks like the following:
>
>
> This is called to unmask an interrupt:
>
> static int
> kvm_irqfd_assign(struct kvm *kvm, int fd, int gsi)
> {
> 	struct _irqfd *irqfd, *tmp;
> 	struct file *file = NULL;
> 	struct eventfd_ctx *eventfd = NULL;
> 	int ret;
> 	unsigned int events;
>
> 	irqfd = kzalloc(sizeof(*irqfd), GFP_KERNEL);
>
> 	...
>
> 	file = eventfd_fget(fd);
> 	if (IS_ERR(file)) {
> 		ret = PTR_ERR(file);
> 		goto fail;
> 	}
>
> 	eventfd = eventfd_ctx_fileget(file);
> 	if (IS_ERR(eventfd)) {
> 		ret = PTR_ERR(eventfd);
> 		goto fail;
> 	}
>
> 	irqfd->eventfd = eventfd;
>
> 	/*
> 	 * Install our own custom wake-up handling so we are notified via
> 	 * a callback whenever someone signals the underlying eventfd
> 	 */
> 	init_waitqueue_func_entry(&irqfd->wait, irqfd_wakeup);
> 	init_poll_funcptr(&irqfd->pt, irqfd_ptable_queue_proc);
>
> 	spin_lock_irq(&kvm->irqfds.lock);
>
> 	events = file->f_op->poll(file, &irqfd->pt);
>
> 	list_add_tail(&irqfd->list, &kvm->irqfds.items);
> 	spin_unlock_irq(&kvm->irqfds.lock);
>
> A.
> 	/*
> 	 * Check if there was an event already pending on the eventfd
> 	 * before we registered, and trigger it as if we didn't miss it.
> 	 */
> 	if (events & POLLIN)
> 		schedule_work(&irqfd->inject);
>
> 	/*
> 	 * do not drop the file until the irqfd is fully initialized, otherwise
> 	 * we might race against the POLLHUP
> 	 */
> 	fput(file);
>
> 	return 0;
>
> fail:
> 	...
> }
What if you do (under proper irqfd locking) something like:

	eventfd_ctx_read(ctx, 1, &cnt);
	if (irqfd->cnt != cnt) {
		irqfd->cnt = cnt;
		schedule_work(&irqfd->inject);
	}
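
For anyone who wants to poke at the counter behaviour from userspace, here is a minimal sketch using eventfd(2) and a non-blocking read(2). It is not the in-kernel path: it assumes the kernel-side read behaves like read(2) on a non-semaphore eventfd, which returns the accumulated count and resets it to zero. The helper names (signal_eventfd, drain_eventfd, coalesce_demo) are made up for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/* Signal the eventfd once. Returns 0 on success, -1 on error. */
static int signal_eventfd(int efd)
{
	uint64_t one = 1;

	return write(efd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/*
 * Non-blocking drain: store the accumulated count in *cnt (0 if nothing
 * was pending) and reset the counter. Returns 0 on success, -1 on error.
 * The eventfd must have been created with EFD_NONBLOCK.
 */
static int drain_eventfd(int efd, uint64_t *cnt)
{
	ssize_t n = read(efd, cnt, sizeof(*cnt));

	if (n == (ssize_t)sizeof(*cnt))
		return 0;
	if (n < 0 && errno == EAGAIN) {
		*cnt = 0;	/* counter was zero, nothing pending */
		return 0;
	}
	return -1;
}

/*
 * Hypothetical demo: three signals arrive while nobody is listening,
 * then a single drain observes all of them at once.
 * Returns the count seen by the first drain, or 0 on error.
 */
static uint64_t coalesce_demo(void)
{
	uint64_t cnt = 0, again = 1;
	int efd = eventfd(0, EFD_NONBLOCK);
	int i;

	if (efd < 0)
		return 0;
	for (i = 0; i < 3; i++) {
		if (signal_eventfd(efd) < 0)
			goto fail;
	}
	if (drain_eventfd(efd, &cnt) < 0 || drain_eventfd(efd, &again) < 0)
		goto fail;
	close(efd);
	assert(again == 0);	/* the first drain reset the counter */
	return cnt;
fail:
	close(efd);
	return 0;
}
```

The point of the demo is that several signals collapse into one non-zero count, so comparing a saved count against a fresh read is enough to tell whether anything arrived in between.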
> And deactivation deep down does this (from the irqfd_cleanup_wq
> workqueue, so this is not under the spinlock):
>
> 	/*
> 	 * Synchronize with the wait-queue and unhook ourselves to
> 	 * prevent further events.
> 	 */
> B.
> 	remove_wait_queue(irqfd->wqh, &irqfd->wait);
>
> 	....
>
> 	/*
> 	 * It is now safe to release the object's resources
> 	 */
> 	eventfd_ctx_put(irqfd->eventfd);
> 	kfree(irqfd);
And:

	eventfd_ctx_read(ctx, 1, &irqfd->cnt);
	remove_wait_queue(irqfd->wqh, &irqfd->wait);
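
The pending-on-attach check at point A also has a userspace analogue that may help when experimenting: poll(2) reports POLLIN on an eventfd whose counter is non-zero, without consuming it. The sketch below models "signals arrive while masked, then unmask" under that assumption. It only mimics the idea, it is not the KVM code, and eventfd_pending and unmask_demo are hypothetical names.

```c
#include <assert.h>
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Returns 1 if the eventfd counter is non-zero, 0 if it is zero,
 * -1 on error. The counter is left untouched.
 */
static int eventfd_pending(int efd)
{
	struct pollfd pfd = { .fd = efd, .events = POLLIN };
	int n = poll(&pfd, 1, 0);	/* timeout 0: do not block */

	if (n < 0)
		return -1;
	return (n == 1 && (pfd.revents & POLLIN)) ? 1 : 0;
}

/*
 * Hypothetical model: the device signals the eventfd some number of
 * times while the message is masked (nobody attached), then the message
 * is unmasked. Returns how many interrupts the model injects at unmask
 * time, or -1 on error.
 */
static int unmask_demo(int signals_while_masked)
{
	uint64_t one = 1, cnt = 0;
	int injected = 0;
	int efd = eventfd(0, EFD_NONBLOCK);
	int i, pending;

	if (efd < 0)
		return -1;

	/* Masked: signals only accumulate in the counter. */
	for (i = 0; i < signals_while_masked; i++) {
		if (write(efd, &one, sizeof(one)) != (ssize_t)sizeof(one))
			goto fail;
	}

	/* Unmask: if anything is pending, inject exactly once. */
	pending = eventfd_pending(efd);
	if (pending < 0)
		goto fail;
	if (pending) {
		/* Consume the coalesced count, then inject. */
		if (read(efd, &cnt, sizeof(cnt)) != (ssize_t)sizeof(cnt))
			goto fail;
		assert(cnt == (uint64_t)signals_while_masked);
		injected++;
	}
	close(efd);
	return injected;
fail:
	close(efd);
	return -1;
}
```

With no signals while masked the model injects nothing, and with any number of signals it injects a single message, which matches the "a single last message is okay, a missed one is not" requirement from the summary.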
- Davide