Am 17.12.2010 16:25, Thomas Gleixner wrote:
> On Fri, 17 Dec 2010, Jan Kiszka wrote:
> 
>> Am 17.12.2010 11:41, Thomas Gleixner wrote:
>>> On Fri, 17 Dec 2010, Jan Kiszka wrote:
>>>> Am 17.12.2010 11:23, Thomas Gleixner wrote:
>>>>> OTOH, if we have to disable anyway, then we could simply keep it
>>>>> disabled across the installation of a new handler. That would make the
>>>>> notification business go away, wouldn't it ?
>>>>
>>>> No, the notification is still necessary in case the registered handler
>>>> keeps the line off after returning from both hard and threaded handler.
>>>
>>> And how should that happen? If it is in oneshot mode then the line is
>>> reenabled when the thread handler returns.
>>
>> disable_irq_nosync is called by the handler before returning. And it's
>> the handler's job to revert this, properly synchronizing it internally.
> 
> disable_irq_nosync() is really the worst thing to do. That's simply
> not going to work without a lot of fuglyness.
> 
> What about the following:
> 
> primary_handler(....)
> {
>       if (!shared)
>               return IRQ_WAKE_THREAD;
> 
>       spin_lock(dev->irq_lock);
> 
>       if (from_my_device() || dev->irq_thread_waiting) {
>               mask_dev();
>               dev->masked = true;
>               ret = IRQ_WAKE_THREAD;
>       } else
>               ret = IRQ_NONE;
> 
>       spin_unlock(dev->irq_lock);
>       return ret;
> }
> 
> check_timeout()
> {
>       if (dev->irq_active && wait_longer())
>               return IRQ_WAKE_THREAD;
>       return IRQ_HANDLED;
> }
> 
> unmask_dev_if_necessary()
> {
>       if (dev->masked && dev->irq_active)
>               umask_dev();
> }
> 
> threaded_handler(....)
> {
>       if (!dev->irq_thread_waiting) {
>               spin_lock_irq(dev->irq_lock);
>               wake_user = do_magic_stuff_with_the_dev();
>               dev->irq_thread_waiting = wake_user;
>               spin_unlock(dev->irq_lock);
>               if (wake_user)
>                       wake_up(user);
>       }
> 
>       if (!dev->irq_thread_waiting) {
>               spin_lock_irq(dev->irq_lock);
>               unmask_dev_if_necessary();
>               spin_unlock(dev->irq_lock);
>               return IRQ_HANDLED;
>       }
> 
>       /*
>        * Wait for user space to complete. Timeout is to
>        * avoid starvation of the irq line when
>        * something goes wrong
>        */
>       wait_for_completion_timeout(dev->compl, SENSIBLE_TIMEOUT);
> 
>       spin_lock_irq(dev->irq_lock);
>       if (timedout) {
>               mask_dev();
>               dev->masked = true;
>               /*
>                * Leave dev->irq_thread_waiting untouched and let
>                * the core code reschedule us when check_timeout
>                * decides it's worth to wait. In any case we leave
>                * the device masked at the device level, so we don't
>                * cause an interrupt storm.
>                */
>               ret = check_timeout();
>       } else {
>               unmask_dev_if_necessary();
>               dev->irq_thread_waiting = false;
>               ret = IRQ_HANDLED;
>       }
>       spin_unlock(dev->irq_lock);
>       return ret;
> }
> 
> userspace_complete()
> {
>       complete(dev->irq_compl);
> }
> 
> Your aproach with disable_irq_nosync() is completely flawed, simply
> because you try to pretend that your interrupt handler is done, while
> it is not done at all. The threaded interrupt handler is done when
> user space completes. Everything else is just hacking around the
> problem and creates all that nasty transitional problems.

disable_irq_nosync is the pattern currently used in KVM, it's nothing
new in fact.

The approach looks interesting but requires separate code for
non-PCI-2.3 devices, i.e. when we have no means to mask at device level.
Further drawbacks - unless I missed something on first glance:

- prevents any future optimizations that would work without IRQ thread
  ping-pong (ie. once we allow guest IRQ injection from hardirq context
  for selected but typical setups)
- two additional, though light-weight, context switches on each
  interrupt completion
- continuous polling if user space decides to leave the interrupt
  unhandled (e.g. because the virtual IRQ line is masked)

Maybe the latter can be solved in a nicer way, but I don't think we can
avoid the first two. I'm not saying yet that they are killing this
approach, we just need to asses their relevance.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to