On Fri, 2010-10-08 at 10:41 +0200, Jan Kiszka wrote:
> Am 08.10.2010 10:17, Philippe Gerum wrote:
> > On Fri, 2010-10-08 at 09:01 +0200, Pavel Machek wrote:
> >> Hi!
> >>
> >>>> I have... quite an interesting setup here.
> >>>>
> >>>> SMP machine, with a special PCI card; that card has GPIOs and serial
> >>>> ports. Unfortunately, there's only one interrupt, shared between the
> >>>> serials and the GPIO pins, and the serials are way too complex to be
> >>>> handled by the realtime layer.
> >>>>
> >>>> So I ended up with
> >>>>
> >>>> // we also have an interrupt handler:
> >>>> ret = rtdm_irq_request(&my_context->irq_handle,
> >>>>                        gpio_rt_config.irq, demo_interrupt,
> >>>>                        RTDM_IRQTYPE_SHARED,
> >>>>                        context->device->proc_name, my_context);
> >>>>
> >>>> and
> >>>>
> >>>> static int demo_interrupt(rtdm_irq_t *irq_context)
> >>>> {
> >>>>         struct demodrv_context *ctx;
> >>>>         int dev_id;
> >>>>         int ret = RTDM_IRQ_HANDLED; // usual return value
> >>>>         unsigned pending, output;
> >>>>
> >>>>         ctx = rtdm_irq_get_arg(irq_context, struct demodrv_context);
> >>>>         dev_id = ctx->dev_id;
> >>>>
> >>>>         if (!ctx->ready) {
> >>>>                 printk(KERN_CRIT "Unexpected interrupt\n");
> >>>>                 return XN_ISR_PROPAGATE;
> >>>
> >>> Who sets ready, and when? Looks racy.
> >>
> >> Debugging aid; yes, this one is racy.
> >>
> >>>>         rtdm_lock_put(&ctx->lock);
> >>>>
> >>>>         /* We need to propagate the interrupt so that the PMC-6L
> >>>>            serials work. The result is that interrupt latencies can't
> >>>>            be guaranteed when the serials are in use. */
> >>>>         return RTDM_IRQ_HANDLED;
> >>>> }
> >>>>
> >>>> Unregistration is:
> >>>>         my_context->ready = 0;
> >>>>         rtdm_irq_disable(&my_context->irq_handle);
> >>>
> >>> Where is rtdm_irq_free? Again, this ready flag looks racy.
> >>
> >> Aha, sorry, I quoted the wrong snippet.
> >> rtdm_irq_free() follows immediately, like this:
> >>
> >> int demo_close_rt(struct rtdm_dev_context *context,
> >>                   rtdm_user_info_t *user_info)
> >> {
> >>         struct demodrv_context *my_context;
> >>         rtdm_lockctx_t lock_ctx;
> >>
> >>         // get the context
> >>         my_context = (struct demodrv_context *)context->dev_private;
> >>
> >>         // if we need to do some stuff with preemption disabled:
> >>         rtdm_lock_get_irqsave(&my_context->lock, lock_ctx);
> >>
> >>         my_context->ready = 0;
> >>         rtdm_irq_disable(&my_context->irq_handle);
> >>
> >>         // free the irq in RTDM
> >>         rtdm_irq_free(&my_context->irq_handle);
> >>
> >>         // destroy our interrupt signal/event
> >>         rtdm_event_destroy(&my_context->irq_event);
> >>
> >>         // other stuff here
> >>         rtdm_lock_put_irqrestore(&my_context->lock, lock_ctx);
> >>
> >>         return 0;
> >> }
> >>
> >> Now... I'm aware that the lock_get/put around irq_free should be
> >> unnecessary, as should be the irq_disable and my ->ready flag. Those
> >> were my attempts to work around the problem. I'll attach the full
> >> source at the end.
> >>
> >>>> Unfortunately, when the userspace app is run and killed repeatedly
> >>>> (so that the interrupt is registered/unregistered all the time), I
> >>>> get oopses in __ipipe_dispatch_wired() -- it seems to call through
> >>>> a NULL pointer.
> >>>>
> >>>> I decided that a "wired" interrupt is the wrong thing when the
> >>>> source is shared between Linux and Xenomai, so I disabled "wired"
> >>>> interrupts altogether, but that only moved the oops to __virq_end.
> >>>
> >>> This is wrong. The only way to get deterministically shared IRQs
> >>> across domains is via the wired path, either using the pattern
> >>> Gilles cited or, in a slight variation, signaling down via a
> >>> separate rtdm_nrtsig.
> >>
> >> For now, I'm trying to get it not to oops; deterministic latencies
> >> are the next topic :-(.
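[A possible reordering of the teardown above: the quoted code calls rtdm_irq_free() while holding the same lock the handler takes, so if freeing ever has to wait for a handler running on another CPU, it can deadlock or free state under the handler's feet. An untested sketch, reusing the names from the snippet above (not a verified fix, and it still relies on the free path draining in-flight handlers, which is the open question further down the thread):

```c
int demo_close_rt(struct rtdm_dev_context *context,
                  rtdm_user_info_t *user_info)
{
        struct demodrv_context *my_context =
                (struct demodrv_context *)context->dev_private;
        rtdm_lockctx_t lock_ctx;

        /* Tear the IRQ down first, outside the lock, so a handler
         * still running on another CPU is not spinning on ->lock
         * while we wait for it. */
        rtdm_irq_disable(&my_context->irq_handle);
        rtdm_irq_free(&my_context->irq_handle);

        /* Only after the IRQ is gone is it safe to destroy state
         * the handler dereferences. */
        rtdm_event_destroy(&my_context->irq_event);

        rtdm_lock_get_irqsave(&my_context->lock, lock_ctx);
        my_context->ready = 0;
        /* other cleanup here */
        rtdm_lock_put_irqrestore(&my_context->lock, lock_ctx);

        return 0;
}
```
]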
> >
> > The main issue is that we don't lock our IRQ descriptors (the
> > pipeline ones) when running the handlers, so another CPU clearing
> > them via ipipe_virtualize_irq() may well sink the boat...
> >
> > The unwritten rule has always been to assume that drivers would stop
> > _and_ drain interrupts on all CPUs before unregistering handlers,
> > then exiting the code. Granted, that's a bit much.
>
> IIRC, we drain at nucleus level if statistics are enabled. I guess we
> should make this unconditional.
Draining is currently performed after the descriptor release, via
rthal_irq_release() in this code, and it seems to depend on the stat
counters to determine whether the IRQ handler is still running on any
CPU. A saner way would be to define a draining service in the pipeline,
and have rtdm_irq_free() invoke it early.

-- 
Philippe.

_______________________________________________
Xenomai-help mailing list
Xenomai-help@gna.org
https://mail.gna.org/listinfo/xenomai-help