Thank you for commenting on my notes.
On Sat, 2004-12-18 at 13:55, Philippe Gerum wrote:
> On Mon, 2004-12-13 at 15:02, Michael Neuhauser wrote:
> > [...]
> > 1) idle loop (this may affect other architectures than ARM too)
> > ---------------------------------------------------------------
> >
> > If the CPU (like the EP9301) has a special HALT state and this feature
> > is used in the idle loop (in arch/arm/kernel/process.c, via arch_idle())
> >
> > void default_idle(void)
> > {
> > local_irq_disable();
> > if (!current->need_resched)
> > arch_idle();
> > local_irq_enable();
> > }
> >
> > then the following can happen when Adeos is used: local_irq_disable()
> > stalls the pipeline - interrupts that happen after this call and before
> > arch_idle() are handled by Adeos (put into pipe, mask in PIC) but
> > nothing more. If, by chance, *all* active interrupts happen just in this
> > time window, then the system is going to be locked-up because the HALT
> > state can not be left again:
> > * only an interrupt could wake-up the CPU from the HALT state
> > * all interrupts are masked and will not be delivered to the CPU
> >
> > This actually happened quite often on my system: everything
> > froze until I sent a character to the otherwise idle 2nd UART (i.e.
> > triggered an unmasked interrupt).
> >
> > To fix this, disable the interrupts on the hardware level in
> > default_idle(), not just stall the pipe (i.e. adeos_hw_cli() instead of
> > local_irq_disable()) if Adeos is configured.
>
> This has been overlooked in the Adeos patch for ARM. x86 properly
> flushes the interrupt log when calling safe_halt(), but locking out hw
> IRQs seems indeed the best way to prevent the CPU to enter the halted
> mode whilst there is some IRQ-awaken task to schedule, specifically for
> non-preemptible kernels.
As I understand it, the problem has nothing to do with flushing the
interrupt log or tasks not being scheduled. There is a lock-up on the
hardware level: all interrupts that could force the CPU from HALT are
masked -> HALT state is never left again. It's not likely to happen but
it does happen on my (slow) system (depends on interrupt load etc.).
> > Maybe using the HALT state should be avoided at all, when hard real-time
> > performance is the goal. [...]
>
> Power consumption?
I imagine some people will gladly accept increased power consumption if
it shaves off some us from the latency (if this is really the case for
some ARM hardware, it is not with mine). So this might not be a good
idea for default behaviour.
> >[...]
> > 3) crash when Adeos is compiled into kernel (i.e. not module)
> >[...]
>
> The location in the boot sequence where the pipeline should be enabled
> looks like much more arch-dependent than I had first expected (e.g.
> Adeos 2.6/ia64 required some changes there). This said, the crash might
> be related to the change making adp_pipelined a constant. In any case,
> this is what happened to the Adeos 2.6/PPC port.
I'll try it without making adp_pipelined a constant.
> > [...]
> > [return value of __adeos_handle_irq() is not used, don't compute it]
> > [...]
>
> Whoops, no. The real problem is the IRQ return path not caring from the
> return value of __adeos_handle_irq(), not the other way around. The
> current patch has a serious flaw, it should check this return value, and
> branch out of any Linux epilogue code if it is non-zero.
>
> The reason is that if this value is non-zero, then the current IRQ has
> either preempted a non-Linux domain, or has preempted Linux (e.g. to
> serve more prioritary domains, or at least to log the interrupt event)
> while the Linux pipeline stage was stalled, i.e. while Linux thought it
> was running in an interrupt-free section.
> If you do not prevent the Linux epilogue code to run upon IRQ exit in
> the latter case, then you are likely to execute it in a spurious
> context. This issue is even more important with preemptible kernels,
> because the epilogue code does even more complex things.
After spending a weekend buried in the entry-*.S stuff it is quite
obvious to me too. I have a preliminary patch (but it needs more
testing).
I think there are more problems with the exception handling stuff: the
stall-state of the root domain's ipipe is not preserved across the
exception. On i386 this is done by __adeos_if_fixup_root() and
__adeos_unstall_iret_root() but nothing the like exists for ARM.
Do you know if this was omitted accidentally or intentionally (i.e.
because it is not necessary on ARM, but I don't see how this could be
the case)?
> > P4) IPIPE_SYNC_FLAG only needed for SMP
> > ---------------------------------------
> >
> > * arch/arm/kernel/adeos.c:__adeos_sync_stage() is the only place where
> > the IPIPE_SYNC_FLAG of a pipeline is modified.
> > * the flag is only modified when adeos_lock_cpu() or adeos_hw_cli() was
> > called before
> > * the flag is cleared before interrupts are hard enabled and the handler
> > is called
> > -> test_and_set_bit() on the sync-flag can never be true on an UP system
> > -> IPIPE_SYNC_FLAG is only needed for SMP
>
> In fact, the need for this flag originates from x86, where some drivers
> (e.g. PC keyboard one) do sit in a busy loop over an ISR waiting or the
> next interrupt to come from the device they dialog with (yes, it's
> ugly), instead of using some kind of automaton to deal with this
> asynchronously from the kernel execution POV. For this to work, then you
> need to be able to re-enter __adeos_sync_stage(). So it's actually
> needed on UP too for some hw.
>
> I cannot tell for ARM, but since __adeos_sync_stage() is arch-dependent,
> I guess it's ok to remove it if there is no driver problem such as the
> one plaguing x86.
I think the difference between ARM & i386 is not the driver's behaviour
but the fact that ARM __adeos_sync_stage() does an adeos_lock_cpu()
(i386 doesn't) before setting the sync-flag and the sync-flag is cleared
before hw-interrupts are enabled again. So it is impossible (on UP) that
__adeos_sync_stage() is entered with sync-flag set (as long as nobody
else is modifying the flag, which doesn't seem to be the case).
Regards
Mike
--
Dr. Michael Neuhauser phone: +43 1 789 08 49 - 30
Firmix Software GmbH fax: +43 1 789 08 49 - 55
Vienna/Austria/Europe email: [EMAIL PROTECTED]
Embedded Linux Development and Services http://www.firmix.at/