On Thu, 2005-04-14 at 17:47, Philippe Gerum wrote:
> Fillod Stephane wrote:
>
> > I keep hearing that people have the feeling that their latency
> > may be caused by TLB misses/cache refills, but I have never seen proof.
> > Is there any literature on the subject? Has nobody in the RTAI
> > community had the curiosity to explain and fix this interesting problem?
>
> AFAIC, the curiosity is there, and gaining a better understanding of
> the caching behaviour of the nucleus is planned before fusion turns
> 1.0; after all, the core can run inside a regular Linux process, so we
> could even use cachegrind for this. The same goes for Adeos, except
> that cachegrind is obviously out of reach there, so the usual tough way
> is currently followed, when time allows.
>
> For instance, this explains why the CONFIG_ADEOS_NOTHREADS option came
> into play in recent Adeos releases, but with limited success, since the
> cost of switching domain stacks on low-end machines (Pentium 90 MHz-based
> slug, Geode/x86 266 and IceCube/ppc) was apparently not worth the
> effort of coding up this mode. On mid-range to high-end boxen, the
> perceived benefits so far are nil, except perhaps that you don't have
> to fiddle with non-Linux-allocated stacks inside your interrupt
> handlers (e.g. the "current" determination hack for x86). Maybe others
> have had better results trying a similar approach on other archs
> (Michael, with ARM?), I
Non-threaded Adeos helps a little on ARM, but the gain is nothing
compared to the penalty caused by the way the caches work on ARM: they
are virtually indexed and tagged, so the cache has to be flushed
completely *every* time a different process is switched in. This can be
demonstrated by running a simple test program like the following in
parallel to a real-time Adeos domain:
#include <sched.h>              /* sched_yield() */
#include <unistd.h>             /* fork() */

int main(void)
{
        fork();                 /* two processes from here on */
        while (1)
                sched_yield();  /* bounce between them as fast as possible */
}
Worst-case latencies are reached really quickly with this setup :-)
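For reference, one rough way to watch the worst case creep up is to run
a periodic wake-up loop alongside the yield test and record the largest
overshoot. The sketch below is only a plain POSIX user-space
approximation (the 1 ms period, CLOCK_MONOTONIC and clock_nanosleep()
are illustrative choices on my part, not the Adeos-domain measurement
the numbers above come from):

#define _POSIX_C_SOURCE 200112L
#include <stdio.h>
#include <time.h>

#define PERIOD_NS 1000000L              /* wake up every 1 ms */

int main(void)
{
        struct timespec next, now;
        long worst = 0;

        clock_gettime(CLOCK_MONOTONIC, &next);
        for (;;) {
                next.tv_nsec += PERIOD_NS;
                if (next.tv_nsec >= 1000000000L) {
                        next.tv_nsec -= 1000000000L;
                        next.tv_sec++;
                }
                /* sleep until the absolute deadline, then see how late we are */
                clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
                clock_gettime(CLOCK_MONOTONIC, &now);
                long late = (now.tv_sec - next.tv_sec) * 1000000000L
                            + (now.tv_nsec - next.tv_nsec);
                if (late > worst) {
                        worst = late;
                        printf("worst-case overshoot so far: %ld us\n",
                               worst / 1000);
                }
        }
}

On older glibc this needs -lrt for the clock_* calls.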
Things are even worse if the dcache is configured for write-back:
interrupts have to be disabled during the write-back (the switch_mm()
call in schedule()), and that adds 70 us to the worst-case latency on a
166 MHz ARM9 CPU (this also depends on the RAM speed, of course). You
can get rid of this by using write-through caching, but that decreases
average-case performance.
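To put that 70 us in perspective (purely as an illustration, the cache
geometry below is an assumption on my part): with, say, a 16 KB data
cache organised in 32-byte lines, a full write-back touches at most
16384 / 32 = 512 lines. 70 us spread over 512 dirty lines is roughly
137 ns, i.e. about 22 CPU cycles at 166 MHz, per line, which is in the
right ballpark for an 8-word burst write to SDRAM. In other words, the
cost is dominated by the memory system, which is why the figure depends
so much on RAM speed.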
The only solutions I have found to the cold-cache-after-process-switch
problem would be to use MMU-less uClinux (see
http://www.linuxdevices.com/articles/AT2598317046.html)
or a scheme like FASS (see
http://www.disy.cse.unsw.edu.au/Software/FASS/), but both have their
disadvantages.
Mike
--
Dr. Michael Neuhauser phone: +43 1 789 08 49 - 30
Firmix Software GmbH fax: +43 1 789 08 49 - 55
Vienna/Austria/Europe email: [EMAIL PROTECTED]
Embedded Linux Development and Services http://www.firmix.at/