On Tue, 2004-03-30 at 11:51, Simon Marlow wrote:
> I've done some cache profiling of GHC's code myself, and Nick Nethercote
> did some very detailed measurements a while back (see his recent post
> for details).
>
> The upshot of what he found is that we could benefit from some
> prefetching, perhaps on the order of 10-20%.  Particularly prefetching
> in the allocation area during evaluation, to ensure that memory about to
> be written to is in the cache, and similar techniques during GC could
> help.  However, actually taking advantage of this is quite hard -
> prefetching instructions aren't standard, and even when they are,
> getting any benefit can depend on cache architecture and other effects
> which vary between processor families.  Getting things wrong often
> results in a slowdown.  It's just too brittle.
When compiling via gcc, there's the __builtin_prefetch function:

  http://gcc.gnu.org/onlinedocs/gcc-3.3.3/gcc/Other-Builtins.html
  (about 2/3rds of the way down the page)

It provides semi-portable prefetching on supported targets, that is, CPUs that support prefetch instructions with some sane common semantics (non-faulting etc.). It has optional parameters to control read/write intent and expected locality.

See also: http://gcc.gnu.org/projects/prefetch.html

I didn't read it very carefully, but it's not clear (on the x86 CPUs) whether there is a prefetch instruction that is common between the AMD & Intel flavours, i.e. whether a GHC binary with prefetch would be portable between P3/P4/Athlon.

Of course, for GHC's native code generator, you're on your own. :-(

On the other hand, it might not be necessary to generate prefetch instructions at all: much of the speedup might be obtainable by just adding them in the allocator & GC parts of the RTS. Since the RTS is written in GNU C, you could use gcc's prefetch builtin there (see the sketch below).

Duncan
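As a rough illustration of that last suggestion, here is a minimal sketch of dropping gcc's __builtin_prefetch into a bump-pointer allocation path. The Nursery struct, the field names, and the tuning constants are made up for the example; they are not GHC's actual RTS code, and the prefetch distance and locality hints would need measuring on each CPU family.

    /* Sketch only: gcc's __builtin_prefetch in a bump-pointer
     * allocator.  Nursery, alloc_ptr/alloc_lim and the constants
     * below are hypothetical, not GHC RTS definitions. */

    #define CACHE_LINE         64  /* bytes; differs per CPU family   */
    #define PREFETCH_DISTANCE   4  /* cache lines ahead; needs tuning */

    typedef struct {
        char *alloc_ptr;  /* next free byte in the current block */
        char *alloc_lim;  /* end of the current block             */
    } Nursery;

    /* Allocate n bytes and prefetch (for writing) a few cache lines
     * ahead of the allocation pointer, so memory about to be written
     * is already on its way into the cache. */
    static void *alloc(Nursery *nur, unsigned long n)
    {
        char *p = nur->alloc_ptr;

        if (p + n > nur->alloc_lim)
            return 0;            /* caller must chain to a new block */

    #if defined(__GNUC__)
        /* rw = 1: we intend to write; locality = 3: keep it cached.
         * The prefetch is non-faulting, so overshooting the end of
         * the block is harmless. */
        __builtin_prefetch(p + PREFETCH_DISTANCE * CACHE_LINE, 1, 3);
    #endif

        nur->alloc_ptr = p + n;
        return p;
    }

The #if defined(__GNUC__) guard keeps the code compiling with other compilers, where the prefetch simply disappears, which is about as "semi-portable" as prefetching gets.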
