>>>>> "christoph" == Christoph Best <[EMAIL PROTECTED]> writes:
Hi
christoph> I am having a problem benchmarking the L2 cache performance on some
christoph> Alpha 21264 systems from our clusters and wondering if anybody else
christoph> has seen this. We use a benchmark that models the kernel of our main
christoph> application (computational physics/lattice gauge theory). When running
christoph> in L1 cache or beyond L2 cache, it gives perfectly consistent readings
christoph> with deviations of 1% or less. But in L2 cache, the numbers from
christoph> different runs may be off by as much as 20%, for which I cannot find a
christoph> good explanation. If I plot performance vs. memory footprint, there is
christoph> a clear shoulder from the L1 cache (64 KB), but then a kind of
christoph> logarithmic behavior (double the memory use loses 30 MFlops).
christoph> The benchmark consists of a completely deterministic set of
christoph> floating-point operations, and I use a version that accesses memory
christoph> completely consecutively. The systems are Compaq DS10 (466 MHz single
christoph> proc.), ES40 (666 MHz 4-proc.), and API UP2000 (666 MHz dbl. proc.)
christoph> under Linux. I did not see this effect under Tru64 on an XP1000 (666
christoph> MHz single proc.).
christoph> The question is: Is there anything either in Linux or the 21264 that
christoph> could account for such behavior? Could the cache be polluted by other
christoph> processes that effectively? (The machines were basically idle during
christoph> benchmarks).
christoph> In particular, it seems that code running just inside the L2 cache (4
christoph> MB on the UP2000 and ES40) is not performing much better than code in
christoph> main memory, which would be a pity. We expect cache performance to be
christoph> a major determinant of total performance for our application: in L1
christoph> cache, the performance is about 600 MFlops, outside L2 cache it drops
christoph> to about 200 MFlops. Inside L2 it varies between 300 and 450 MFlops.
<wild guess>
Since you point out that you don't see the problem under Tru64, could it
be related to the _lack_ of page colouring on Linux? I don't know
offhand what the associativity of the L2 cache is on these Alphas, but
it could well be a factor.
</wild guess>
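To make the guess a bit more concrete, here is a rough sketch in Python
(not taken from any kernel) of why missing page colouring could give
run-to-run variation. It assumes a physically indexed, direct-mapped
4 MB L2 and 8 KB pages; the associativity is an assumption on my part,
and page_colour, conflicting_pages and the two allocation helpers are
made-up names for illustration only. Pages of the same colour compete
for the same cache lines, so a buffer that nominally fits in L2 can
still take conflict misses if its physical pages are handed out without
regard to colour.

```python
import random

PAGE_SIZE = 8 * 1024          # Alpha page size (8 KB)
CACHE_SIZE = 4 * 1024 * 1024  # L2 size quoted above for the UP2000 and ES40
ASSOCIATIVITY = 1             # assumption: direct-mapped L2

# Number of distinct page colours. Pages that share a colour map onto
# the same cache lines.
NUM_COLOURS = CACHE_SIZE // (PAGE_SIZE * ASSOCIATIVITY)


def page_colour(physical_page_number):
    """Colour of a physical page in a physically indexed cache."""
    return physical_page_number % NUM_COLOURS


def conflicting_pages(physical_pages):
    """Count pages in excess of what each colour can hold at once."""
    per_colour = {}
    for ppn in physical_pages:
        colour = page_colour(ppn)
        per_colour[colour] = per_colour.get(colour, 0) + 1
    # Each colour has room for ASSOCIATIVITY pages; the rest conflict.
    return sum(max(0, n - ASSOCIATIVITY) for n in per_colour.values())


def coloured_allocation(num_pages, first_page=0):
    """Page colouring: consecutive virtual pages get consecutive colours."""
    return [first_page + i for i in range(num_pages)]


def random_allocation(num_pages, total_physical_pages, rng):
    """No colouring: physical pages are picked with no regard to colour."""
    return rng.sample(range(total_physical_pages), num_pages)
```

With a 2 MB footprint (256 pages), conflicting_pages(coloured_allocation(256))
is zero, while random_allocation(256, 65536, random.Random(seed)) will
generally give a different non-zero count for each seed. If something
like that happens on the real machines, it would look like noisy
performance inside L2.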
Later, Juanp
--
In theory, practice and theory are the same, but in practice they
are different -- Larry McVoy