>>>>> On Wed, 16 Mar 2005 11:58:17 +0100, Zoltan Menyhart <[EMAIL PROTECTED]> 
>>>>> said:

  Zoltan> I ran flush_icache_range() 1000 times on the same page
  Zoltan> (i.e., the "fc" really had nothing to do).  The other CPUs
  Zoltan> were idle and there was no traffic on the bus.  I simply took
  Zoltan> the ITC value before and after.  Here are the values (averaged
  Zoltan> over the 1000 runs):

  Zoltan> With a 64-byte stride: 110143 nsec 187218 cycles
  Zoltan> With a 32-byte stride: 225606 nsec 383477 cycles
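
(For reference, a minimal sketch of the measurement loop being
described; the helper names are mine, and reading ar.itc directly is
an assumption about how the timestamps were taken:)

	static inline unsigned long read_itc(void)
	{
		unsigned long t;

		/* ar.itc is the IA-64 interval time counter */
		asm volatile ("mov %0=ar.itc" : "=r" (t));
		return t;
	}

	static void time_flush(unsigned long start, unsigned long end)
	{
		unsigned long t0, t1;
		int i;

		t0 = read_itc();
		for (i = 0; i < 1000; i++)
			flush_icache_range(start, end);	/* same page each time */
		t1 = read_itc();

		printk("%lu cycles/call on average\n", (t1 - t0) / 1000);
	}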

That's definitely a worthwhile improvement.  I re-checked and it turns
out that I misremembered what I measured: the test case I had was
checking whether a better-scheduled loop body would help.  I think I
actually wrote it back in the Merced days, so I couldn't even have
tested a 64-byte stride at the time.

I re-ran the test case now and got these results:

 page size   cache-line           stride (total cycles)
               state          32 bytes     64 bytes
 ------------------------------------------------------------
  16 KB        dirty           32,000      22,000 (86 cyc/line)
               clean           26,000      12,800 (50 cyc/line)
 ------------------------------------------------------------
  64 KB        dirty          130,000      85,000 (83 cyc/line)
               clean          105,000      54,000 (52 cyc/line)
 ------------------------------------------------------------

While all of my numbers are substantially lower than what you're
seeing, using a 64-byte stride is clearly a big win.  I assume the
difference between our results is due to chipsets: my measurements
were done with a 1.5 GHz/6 MB Madison and the zx1 chipset, which
doesn't go beyond 4-way (hence its latency tends to be substantially
better than that of more scalable chipsets).
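
For reference, here is a minimal C sketch of the kind of flush loop
we're talking about (the real flush_icache_range() is written in
assembly in arch/ia64/lib/flush.S; the stride constant below is the
parameter under test, and the function name is mine):

	#define STRIDE	64	/* true line size on McKinley-class CPUs;
				   the old conservative value was 32 */

	static void flush_icache_range_sketch(unsigned long start,
					      unsigned long end)
	{
		unsigned long addr;

		/* align down so a partially covered first line is flushed */
		start &= ~(STRIDE - 1UL);

		/* one fc per cache line; doubling the stride halves the
		   number of fc instructions issued per page */
		for (addr = start; addr < end; addr += STRIDE)
			asm volatile ("fc %0" :: "r" (addr) : "memory");

		/* make the flushes visible and serialize the
		   instruction stream */
		asm volatile (";; sync.i ;; srlz.i ;;" ::: "memory");
	}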

        --david