I am running a simple stream benchmark that adds two arrays element by element:
    m5_reset_stats(0, 0);
    for (int i = 0; i < Size; i++)
        c[i] = a[i] + b[i];
    m5_dump_stats(0, 0);

Each element of these arrays is a uint64_t. I turned off the prefetchers and
enabled only one level of cache. When I run with a size of 10K elements,
since 8 uint64_t elements fit into one block, I expect at most
10K/8 = 1250 reads from main memory. However, if I use the LRU replacement
policy (RP) at L1, I see 1792 reads at main memory. If I change the RP to
RRRIP, I see 1340 reads.
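
To sanity-check the 1250 figure, here is a minimal sketch that counts how many distinct cache blocks one array spans. It assumes a 64-byte block (implied by 8 uint64_t elements per block); blocks_touched and BLOCK_BYTES are hypothetical names used only for illustration and are not part of gem5 or of the benchmark.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache blocks, i.e. 8 uint64_t elements per block. */
#define BLOCK_BYTES 64u

/*
 * Hypothetical helper: number of distinct cache blocks covered by
 * `bytes` bytes starting at address `base`.
 */
static size_t blocks_touched(uintptr_t base, size_t bytes)
{
    if (bytes == 0)
        return 0;

    uintptr_t first = base / BLOCK_BYTES;               /* index of first block */
    uintptr_t last  = (base + bytes - 1) / BLOCK_BYTES; /* index of last block  */

    return (size_t)(last - first + 1);
}
```

For a block-aligned array of 10K uint64_t elements this gives 1250 blocks; if the array starts 8 bytes into a block, it gives 1251. The count is for a single array.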

I cannot figure out why LRU does so poorly when it should do much better.
In terms of numCycles, LRU is also slower than RRRIP.

Majid
_______________________________________________
gem5-users mailing list
gem5-users@gem5.org
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users