I am running a simple STREAM-style benchmark that performs an element-wise addition:

    m5_reset_stats(0, 0);
    for (int i = 0; i < Size; i++)
        c[i] = a[i] + b[i];
    m5_dump_stats(0, 0);
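If you want to reproduce this, the loop above can be wrapped into a self-contained kernel. This is only a sketch, with some assumptions. The function name `stream_add` and the `USE_M5OPS` macro are hypothetical. `m5_reset_stats` and `m5_dump_stats` are gem5's pseudo-instructions from `<gem5/m5ops.h>`, and a gem5 build needs the m5 utility library. A native build swaps them for no-op stubs so the kernel still compiles and runs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifdef USE_M5OPS
/* Real gem5 pseudo-instructions; link against gem5's m5 utility library. */
#include <gem5/m5ops.h>
#else
/* Hypothetical no-op stand-ins so the kernel also builds and runs natively. */
static void m5_reset_stats(uint64_t ns_delay, uint64_t ns_period) { (void)ns_delay; (void)ns_period; }
static void m5_dump_stats(uint64_t ns_delay, uint64_t ns_period) { (void)ns_delay; (void)ns_period; }
#endif

/* STREAM-style "add" kernel: c[i] = a[i] + b[i] over n uint64_t elements.
 * Statistics are reset right before the loop and dumped right after it,
 * so the dumped stats cover (mostly) just this region. */
void stream_add(uint64_t *c, const uint64_t *a, const uint64_t *b, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);
    m5_reset_stats(0, 0);
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
    m5_dump_stats(0, 0);
}
```

When counting blocks, it may help to allocate the arrays block-aligned, for example with C11 `aligned_alloc(64, Size * sizeof(uint64_t))`. This assumes 64-byte blocks, and 80,000 bytes is a multiple of 64. Alignment keeps each group of 8 elements in a single cache block.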
Each element of these arrays is a uint64_t. I turned off the prefetchers and enabled only one level of cache. When I run with 10K elements, I expect at most 10,000/8 = 1,250 reads from main memory, since 8 uint64_t elements fit in one cache block. However, with the LRU replacement policy (RP) in L1, I see 1,792 reads at main memory. If I change the RP to RRIP, there are 1,340 reads. I cannot figure out why LRU does so poorly when it should be much better. Also, why is LRU slower than RRIP in terms of numCycles?

Majid
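The expected read count comes down to block arithmetic. The sketch below assumes 64-byte blocks (8 × 8-byte uint64_t, as stated above) and counts only cold (compulsory) fills. The helper names are hypothetical.

It also shows two effects. First, an array that does not start on a block boundary touches one extra block. Second, the total for one pass of c[i] = a[i] + b[i] depends on how many of the three arrays are counted. It also depends on whether the cache fetches c's blocks on a write miss (write-allocate), which is treated here as an assumption, not a known gem5 behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Cache blocks touched by a contiguous array of `bytes` bytes whose first
 * byte lies `offset` bytes into a block (offset 0 means block-aligned). */
size_t blocks_touched(size_t offset, size_t bytes, size_t block_bytes)
{
    assert(block_bytes > 0 && offset < block_bytes);
    if (bytes == 0)
        return 0;
    return (offset + bytes + block_bytes - 1) / block_bytes;
}

/* Cold block fills for one pass of c[i] = a[i] + b[i] over n elements,
 * assuming block-aligned arrays and no reuse between arrays. a and b are
 * always read; c's blocks are counted only if write misses allocate. */
size_t stream_add_cold_fills(size_t n, size_t elem_bytes, size_t block_bytes,
                             bool write_allocate)
{
    size_t per_array = blocks_touched(0, n * elem_bytes, block_bytes);
    return (write_allocate ? 3 : 2) * per_array;
}
```

With Size = 10,000 and 64-byte blocks, `blocks_touched(0, 80000, 64)` gives 1,250 per aligned array. Shifting the start by one element (`offset = 8`) gives 1,251.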
_______________________________________________
gem5-users mailing list
gem5-users@gem5.org
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users