Hi Majid,

Are you taking instruction fetches into account?
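As a rough sanity check on the 1250 figure in your message, here is a minimal sketch of the arithmetic. The helper `expected_block_reads` is a hypothetical name, not part of gem5 or the m5ops API. It assumes 64-byte blocks (8 uint64_t elements per block, as you describe), arrays that start on a block boundary, and that every block is fetched exactly once. It counts only the data loads for a[i] and b[i], so instruction fetches and any other traffic would come on top of this number.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of gem5 or m5ops): estimates how many
 * distinct cache blocks a unit-stride sweep over `num_arrays` arrays touches.
 * Assumes each array starts on a block boundary and each block is read from
 * memory exactly once (no conflict or capacity evictions). */
static uint64_t expected_block_reads(uint64_t elems_per_array,
                                     uint64_t num_arrays,
                                     uint64_t elem_bytes,
                                     uint64_t block_bytes)
{
    assert(elem_bytes > 0 && block_bytes > 0);
    uint64_t bytes_per_array = elems_per_array * elem_bytes;
    /* Round up: a partially used block still costs one read. */
    uint64_t blocks_per_array =
        (bytes_per_array + block_bytes - 1) / block_bytes;
    return num_arrays * blocks_per_array;
}
```

With Size = 5000, two loaded arrays, 8-byte elements and 64-byte blocks, `expected_block_reads(5000, 2, 8, 64)` evaluates to 1250, which matches your estimate. Anything above that in the stats has to come from accesses the estimate leaves out.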
Cheers,
Jason

On Thu, Feb 20, 2020 at 9:53 AM Majid Jalili <majid...@gmail.com> wrote:

> Let me correct myself. If I set Size to 5K, then there would be a total
> of 10K loads (for a[i] and b[i]), so I expect to see 10K/8=1250.
>
> On Thu, Feb 20, 2020 at 11:45 AM Majid Jalili <majid...@gmail.com> wrote:
>
>> I am running a simple stream benchmark that does a simple addition:
>>
>>     m5_reset_stats(0,0);
>>     for (int i = 0; i < Size; i++)
>>         c[i] = a[i] + b[i];
>>     m5_dump_stats(0,0);
>>
>> Each element of these arrays is a uint64_t. I turned off prefetchers and
>> only enabled one level of cache. When I run with a size of 10K elements,
>> since 8 uint64_t elements fit in a block, I expect to have at most
>> 10K/8=1250 reads from main memory. However, if I use LRU RP at L1, I see
>> 1792 reads at main memory. If the RP changes to RRRIP, then it would be
>> 1340 reads.
>>
>> I cannot figure out why LRU is doing poorly, when it should be way
>> better. In terms of numCycles, LRU is also slower than RRRIP.
>>
>> Majid
>>
>> _______________________________________________
> gem5-users mailing list
> gem5-users@gem5.org
> http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users