Hi Majid,

Are you taking into account the instruction fetches?
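
For a rough baseline, the data-side fills can be estimated on their own, so that anything above that number has to come from somewhere else (instruction fetches, or store misses on c[] if those also fetch a block). This is only a sketch: it assumes 64-byte blocks (the "8 elements per block" figure), block-aligned arrays, and "5K" meaning 5000. The function names are made up for illustration and are not part of gem5.

```c
#include <assert.h>
#include <stdint.h>

/* Number of cache blocks covered by n_elems contiguous uint64_t values.
 * Assumes the array starts on a block boundary; rounds up otherwise.
 * block_bytes = 64 corresponds to 8 uint64_t elements per block. */
static uint64_t blocks_for(uint64_t n_elems, uint64_t block_bytes)
{
    assert(block_bytes > 0);
    uint64_t bytes = n_elems * sizeof(uint64_t);
    return (bytes + block_bytes - 1) / block_bytes;
}

/* Estimated data-side block fills for: c[i] = a[i] + b[i], i in [0, size).
 * stores_allocate != 0 models the ASSUMPTION that a store miss on c[]
 * also reads the block from memory; 0 counts only the loads of a[] and b[]. */
static uint64_t expected_data_fills(uint64_t size, uint64_t block_bytes,
                                    int stores_allocate)
{
    uint64_t per_array = blocks_for(size, block_bytes);
    uint64_t arrays = stores_allocate ? 3u : 2u;
    return arrays * per_array;
}
```

With size = 5000 and 64-byte blocks, expected_data_fills() gives 1250 for the loads alone and 1875 if the stores to c[] also fetch their blocks. Neither figure includes instruction fetches.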

Cheers,
Jason

On Thu, Feb 20, 2020 at 9:53 AM Majid Jalili <majid...@gmail.com> wrote:

> Let me correct myself. If I set Size to 5K, there would be a total of
> 10K loads (for a[i] and b[i]), so I expect to see 10K/8 = 1250 reads.
>
> On Thu, Feb 20, 2020 at 11:45 AM Majid Jalili <majid...@gmail.com> wrote:
>
>> I am running a simple stream benchmark that does a simple addition:
>>  m5_reset_stats(0, 0);
>>  for (int i = 0; i < Size; i++)
>>      c[i] = a[i] + b[i];
>>  m5_dump_stats(0, 0);
>>
>> Each element of these arrays is a uint64_t. I turned off the prefetchers
>> and enabled only one level of cache. When I run with a size of 10K
>> elements, since 8 uint64_t elements fit in a block, I expect at most
>> 10K/8 = 1250 reads from main memory. However, with the LRU replacement
>> policy (RP) at L1, I see 1792 reads at main memory. If the RP is changed
>> to RRRIP, I see 1340 reads.
>>
>> I cannot figure out why LRU is doing so poorly when it should be much
>> better. In terms of numCycles, LRU is also slower than RRRIP.
>>
>> Majid
>>
>> _______________________________________________
> gem5-users mailing list
> gem5-users@gem5.org
> http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
