Oh, I'm sorry, I've just seen that the default config for x86 really does not have a prefetcher set. I somehow assumed that it did :/
I've now set a prefetcher for the L2 cache as on ARM:

    prefetcher = StridePrefetcher(degree=8, latency = 1)

With that, I achieve a bandwidth of 1702 MiB/s. So, that's at least a lot better than before, although I'm still quite a bit away from 4352 MiB/s. Shouldn't it be much better than that?

Best regards,
Nils

On 05/18/2016 10:38 AM, Nils Asmussen wrote:
> Hi,
>
> does anybody know why Linux cannot saturate the DRAM bandwidth with an
> O3 core? Or knows how I can track the problem down?
>
> Best regards,
> Nils
>
>
> On 05/11/2016 10:20 AM, Nils Asmussen wrote:
>> Hi again,
>>
>> I've now played around a bit. First, I noticed that I should better do
>> the resetstats and dumpstats in my benchmark program directly before and
>> after the file reading via pseudo instructions. Instead of using the m5
>> program as I did before. This decreases the difference a bit, but the
>> effect is still there.
>>
>> Using the default x86 config as last time, I achieve now 1079 MiB/s on
>> Linux. With my device, I achieve 4352 MiB/s.
>>
>> Then I have copied the parameters from O3_ARM_v7a_3 (except fuPool,
>> because I don't know whether that's a good idea) to a new subclass of
>> DerivO3CPU. With that, I achieve a bandwidth of 608 MiB/s.
>>
>> Finally, I set LQEntries and SQEntries to 128 (otherwise, it's the
>> default DerivO3CPU) to hopefully increase the number of prefetched cache
>> lines. But does even decrease the bandwidth slightly to 1034 MiB/s.
>>
>> Is there something else I need to do to improve the prefetching?
>>
>> I have also uploaded the stats.txt files from Linux on the default x86
>> config and the one from the system with my device, if you want to take a
>> look:
>> Linux: https://gist.github.com/Nils-TUD/18c614553463fbd2fa6df74fd31440b4
>> Dev: https://gist.github.com/Nils-TUD/058fb8e8de4981b5b04d4389c8aef41e
>>
>> In the latter case, the DRAM controller sits in pe8, so you can find the
>> stats at the very bottom.
>>
>> Best regards,
>> Nils
>>
>>
>> On 05/10/2016 03:56 PM, Nils Asmussen wrote:
>>> Hi Andreas,
>>>
>>> thanks for the quick response.
>>>
>>> Doing the experiment on ARM would be a bit of effort. Can't I simply
>>> tune the parameters of the O3 CPU like ARM does, i.e., copy them from
>>> configs/O3_ARM_v7a.py?
>>>
>>> What do you mean with "add prefetches to the cache configs"?
>>>
>>> Best regards,
>>> Nils
>>>
>>>
>>> On 05/10/2016 03:39 PM, Andreas Hansson wrote:
>>>> Hi Nils,
>>>>
>>>> I suspect this is all down to prefetching, or lack thereof. I would
>>>> suggest to try your experiment with build/ARM/gem5.opt and the
>>>> arm_detailed CPU (or alternatively add prefetches to the cache configs you
>>>> are using at the moment).
>>>>
>>>> Andreas
>>>>
>>>> On 10/05/2016, 14:36, "gem5-users on behalf of Nils Asmussen"
>>>> <[email protected] on behalf of [email protected]> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I'm running Linux 3.18 on a single-core x86 system using the O3 model.
>>>>> The command line is:
>>>>> ./build/X86/gem5.opt configs/example/fs.py --cpu-type detailed
>>>>> --cpu-clock=1GHz --sys-clock=1GHz --caches --l2cache
>>>>> --command-line="ttyS0 noapictimer console=ttyS0 lpj=7999923
>>>>> root=/dev/sda1"
>>>>>
>>>>> On Linux, I'm executing a self-written benchmark, which reads a 2 MiB
>>>>> file using read system calls. That means, in the end, the kernel is
>>>>> doing a memcpy in the kernel to copy that file into the user buffer.
>>>>> Looking at stats.txt (only measuring the benchmark itself), I see 274
>>>>> MiB/s at the DRAM controller.
>>>>>
>>>>> In my project, I developed a device, which can be used to e.g.
>>>>> load data from the DRAM. I run a similar benchmark on my system that
>>>>> reads a 2 MiB file from the DRAM using that device. In this case, I'm
>>>>> seeing 3 GiB/s at the DRAM controller.
>>>>>
>>>>> The main difference is that my device fetches 1 KiB at once from the
>>>>> DRAM, while the memcpy loads it cacheline by cacheline, i.e. 64 bytes at
>>>>> once.
>>>>>
>>>>> Is that expected behaviour or am I doing something wrong?
>>>>>
>>>>> Let me know if you need more information.
>>>>>
>>>>> Best regards,
>>>>> Nils
>>>>>
>>>>
>>>> IMPORTANT NOTICE: The contents of this email and any attachments are
>>>> confidential and may also be privileged. If you are not the intended
>>>> recipient, please notify the sender immediately and do not disclose the
>>>> contents to any other person, use it for any purpose, or store or copy the
>>>> information in any medium. Thank you.
>>>> _______________________________________________
>>>> gem5-users mailing list
>>>> [email protected]
>>>> http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
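As a rough sanity check on the numbers reported in this thread, the following sketch compares each measured bandwidth with the 4352 MiB/s device result and uses Little's law (concurrency = throughput x latency) to estimate how many 64-byte cache lines would have to be in flight to sustain it. The 100 ns memory latency is a purely hypothetical value chosen for illustration, not something measured here, and the helper names are made up for this example.

```python
# Bandwidths reported in this thread, in MiB/s.
MEASURED_MIB_S = {
    "default x86 O3": 1079,
    "O3_ARM_v7a_3 parameters": 608,
    "LQEntries/SQEntries = 128": 1034,
    "L2 StridePrefetcher(degree=8)": 1702,
}
DEVICE_MIB_S = 4352       # result with the custom device, used as reference
CACHE_LINE_BYTES = 64     # memcpy fetches one cache line per request


def fraction_of_device(mib_s):
    """Return the bandwidth as a fraction of the device bandwidth."""
    return mib_s / DEVICE_MIB_S


def lines_in_flight(mib_s, latency_ns):
    """Little's law: cache lines that must be outstanding on average
    to sustain mib_s at the given per-request latency."""
    bytes_per_ns = mib_s * 1024 * 1024 / 1e9
    return bytes_per_ns * latency_ns / CACHE_LINE_BYTES


if __name__ == "__main__":
    ASSUMED_LATENCY_NS = 100.0  # hypothetical latency, NOT measured
    for name, bw in MEASURED_MIB_S.items():
        print(f"{name}: {fraction_of_device(bw):.0%} of device bandwidth, "
              f"~{lines_in_flight(bw, ASSUMED_LATENCY_NS):.1f} lines in flight")
    print(f"device: ~{lines_in_flight(DEVICE_MIB_S, ASSUMED_LATENCY_NS):.1f} "
          f"lines in flight needed at the assumed latency")
```

With the real average latency from stats.txt substituted for the assumed value, this gives an estimate of how many outstanding misses or prefetches the core and caches would need to sustain in order to match the device.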
