Hi Arthur,

I agree with your observations, but it would be good if someone more familiar 
with the o3 model could chime in.

Andreas

From: gem5-users <gem5-users-boun...@gem5.org> on behalf of Arthur Perais <arthur.per...@inria.fr>
Reply-To: gem5 users mailing list <gem5-users@gem5.org>
Date: Tuesday, 19 April 2016 at 10:41
To: gem5-users@gem5.org
Subject: [gem5-users] o3cpu: cache ports

Hi all,

In the O3 LSQ there is a variable called "cachePorts" which controls the number of stores that can write back to the D-Cache each cycle (see lines 790-795 in lsq_unit_impl.hh). cachePorts defaults to 200 (see O3CPU.py), so in practice there is no limit on the number of stores that are written back to the D-Cache, and everything works out fine.

Now, silly me wanted to be a bit more realistic and set cachePorts to one, so that I could issue only one store per cycle to the D-Cache.
In a few SPEC programs, this caused the SQFullEvent count to get very high, which I initially assumed was reasonable: fewer stores per cycle, after all. However, after looking into it, I found that the variable "usedPorts" (stores are allowed to WB only while it is less than "cachePorts") is increased by stores when they WB (which is fine), but also by loads when they access the D-Cache (see lines 768 and 814 in lsq_unit.hh). Yet the number of loads that can access the D-Cache each cycle is controlled by the number of load functional units, and not at all by "cachePorts".

This means that if I set cachePorts to 1 and I have 2 load FUs, I can do 2 loads per cycle, but as soon as I do one load, I cannot write back any store that cycle (because "usedPorts" will already be 1 or 2 when gem5 enters writebackStores() in lsq_unit_impl.hh). On the other hand, if I set cachePorts to 3, I can do 2 loads and one store per cycle, but I can also WB three stores in a single cycle, which is not what I wanted to be able to do.

This should be addressed either by not increasing "usedPorts" when loads access the D-Cache and being explicit about which variable constrains what (i.e., loads are constrained by load FUs and stores by "cachePorts"), or by also constraining loads on "cachePorts" (which will be harder, since arbitration would potentially be needed between loads and stores, and since store WBs happen after load accesses in gem5, this can get messy). As of now, the behavior is a bit of both: performance looks fine at first glance, but the port contention is not actually being modeled.

I can write a small patch for the first solution (don't increase "usedPorts" on 
load accesses), but I am not sure this corresponds to the philosophy of the 
code. What do you think would be the best course of action?

Best,

Arthur.

--
Arthur Perais
INRIA Bretagne Atlantique
Bâtiment 12E, Bureau E303, Campus de Beaulieu
35042 Rennes, France

_______________________________________________
gem5-users mailing list
gem5-users@gem5.org
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
