Hi all,
In the O3 LSQ there is a variable called "cachePorts" which controls the
number of stores that can be made each cycle (see lines 790-795 in
lsq_unit_impl.hh).
cachePorts defaults to 200 (see O3CPU.py), so in practice, there is no
limit on the number of stores that are written back to the D-Cache, and
everything works out fine.
Now, silly me wanted to be a bit more realistic and set cachePorts to
one, so that I could issue only one store per cycle to the D-Cache.
In a few SPEC programs, this caused the SQFullEvent count to get very
high, which I assumed was reasonable because, well, fewer stores per cycle.
However, after looking into it, I found that the variable "usedPorts"
(which allows stores to WB only if it is less than "cachePorts") is
increased by stores when they WB (which is fine), but also by *loads* when
they access the D-Cache (see lines 768 and 814 in lsq_unit.hh). But the
number of loads that can access the D-Cache each cycle is controlled by
the number of load functional units, and not at all by "cachePorts".
This means that if I set cachePorts to 1, and I have 2 load FUs, I can
do 2 loads per cycle, but as soon as I do one load, then I cannot
write back any store in that cycle (because "usedPorts" will already be 1
or 2 when gem5 enters writebackStores() in lsq_unit_impl.hh). On the other
hand, if I set cachePorts to 3 I can do 2 loads and one store per cycle,
but I can also WB three stores in a single cycle, which is not what I
wanted to be able to do.
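
To make the interaction concrete, here is a toy Python model of one cycle.
It is not gem5 code: the function and parameter names are made up for
illustration, and it only assumes what is described above (loads are
limited by the load FUs but still bump the shared counter, and stores WB
afterwards while the counter is below "cachePorts").

```python
def simulate_cycle(cache_ports, load_fus, pending_loads, pending_stores):
    """Toy model of one cycle; illustrative only, not gem5 code."""
    used_ports = 0

    # Loads go first. They are limited only by the number of load FUs,
    # yet each one still increments the shared used_ports counter.
    loads_done = min(pending_loads, load_fus)
    used_ports += loads_done

    # Store writeback runs afterwards and stops as soon as used_ports
    # reaches cache_ports.
    stores_done = 0
    while stores_done < pending_stores and used_ports < cache_ports:
        used_ports += 1
        stores_done += 1

    return loads_done, stores_done


# Arguments: cache_ports, load_fus, pending_loads, pending_stores
print(simulate_cycle(1, 2, 2, 3))  # (2, 0): loads starve the stores
print(simulate_cycle(3, 2, 2, 3))  # (2, 1): 2 loads + 1 store, as wanted
print(simulate_cycle(3, 2, 0, 3))  # (0, 3): but 3 stores in a load-free cycle
```

With cache_ports=1, a single load is already enough to block every store
in that cycle.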
This should be addressed by not increasing "usedPorts" when loads access
the D-Cache and being explicit about what variable constrains what
(i.e., loads are constrained by load FUs and stores by "cachePorts"), or
by also constraining loads with "cachePorts" (which will be hard since
arbitration would potentially be needed between loads and stores, and
since store WBs happen after load accesses in gem5, this can get messy).
As of now, the code does a bit of both: performance looks fine at first,
but the modeled store bandwidth is not what "cachePorts" suggests.
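
As a rough sketch of the first option (again a toy Python model with
made-up names, not the actual patch), loads would be bounded only by the
load FUs and "cachePorts" would bound only store WBs:

```python
def simulate_cycle_split(cache_ports, load_fus, pending_loads, pending_stores):
    """Toy model of the first option; illustrative only, not gem5 code."""
    # Loads are constrained by the load FUs and nothing else; they no
    # longer touch the port counter.
    loads_done = min(pending_loads, load_fus)

    # Only store writebacks consume cache ports.
    used_ports = 0
    stores_done = 0
    while stores_done < pending_stores and used_ports < cache_ports:
        used_ports += 1
        stores_done += 1

    return loads_done, stores_done


# Arguments: cache_ports, load_fus, pending_loads, pending_stores
print(simulate_cycle_split(1, 2, 2, 3))  # (2, 1): 2 loads and 1 store
print(simulate_cycle_split(1, 2, 0, 3))  # (0, 1): still 1 store at most
```

In this model, cache_ports=1 gives at most one store WB per cycle
regardless of how many loads were issued.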
I can write a small patch for the first solution (don't increase
"usedPorts" on load accesses), but I am not sure this corresponds to the
philosophy of the code. What do you think would be the best course of
action?
Best,
Arthur.
--
Arthur Perais
INRIA Bretagne Atlantique
Bâtiment 12E, Bureau E303, Campus de Beaulieu
35042 Rennes, France
_______________________________________________
gem5-users mailing list
gem5-users@gem5.org
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users