Hi all,
I have no specific knowledge of what these buffers model or what they
should be modeling, but I too ran into this issue some time ago. I work
around it by setting a higher wbDepth (actually, 3 is sufficient in my
case), because performance does suffer quite a lot in some cases, and I
would expect even more so for narrow-issue cores when
wbWidth == issueWidth.
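For reference, here is a minimal sketch of the workaround in a config
script, assuming a DerivO3CPU-based setup (the exact class and parameter
names may differ between gem5 versions):

    # Workaround sketch: raise wbDepth so the writeback buffers stop
    # limiting the effective issue width. The effective buffer count is
    # roughly wbWidth * wbDepth.
    from m5.objects import DerivO3CPU

    cpu = DerivO3CPU()
    cpu.issueWidth = 8
    cpu.wbWidth = 8      # typically kept equal to the issue width
    cpu.wbDepth = 3      # 3 is sufficient for me; the default is 1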
On 12/05/2014 19:39, Steve Reinhardt via gem5-users wrote:
Hi Paul,
I assume you're talking about the 'wbMax' variable? I don't recall it
specifically myself, but after looking at the code a bit, the best I
can come up with is that there's assumed to be a finite number of
buffers somewhere that hold results from the function units before
they write back to the reg file. Realistically, to me, it seems like
those buffers would be distributed among the function units anyway,
not a global resource, so having a global limit doesn't make a lot of
sense. Does anyone else out there agree or disagree?
It doesn't seem to relate to any structure that's directly modeled in
the code, i.e., I think you could rip the whole thing out (incrWb(),
decrWb(), wbOutstanding, wbMax) without breaking anything in the
model... which would be a good thing if in fact everyone else is
either suffering unaware or just working around it by setting a large
value for wbDepth.
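For anyone following along, here is a toy sketch of what that accounting
amounts to, as I read it from this thread (this is not the actual gem5
code; the names simply mirror the variables discussed above):

    # Toy model of the global writeback-buffer accounting.
    class WritebackAccounting:
        def __init__(self, wb_width, wb_depth):
            self.wb_max = wb_width * wb_depth  # total buffers, one global pool
            self.wb_outstanding = 0            # buffers currently held

        def can_issue(self):
            # Issue stalls once every buffer is held by an in-flight instruction.
            return self.wb_outstanding < self.wb_max

        def incr_wb(self):
            # A buffer is grabbed when an instruction issues...
            assert self.can_issue()
            self.wb_outstanding += 1

        def decr_wb(self):
            # ...and released only when its result writes back.
            self.wb_outstanding -= 1

Ripping the accounting out would essentially make can_issue() always
return True.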
That said, we've done some internal performance correlation work, and
I don't recall this being an issue, for whatever that's worth. I know
ARM has done some correlation work too; have you run into this?
Steve
On Fri, May 9, 2014 at 7:45 AM, Paul V. Gratz via gem5-users
<gem5-users@gem5.org> wrote:
Hi All,
While digging into performance issues in the O3 model, we and others
have found that allocation of the writeback buffers has a big
performance impact. Basically, a writeback buffer is grabbed at issue
time and held until completion. With the default assumptions about the
number of available writeback buffers (x * issueWidth, where x is 1 by
default), the buffers often end up bottlenecking the effective issue
width, particularly when long-latency loads grab up all the WB buffers.
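To put rough numbers on it (my own illustration, assuming the defaults
of wbDepth = 1 and wbWidth == issueWidth == 8):

    # With the defaults, eight long-latency loads in flight are enough to
    # hold every writeback buffer and block further issue entirely.
    issue_width = 8
    wb_depth = 1                      # default
    wb_max = issue_width * wb_depth   # 8 buffers total
    loads_in_flight = 8               # e.g., outstanding cache misses
    print(loads_in_flight < wb_max)   # False -> issue is stalled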
What are these structures trying to model? I can see limiting the
number of instructions allowed to complete and write back/bypass per
cycle, but if that is the intent, this mechanism ends up being much
more conservative than that. If not, why does it do this? We can
easily make the number of WB buffers large, but we want to understand
what is going on here first...
Thanks!
Paul
--
-----------------------------------------
Paul V. Gratz
Assistant Professor
ECE Dept, Texas A&M University
Office: 333M WERC
Phone: 979-488-4551
http://cesg.tamu.edu/faculty/paul-gratz/
--
Arthur Perais
INRIA Bretagne Atlantique
Bâtiment 12E, Bureau E303, Campus de Beaulieu
35042 Rennes, France
_______________________________________________
gem5-users mailing list
gem5-users@gem5.org
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users