Hi all,

I have no specific knowledge of what these buffers are modeling, or what they should be modeling, but I ran into this issue some time ago as well. Setting a high wbDepth is how I work around it (3 is actually sufficient for me), because performance does suffer quite a lot in some cases (and even more so for narrow-issue cores when wbWidth == issueWidth, I would expect).
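For anyone applying the same workaround: wbDepth (together with wbWidth) is exposed as a parameter on the O3 CPU in the configuration scripts. A minimal sketch, assuming a DerivO3CPU and omitting the rest of the system setup:

```python
# Hypothetical config fragment: wbMax = wbWidth * wbDepth is the total
# writeback-buffer budget, so raising wbDepth multiplies the buffer count.
from m5.objects import DerivO3CPU

cpu = DerivO3CPU()
cpu.wbWidth = 8   # results that can write back per cycle
cpu.wbDepth = 3   # 3x the default budget; enough in my runs
```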

On 12/05/2014 19:39, Steve Reinhardt via gem5-users wrote:
Hi Paul,

I assume you're talking about the 'wbMax' variable? I don't recall it specifically myself, but after looking at the code a bit, the best I can come up with is that there's assumed to be a finite number of buffers somewhere that hold results from the function units before they write back to the reg file. Realistically, to me, it seems like those buffers would be distributed among the function units anyway, not a global resource, so having a global limit doesn't make a lot of sense. Does anyone else out there agree or disagree?

It doesn't seem to relate to any structure that's directly modeled in the code, i.e., I think you could rip the whole thing out (incrWb(), decrWb(), wbOutstanding, wbMax) without breaking anything in the model... which would be a good thing if in fact everyone else is either suffering unaware or just working around it by setting a large value for wbDepth.
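To make the mechanism concrete, here is a minimal sketch (not gem5 code; the names just mirror the variables above) of how a single global counter capped at wbMax = wbWidth * wbDepth throttles issue when a buffer is held from issue until completion:

```python
# Hypothetical model of the behavior described in this thread: a global
# counter (wbOutstanding) is bumped at issue (incrWb) and released only
# at writeback (decrWb), so long-latency instructions hold buffers for
# their entire issue-to-completion lifetime.

class WritebackLimiter:
    def __init__(self, wb_width, wb_depth=1):
        self.wb_max = wb_width * wb_depth   # global buffer budget
        self.outstanding = 0                # wbOutstanding

    def try_issue(self):
        """incrWb(): grab a buffer at issue; False means issue stalls."""
        if self.outstanding >= self.wb_max:
            return False
        self.outstanding += 1
        return True

    def complete(self):
        """decrWb(): release the buffer when the result writes back."""
        self.outstanding -= 1

# With the default wb_depth = 1, eight in-flight long-latency loads on
# an 8-wide core exhaust every buffer and block all further issue.
lim = WritebackLimiter(wb_width=8, wb_depth=1)
issued = [lim.try_issue() for _ in range(8)]
stalled = not lim.try_issue()   # ninth issue is blocked
lim.complete()                  # one load finally writes back...
resumed = lim.try_issue()       # ...and issue can proceed again
print(issued.count(True), stalled, resumed)   # → 8 True True
```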

That said, we've done some internal performance correlation work, and I don't recall this being an issue, for whatever that's worth. I know ARM has done some correlation work too; have you run into this?

Steve



On Fri, May 9, 2014 at 7:45 AM, Paul V. Gratz via gem5-users <gem5-users@gem5.org> wrote:

    Hi All,
    Doing some digging on performance issues in the O3 model, we and
    others have found that allocation of the writeback buffers has a
    big performance impact.  Basically, a writeback buffer is
    grabbed at issue time and held through till completion.  With
    default assumptions about the number of available writeback
    buffers (x*issue width, where x is 1 by default), the buffers
    often end up bottlenecking the effective issue width (particularly
    in the face of long-latency loads grabbing up all the WB buffers).
    What are these structures trying to model?  I can see limiting
    the number of instructions allowed to complete and
    writeback/bypass in a cycle, but this ends up being much more
    conservative than that, if that is the intent.  If not, why does
    it do this?  We can easily make the number of WB buffers high,
    but we want to understand what is going on here first...
    Thanks!
    Paul

    --
    -----------------------------------------
    Paul V. Gratz
    Assistant Professor
    ECE Dept, Texas A&M University
    Office: 333M WERC
    Phone: 979-488-4551
    http://cesg.tamu.edu/faculty/paul-gratz/

    _______________________________________________
    gem5-users mailing list
    gem5-users@gem5.org
    http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users






--
Arthur Perais
INRIA Bretagne Atlantique
Bâtiment 12E, Bureau E303, Campus de Beaulieu
35042 Rennes, France

