-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/3773/#review9281
-----------------------------------------------------------


Overall, this patch looks really good. I'm sure it helps GPU simulations
quite a bit. I do have a few questions and comments I would like addressed
before I give it a Ship It.


src/mem/ruby/network/simple/PerfectSwitch.hh (line 117)
<http://reviews.gem5.org/r/3773/#comment7930>

    In your comment, please explain why this is a three-dimensional vector
rather than just a two-dimensional one (vnet x input port). Based on the current
comment, I would have thought you only had to maintain this bit vector for each
vnet's input port, rather than for each vnet input/output combination.
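
To make the two layouts in this question concrete, here is a minimal sketch. The type and function names are hypothetical and are not taken from PerfectSwitch.hh; the sketch only assumes that each bit marks a pending request. Collapsing the output dimension shows what a two-dimensional layout would keep (whether an input port has any request) and what it would drop (which output port the request targets).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layouts, not the declarations from the patch.
// Two-dimensional: bits[vnet][in_port]
using PerInputBits = std::vector<std::vector<bool>>;
// Three-dimensional: bits[vnet][in_port][out_port]
using PerInputOutputBits = std::vector<std::vector<std::vector<bool>>>;

// Build a three-dimensional bit vector with every bit cleared.
PerInputOutputBits
makePerInputOutput(std::size_t vnets, std::size_t in_ports,
                   std::size_t out_ports)
{
    assert(vnets > 0 && in_ports > 0 && out_ports > 0);
    return PerInputOutputBits(
        vnets, std::vector<std::vector<bool>>(
                   in_ports, std::vector<bool>(out_ports, false)));
}

// Collapse the output dimension: an input port's bit is set if any of its
// per-output bits is set. The target output port is no longer recoverable.
PerInputBits
collapseOutputs(const PerInputOutputBits &bits)
{
    PerInputBits flat;
    for (const auto &vnet : bits) {
        std::vector<bool> row;
        for (const auto &in_port : vnet) {
            bool any = false;
            for (bool b : in_port)
                any = any || b;
            row.push_back(any);
        }
        flat.push_back(row);
    }
    return flat;
}
```

If the arbitration needs to know which output port each input port is waiting on, the collapsed form is not enough; whether that is the case in the patch is exactly what the comment should state.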



src/mem/ruby/network/simple/PerfectSwitch.cc (line 143)
<http://reviews.gem5.org/r/3773/#comment7928>

    Minor question, but wouldn't a 'return' be more appropriate than a 'break'?
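
For context, a minimal sketch of where the two statements differ. It assumes the statement sits inside nested loops, which this review does not state; the function names are hypothetical and unrelated to PerfectSwitch.cc. In a single loop with no code after it, 'break' and 'return' behave the same.

```cpp
#include <cassert>
#include <vector>

// Counts the values visited before the scan gives up on a stop value.
// 'break' leaves only the inner loop, so later rows are still scanned.
int
countWithBreak(const std::vector<std::vector<int>> &rows, int stop)
{
    assert(!rows.empty());
    int visited = 0;
    for (const auto &row : rows) {
        for (int v : row) {
            if (v == stop)
                break;
            ++visited;
        }
    }
    return visited;
}

// 'return' leaves the whole function, so no later row is scanned.
int
countWithReturn(const std::vector<std::vector<int>> &rows, int stop)
{
    assert(!rows.empty());
    int visited = 0;
    for (const auto &row : rows) {
        for (int v : row) {
            if (v == stop)
                return visited;
            ++visited;
        }
    }
    return visited;
}
```

With rows {{1, 2, 9, 3}, {4, 5}} and a stop value of 9, countWithBreak returns 4 and countWithReturn returns 2.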



src/mem/ruby/network/simple/PerfectSwitch.cc (line 243)
<http://reviews.gem5.org/r/3773/#comment7929>

    Is it possible to pull this loop into a separate function?  This is quite a 
complicated, long while loop.  It would be nice to break it up and make it more 
readable.


- Brad Beckmann


On Dec. 23, 2016, 3:09 p.m., Joel Hestness wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://reviews.gem5.org/r/3773/
> -----------------------------------------------------------
> 
> (Updated Dec. 23, 2016, 3:09 p.m.)
> 
> 
> Review request for Default.
> 
> 
> Repository: gem5
> 
> 
> Description
> -------
> 
> Changeset 11786:93f0e3b78f2d
> ---------------------------
> ruby: PerfectSwitch add assured access arbitration
> 
> When operating near bandwidth saturation and using finite cache hierarchy
> buffering, the round-robin arbitration in the PerfectSwitch caused low ID
> input buffers to gain access to the switch more frequently than other input
> buffers that might contain requests. This resulted from the priority cycling
> starting on input buffers with no pending requests and cycling around to the
> low ID buffers with pending requests. Part of the problem was that
> input-to-output port allocation was done on-the-fly while cycling through
> input ports.
> 
> To fix this, refactor the PerfectSwitch to remove on-the-fly arbitration, and
> better delineate port allocation from switch traversal. Then, implement
> cycling-priority assured access arbitration using output port request batches
> to ensure that all input ports are given the same priority when buffers are
> full.
> 
> This fix reduces GPU core progress asymmetry from >3x down to <12%, which is
> in line with hardware.
> 
> 
> Diffs
> -----
> 
>   src/mem/ruby/network/simple/PerfectSwitch.cc 6dc9ab9b2294 
>   src/mem/ruby/network/simple/PerfectSwitch.hh 6dc9ab9b2294 
> 
> Diff: http://reviews.gem5.org/r/3773/diff/
> 
> 
> Testing
> -------
> 
> Extensive testing and use in gem5-gpu. Used GPU to saturate cache hierarchy
> bandwidth, and tracked threadblock progress to witness asymmetry. Repeated
> this testing after the fix to see greatly reduced asymmetry. Also, in these
> small tests, simulator run time improves slightly due to reduced amount of
> work performed by PerfectSwitch arbitration. Also, have run thousands of
> simulations with this patch to verify that the changes work for a wide
> range of simulated system behaviors.
> 
> 
> Thanks,
> 
> Joel Hestness
> 
>
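
The low-ID bias described in the quoted changeset description can be reproduced in a toy model. The sketch below is not the PerfectSwitch code and does not implement the patch's output port request batches. It assumes a single output port, one grant per cycle, and a priority pointer that either advances by one input every cycle (the biased case) or moves to the input after the last winner (a common round-robin variant, shown only for comparison). All names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy arbiter: each cycle, grant the first pending input found when
// scanning from the priority pointer 'start', wrapping around.
// pending[i] is true if input i always has a request waiting.
std::vector<int>
countGrants(const std::vector<bool> &pending, int cycles,
            bool advance_past_winner)
{
    assert(!pending.empty());
    const std::size_t n = pending.size();
    std::vector<int> grants(n, 0);
    std::size_t start = 0;
    for (int c = 0; c < cycles; ++c) {
        bool granted = false;
        std::size_t winner = 0;
        for (std::size_t i = 0; i < n; ++i) {
            const std::size_t in = (start + i) % n;
            if (pending[in]) {
                winner = in;
                granted = true;
                break;
            }
        }
        if (granted)
            ++grants[winner];
        if (granted && advance_past_winner)
            start = (winner + 1) % n;  // priority follows the last winner
        else
            start = (start + 1) % n;   // priority cycles blindly
    }
    return grants;
}
```

With eight inputs of which only inputs 0 and 1 are pending, 80 cycles of the blindly cycling pointer give input 0 a total of 70 grants and input 1 only 10, because every pointer position on an empty input wraps around to input 0. Moving the pointer past the winner gives each of the two inputs 40 grants.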

_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
