Re: [gem5-dev] Review Request 2787: ruby: Fixed pipeline squashes caused by aliased requests

Nilay Vaish Wed, 02 Sep 2015 18:59:11 -0700


> On Aug. 30, 2015, 6:01 p.m., Joel Hestness wrote:
> > Hi guys,
> > 
> > I'm not sure of the status of or plans for this patch, but I wanted to test 
> > it out and provide some feedback. I've merged and tested this with the 
> > current gem5 head (11061). First, there are a number of things broken with 
> > this patch, and if we're still interested in checking it in, it needs 
> > plenty of work. It took a fair amount of debugging before I was able to run 
> > with it.
> > 
> > Second, after merging, I microbenchmarked a few common memory access 
> > patterns. The O3 CPU with Ruby certainly performs better than older 
> > versions of gem5 (I was testing changeset 10238). It appears that prior to 
> > this patch, the O3 CPU has been modified to fix the memory access squashing 
> > caused by Ruby sequencer blocking (not sure which changes fixed that), so 
> > the execution time of the microbenchmarks is now comparable between Ruby 
> > and classic without this patch.
> > 
> > Further, I found this patch actually introduces many confusing issues and 
> > can reduce performance by up to 60%. It was very difficult to track down 
> > why performance suffered: By coalescing requests in the sequencer, the 
> > number of cache accesses changes, so the first problem was figuring out 
> > what an 'appropriate' change in number of cache accesses might be. After 
> > coming to an ok conclusion on that, I then found that the sequencer 
> > max_outstanding_requests needs to be configured appropriately to manage 
> > sequencer coalescing well. Specifically, if the max_outstanding_requests is 
> > less than the LSQ depth, the sequencer won't block the LSQ from issuing 
> > accesses to the same line, but will block when it is full. This reduces the 
> > MLP exposed to the caches compared to when the LSQ is blocked on 
> > outstanding lines and forced to expose accesses to separate lines. Setting 
> > the max_outstanding_requests greater than the LSQ depth fixes this, but 
> > this performance bug indicates that the coalescing in the sequencer 
> > introduces more non-trivial configuration to get reasonable MLP.
> > 
> > Overall, I feel that this patch needs to be dropped: It does not appear to 
> > be necessary for performance, and it actually introduces problems with 
> > performance and debugging due to the cache access effects.
> 
> Brad Beckmann wrote:
>     We still hope to check it in.  The last time we ran our tests, this patch 
> proved to be very important for O3 performance.  Even if the squashing 
> behavior is fixed, I think we saw benefits from avoiding unecessary mandatory 
> queue latency for aliased requests.  We will rerun our tests to confirm.
>     
>     I'm surprised you encountered issues with the latest version of the 
> patch.  I personally put a lot of effort into adding the disable feature that 
> you requested and I tested the enable/disable bit with the CPU models and 
> Ruby.  It worked for me.
>     
>     The issue you point out with max_outstanding_requests seems to be a 
> problem with its prior default value.  Why do you think increasing the value 
> is a "non-trivial configuration"?  Shouldn't we have max_outstanding_requests 
> equal to the LSQ size?


I am acutally in favor of the idea of coalescing requests.  This is something 
that classic memory does as well.
And I read some paper on MSHRs maintaining primary and secondary requests.  I 
would like that we ultimately 
merge this patch.  I think we should split this patch into two parts:

a. the first patch will combine the read and the write request tables 
maintained by the sequencer.  I think Joel
has no objections to that part.  So should be possible to merge that patch any 
time.

b. the second part will provide each entry in the combined request table with a 
list of packets instead of a
single packet.  This part of the patch is what we need to discuss.

  -- I think we should drop the max outstanding request variable all together.  
As both of you agree 
     the LSQ size should decide how many requests should be pending, there is 
no need for the sequencer to 
     maintain its own limit.
     
  -- We also need to decide on how the statistical variables are updated.  With 
this patch, the requests buffered
     in the sequencer will not update the stats maintained by the cache 
controller.  This becomes a problem
     when one tries to compare the loads and stores issued by the cpu and the 
loads and stores seen by the cache
     controller.


- Nilay


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/2787/#review7077
-----------------------------------------------------------


On July 22, 2015, 6:15 p.m., Tony Gutierrez wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://reviews.gem5.org/r/2787/
> -----------------------------------------------------------
> 
> (Updated July 22, 2015, 6:15 p.m.)
> 
> 
> Review request for Default.
> 
> 
> Repository: gem5
> 
> 
> Description
> -------
> 
> Changeset 10887:1e05089bc991
> ---------------------------
> ruby: Fixed pipeline squashes caused by aliased requests
> 
> This patch was created by Bihn Pham during his internship at AMD.
> 
> This patch fixes a very significant performance bug when using the O3 CPU 
> model
> and Ruby. The issue was Ruby returned false when it received a request to the
> same address that already has an outstanding request or when the memory is
> blocked. As a result, O3 unnecessary squashed the pipeline and re-executed
> instructions. This fix merges readRequestTable and writeRequestTable in
> Sequencer into a single request table that keeps track of all requests and
> allows multiple outstanding requests to the same address. This prevents O3
> from squashing the pipeline.
> 
> 
> Diffs
> -----
> 
>   src/mem/ruby/system/Sequencer.hh ebb3d0737aa72ec4fa24b6af9cf9a6b2a1109d18 
>   src/mem/ruby/system/Sequencer.cc ebb3d0737aa72ec4fa24b6af9cf9a6b2a1109d18 
>   src/mem/ruby/system/Sequencer.py ebb3d0737aa72ec4fa24b6af9cf9a6b2a1109d18 
> 
> Diff: http://reviews.gem5.org/r/2787/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Tony Gutierrez
> 
>

_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev

Re: [gem5-dev] Review Request 2787: ruby: Fixed pipeline squashes caused by aliased requests

Reply via email to