> On May 14, 2015, 7:33 p.m., Joel Hestness wrote:
> > From experience with the O3 CPU, this is a VERY important change for 
> > simulated CPU performance. I appreciate the effort to finally fix this.
> > 
> > It would be nice for the Ruby and gem5-classic memory hierarchies to 
> > provide the same access interface, but I think the consistency implications 
> > of this patch need to be discussed.
> > 
> > I'm worried that this patch seems likely to upset consistency models for 
> > cores that may have relied on Ruby to block aliased memory accesses. 
> > Specifically, if a core was blocking multiple outstanding accesses to a 
> > single line as a way to enforce consistency to data in that line (e.g. 
> > TSO), but now the accesses could be concurrently issued to Ruby, seems like 
> > it would now be the responsibility of the sequencer and maybe even the 
> > coherence protocol to ensure that those accesses remain ordered as required.
> > 
> > Given the behavior of the O3 CPU, perhaps the classic memory hierarchy 
> > allows multiple outstanding accesses to a single line. However, it handles 
> > transient coherence states with atomic coherence updates, which make it 
> > much easier to guarantee access ordering to a single line, so I'm not clear 
> > that it exposes the same interface as this patch provides.
> > 
> > Are you sure that all Ruby-working CPU cores and existing protocols still 
> > enforce correct consistency?
> 
> Brad Beckmann wrote:
>     I'm glad you appreciate the importance of this patch.  I'm not deeply 
> familar with the Classic model but I'm pretty sure it does not allow a CPU to 
> issue multiple outstanding access to a single cache line.
>     
>     What is the complete set of Ruby-working CPU cores and protocols?  What 
> tests does one use to ensure correct consistency?  We can certainly assert 
> for the correctness of x86.  I believe that all existing Ruby protocols 
> implement coherence and the evictionCallbacks the same way, so I'm not 
> expecting any differences.  What makes you think there may be an issue with a 
> certain combination of cores and protocols?
> 
> Joel Hestness wrote:
>     If, in fact, the classic memory model blocks concurrent accesses to the 
> same line, does the O3CPU also suffer from O3 pipeline squashes there? 
> According to the status matrix http://gem5.org/Status_Matrix, this could be 
> tested, since you don't need to use atomic memory accesses for this. If the 
> classic memory model suffers from this issue, it doesn't sound like O3 
> pipeline flushing should be fixed in Ruby, but instead by fixing the O3 CPU. 
> Why aren't we doing that instead?
>     
>     On the other hand, if the classic hierarchy doesn't suffer from pipeline 
> squashes, I'd prefer some exposition about why it differs and why we can't 
> implement Ruby's blocking the same way (read: this patch's description and 
> code comments leave a ton to be desired). A common cache interface between 
> the two memory models is very desirable to allow some certainty that cores 
> should work with both. We need to know if this patch makes classic and Ruby 
> cache interfaces differ.
>     
>     If the classic memory hierarchy does allow multiple outstanding requests 
> to the same line, then there are possible consistency implications to be 
> explored. Here is a simple example (specifically with MOESI_CMP_directory) 
> that might suffer from broken TSO with this patch:
>      1) A core issues store 1 to cache line A
>      2) A core tries to issue store 2 to cache line A. Without this patch, 
> Ruby would block, which can help the LSQ preserve TSO. With this patch, Ruby 
> now accepts store 2, and both store 1 and 2 end up in the L1 controller's 
> mandatory queue.
>      3) Suppose the cache line is in a transient state being grabbed from the 
> cache so that store 1 gets recycled in the mandatory queue (which is 
> unordered).
>     Now store 2 is effectively ahead of store 1 in the cache controller, and 
> their order of visibility to the rest of the system could be upset. This is a 
> simple example, which witnesses the issue pretty easily due to recycling the 
> store queue. However, other protocols must currently assume a single 
> outstanding access per line at a time (otherwise the RubyPort would allow 
> them), so they do not enforce ordering if it is required.
>     
>     For tests, consistency litmus. For working systems, we should start with 
> those working in the current status matrix. Unfortunately, the matrix appears 
> to be out of date, because I know that x86 O3CPU works with at least 
> MOESI_hammer in both SE and FS mode. We currently depend on the O3CPU+Ruby to 
> enforce correct x86 consistency for our heterogeneous processor tests, so I'd 
> prefer to be assured this change won't break that. It seems likely that other 
> configurations should also be tested (e.g. Marco just posted a review 
> suggesting he is testing consistency of MESI_Two_Level and O3: 
> http://reviews.gem5.org/r/2840/).

UPDATE: Apologies, I see now that the Sequencer does not actually issue aliased 
requests, but instead buffers them in the sequencer (the subject of your thread 
with Nilay). The access ordering in cache controllers is not of concern, then.

I've also inspected the O3 CPU's LSQ, and it appears to block itself from 
issuing multiple stores when enforcing TSO. x86 O3 CPU should be fine with this 
change.


I'd still like us to discuss the cache interface differences between classic 
and Ruby.


- Joel


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/2787/#review6262
-----------------------------------------------------------


On May 11, 2015, 10:22 p.m., Tony Gutierrez wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://reviews.gem5.org/r/2787/
> -----------------------------------------------------------
> 
> (Updated May 11, 2015, 10:22 p.m.)
> 
> 
> Review request for Default.
> 
> 
> Repository: gem5
> 
> 
> Description
> -------
> 
> Changeset 10844:0848038fe1d8
> ---------------------------
> ruby: Fixed pipeline squashes caused by aliased requests
> 
> This patch was created by Bihn Pham during his internship at AMD.
> 
> This patch fixes a very significant performance bug when using the O3 CPU 
> model
> and Ruby. The issue was Ruby returned false when it received a request to the
> same address that already has an outstanding request or when the memory is
> blocked. As a result, O3 unnecessary squashed the pipeline and re-executed
> instructions. This fix merges readRequestTable and writeRequestTable in
> Sequencer into a single request table that keeps track of all requests and
> allows multiple outstanding requests to the same address. This prevents O3
> from squashing the pipeline.
> 
> 
> Diffs
> -----
> 
>   src/mem/ruby/system/Sequencer.hh fbdaa08aaa426b9f4660c366f934ccb670d954ec 
>   src/mem/ruby/system/Sequencer.cc fbdaa08aaa426b9f4660c366f934ccb670d954ec 
> 
> Diff: http://reviews.gem5.org/r/2787/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Tony Gutierrez
> 
>

_______________________________________________
gem5-dev mailing list
gem5-dev@gem5.org
http://m5sim.org/mailman/listinfo/gem5-dev

Reply via email to