-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/2787/#review6383
-----------------------------------------------------------
I was buffering these comments because this change has large implications, and we still haven't discussed the differences between the Ruby and classic memory interfaces. The patch submitter should be able to address questions like the ones I've raised so far (I don't feel any clarification has been provided yet).

I did some more digging: it appears that classic caches completely model MSHR queuing and write buffering, and they block requests from the core when these are full. This indicates that they accept multiple concurrent accesses to a single line (it would be nice to get confirmation on that from a classic user). Thus, it makes sense to allow the Ruby Sequencer to accept multiple outstanding accesses to a single line concurrently, to be consistent with classic.

Overall, I have significant complaints about this patch, because it introduces buffering into the Sequencer, which amounts to implementing MSHR queuing that, to be consistent with classic, should probably live in the caches. This breaks the Sequencer's thin-shim abstraction between cores and cache controllers, appears to break no-RFO L1 caches, and side-steps cache resource constraints. Detailed comments are below.


src/mem/ruby/system/Sequencer.hh (line 187)
<http://reviews.gem5.org/r/2787/#comment5490>

Minor: This declaration introduces yet another member-variable naming convention (m_UpperCamel). Can you please change the name of this table so that it at least matches some other variables? m_lowerCamel looks most common, and gem5's style guideline suggests it should be lowerCamel (i.e., without the 'm_'). I understand a lot of Ruby code is bad about this, but we can avoid making it worse.


src/mem/ruby/system/Sequencer.cc (line 89)
<http://reviews.gem5.org/r/2787/#comment5491>

Minor: Please follow the local_variable naming convention that is already used in this function. These should be aliased_it and aliased_it_end.


src/mem/ruby/system/Sequencer.cc (line 154)
<http://reviews.gem5.org/r/2787/#comment5492>

Minor: Ibid (same local_variable naming comment as above).


src/mem/ruby/system/Sequencer.cc (line 221)
<http://reviews.gem5.org/r/2787/#comment5519>

Can you please clarify, since it hasn't been stated explicitly yet: is it the case that the current L0/L1 cache controllers cannot handle multiple outstanding accesses to a single line? If so, can you elaborate on why? What breaks?


src/mem/ruby/system/Sequencer.cc (line 315)
<http://reviews.gem5.org/r/2787/#comment5520>

This is a symptom of never sending the aliased request to the controller (at which point its initialRequestTime would have been set appropriately). I feel that all requests to the Sequencer need to be sent through to the controller, which would fix this.


src/mem/ruby/system/Sequencer.cc (line 428)
<http://reviews.gem5.org/r/2787/#comment5522>

The Sequencer should not be handling stores in this manner. This effectively coalesces and completes concurrent stores to a single line (in zero time) without their data ever being sent to the cache. In other words, it hoists MSHRs up from the top-level cache controller (or pulls them down from the LSQ?). As a result, repeated stores are never subject to resource constraints such as cache port availability or the number of available MSHRs in the controller, and the cache/MSHR access stats are incorrect. You need to send requests through to the cache controller.
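To make the "send it through to the controller" suggestion concrete, here is a minimal, self-contained sketch of the shape I have in mind. To be clear, this is not gem5 code: ToySequencer, ToyController, and all of the member and method names are invented for illustration, and the real Sequencer/controller interface looks different. The only point is that aliased requests are tracked per line but every one of them is still issued to the controller, so each access occupies an MSHR target, is subject to the controller's resource limits, and is completed by the component that actually has the line data.

    // Toy illustration only; not gem5 code. All names are invented.
    #include <cstdint>
    #include <deque>
    #include <iostream>
    #include <unordered_map>
    #include <vector>

    using Addr = uint64_t;
    constexpr Addr kLineMask = ~Addr(63);   // 64-byte cache lines

    struct ToyRequest {
        Addr addr;
        bool isStore;
    };

    class ToyController {
      public:
        // Bound on outstanding accesses queued against one line, standing in
        // for the controller's real MSHR/target limits.
        static constexpr unsigned kMaxTargetsPerMshr = 4;

        bool issue(const ToyRequest &req) {
            auto &targets = mshrs_[req.addr & kLineMask];
            if (targets.size() >= kMaxTargetsPerMshr)
                return false;           // out of targets: the core must retry
            targets.push_back(req);     // every aliased access is visible here
            return true;
        }

        // When the line's data arrives, the controller (which knows the line
        // contents) completes every queued target.
        std::vector<ToyRequest> lineArrived(Addr lineAddr) {
            auto it = mshrs_.find(lineAddr);
            if (it == mshrs_.end())
                return {};
            std::vector<ToyRequest> done(it->second.begin(), it->second.end());
            mshrs_.erase(it);
            return done;
        }

      private:
        std::unordered_map<Addr, std::deque<ToyRequest>> mshrs_;
    };

    class ToySequencer {
      public:
        explicit ToySequencer(ToyController &c) : controller_(c) {}

        // Accept an aliased request, but do not complete it locally: forward
        // it so that it occupies an MSHR target and shows up in cache stats.
        bool makeRequest(const ToyRequest &req) {
            if (!controller_.issue(req))
                return false;           // blocked; core retries later
            ++outstanding_[req.addr & kLineMask];
            return true;
        }

        void lineArrived(Addr lineAddr) {
            for (const ToyRequest &req : controller_.lineArrived(lineAddr))
                std::cout << (req.isStore ? "store" : "load")
                          << " completed at 0x" << std::hex << req.addr
                          << std::dec << "\n";
            outstanding_.erase(lineAddr);
        }

      private:
        ToyController &controller_;
        std::unordered_map<Addr, unsigned> outstanding_;
    };

    int main()
    {
        ToyController ctrl;
        ToySequencer seq(ctrl);

        // Two loads and a store to the same 64-byte line: all three are
        // forwarded and queued as MSHR targets rather than being coalesced
        // (or completed in zero time) inside the sequencer.
        seq.makeRequest({0x1000, false});
        seq.makeRequest({0x1008, true});
        seq.makeRequest({0x1010, false});
        seq.lineArrived(0x1000 & kLineMask);
        return 0;
    }

Where the queuing and resource accounting actually live obviously belongs to the cache controller and the protocol, not a toy like this; the sketch is only meant to show which side of the Sequencer/controller boundary should own the MSHR targets.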
src/mem/ruby/system/Sequencer.cc (line 436)
<http://reviews.gem5.org/r/2787/#comment5523>

This breaks the no-RFO property of some L1s (and is particularly relevant for GPUs). For instance, this would fail with gem5-gpu's VI_hammer protocol, because the cache line is not returned on writes: you can't assume that the current data will be available when you've queued loads behind a store to a line. This is a major signal that MSHR queuing should not happen in the Sequencer, which must be protocol-independent. You need to send requests through to the cache controller.


src/mem/ruby/system/Sequencer.cc (line 495)
<http://reviews.gem5.org/r/2787/#comment5521>

There is a similar issue for decoalescing loads here in the Sequencer (also in zero time!). Specifically, by queuing accesses to a line in the Sequencer, the cache is never made aware of which data in the line each of the queued accesses needs. This means it does not know what portion of the line should be returned to the core, and it again dodges potential resource constraints, such as the amount of data that can be pulled from the line per cycle to respond to each outstanding access. Can the data from a cache line really be fanned out to multiple separate requests in the same cycle when queuing MSHRs (note: with strided cache accesses from the O3 CPU, this could be 16, 32, or more concurrent accesses in a single cycle)? You really need to send requests through to the cache controller.


- Joel Hestness


On May 26, 2015, 7:42 p.m., Tony Gutierrez wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://reviews.gem5.org/r/2787/
> -----------------------------------------------------------
> 
> (Updated May 26, 2015, 7:42 p.m.)
> 
> 
> Review request for Default.
> 
> 
> Repository: gem5
> 
> 
> Description
> -------
> 
> Changeset 10817:d010e6a8e783
> ---------------------------
> ruby: Fixed pipeline squashes caused by aliased requests
> 
> This patch was created by Bihn Pham during his internship at AMD.
> 
> This patch fixes a very significant performance bug when using the O3 CPU model
> and Ruby. The issue was Ruby returned false when it received a request to the
> same address that already has an outstanding request or when the memory is
> blocked. As a result, O3 unnecessary squashed the pipeline and re-executed
> instructions. This fix merges readRequestTable and writeRequestTable in
> Sequencer into a single request table that keeps track of all requests and
> allows multiple outstanding requests to the same address. This prevents O3
> from squashing the pipeline.
> 
> 
> Diffs
> -----
> 
>   src/mem/ruby/system/Sequencer.hh df2aa91dba5b0f0baa351039f0802baad9ed8f1d
>   src/mem/ruby/system/Sequencer.cc df2aa91dba5b0f0baa351039f0802baad9ed8f1d
> 
> Diff: http://reviews.gem5.org/r/2787/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Tony Gutierrez
> 

_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev
