-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/2787/#review6383
-----------------------------------------------------------


I have been buffering these comments because this change has large implications and 
we still haven't discussed the differences between the Ruby and classic memory 
interfaces. The patch submitter should be able to address questions like the 
ones I've raised so far (I don't feel any clarification has been provided yet).

Once again, I did more digging: it appears that classic caches completely model 
MSHR queuing and write buffering, and they block requests from the core when 
these structures are full. This indicates that they accept multiple concurrent 
accesses to a single line (it would be nice to get confirmation of that from a 
classic user). Thus, it makes sense for the Ruby sequencer to accept multiple 
outstanding accesses to a single line, to be consistent with classic.

Overall, I have significant complaints about this patch, because it introduces 
buffering into the sequencer, which has the effect of implementing MSHR queuing 
that should probably be in the caches to be consistent with classic. This 
breaks the sequencer's thin shim abstraction between cores and cache 
controllers, appears to break no-RFO L1 caches, and side-steps cache resource 
constraints. Detailed comments are below.


src/mem/ruby/system/Sequencer.hh (line 187)
<http://reviews.gem5.org/r/2787/#comment5490>

    Minor: This declaration introduces yet another member variable naming 
convention (m_UpperCamel). Can you please change the name of this table so it 
at least matches some other variables? m_lowerCamel looks most common, and 
gem5's style guideline suggests it should be lowerCamel (i.e. without 'm_').
    
    (I understand a lot of Ruby code is bad about this, but we can keep from 
making it worse)
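For concreteness, a hypothetical declaration in each of the three styles 
(`RequestTable` and `SequencerRequest` here are stand-in types, not the actual 
declarations in the patch):

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

using Addr = uint64_t;
struct SequencerRequest {};
using RequestTable = std::unordered_map<Addr, SequencerRequest>;

class Sequencer
{
  private:
    RequestTable m_RequestTable;  // patch's style: m_UpperCamel (inconsistent)
    RequestTable m_requestTable;  // most common existing Ruby style: m_lowerCamel
    RequestTable requestTable;    // gem5 style guide: lowerCamel, no m_ prefix
};
```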



src/mem/ruby/system/Sequencer.cc (line 89)
<http://reviews.gem5.org/r/2787/#comment5491>

    Minor: Please follow the local_variable naming convention that is already 
used in this function. These should be aliased_it and aliased_it_end.



src/mem/ruby/system/Sequencer.cc (line 154)
<http://reviews.gem5.org/r/2787/#comment5492>

    Minor: Ibid



src/mem/ruby/system/Sequencer.cc (line 221)
<http://reviews.gem5.org/r/2787/#comment5519>

    Can you please clarify, since it hasn't been stated explicitly yet: Current 
L0/L1 cache controllers cannot handle multiple outstanding accesses to a single 
line? If so, can you elaborate on why? What breaks?



src/mem/ruby/system/Sequencer.cc (line 315)
<http://reviews.gem5.org/r/2787/#comment5520>

    This is a symptom of never sending the aliased request to the controller 
(at which point its initialRequestTime would have been set appropriately). I 
feel that all requests to the Sequencer need to be sent through to the 
controller, which would fix this.



src/mem/ruby/system/Sequencer.cc (line 428)
<http://reviews.gem5.org/r/2787/#comment5522>

    The Sequencer should not be handling stores in this manner. This 
effectively coalesces and completes concurrent stores to a single line (in zero 
time) without their data ever being sent to the cache. In other words, this 
effectively hoists MSHRs up out of the top-level cache controller (or pulls 
them down from the LSQ?). By doing this, repeated stores are never subject to 
resource constraints like cache port bandwidth or available MSHRs in the 
controller, and cache/MSHR access stats are incorrect.
    
    You need to send requests through to the cache controller.



src/mem/ruby/system/Sequencer.cc (line 436)
<http://reviews.gem5.org/r/2787/#comment5523>

    This breaks the no-RFO property of some L1s (and is particularly relevant 
for GPUs). For instance, this would fail with gem5-gpu's VI_hammer protocol, 
because the cache line is not returned on writes. You can't assume that the 
current data will be available when you've queued loads behind a store to a 
line. This is a major signal that MSHR queuing should not happen in the 
sequencer (which must be protocol-independent).
    
    You need to send requests through to the cache controller.



src/mem/ruby/system/Sequencer.cc (line 495)
<http://reviews.gem5.org/r/2787/#comment5521>

    There is a similar issue with decoalescing loads here in the Sequencer 
(also in zero time!). Specifically, by queuing accesses to a line in the 
Sequencer, the cache is never made aware of which data in the line each queued 
access needs. This means it does not know what portion of the line should be 
returned to the core, and it again dodges potential resource constraints, such 
as the amount of data that can be pulled from the line per cycle to respond to 
each outstanding access. Can the data from a cache line really be fanned out to 
multiple separate requests in the same cycle when queuing MSHRs (note: with 
strided cache access from the O3 CPU, this could be 16, 32 or more concurrent 
accesses in a single cycle)?
    
    You really need to send requests through to the cache controller.
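A minimal sketch of the direction these comments argue for: record aliased 
requests, but still issue every one of them to the controller, so the cache 
sees each access, applies its own resource limits, and supplies the data for 
each response. (`ToySequencer`, `ToyRequest`, and `issueToController` are all 
hypothetical names; this is not the actual Sequencer interface.)

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <unordered_map>
#include <utility>

using Addr = uint64_t;
struct ToyRequest { int id; };

// Toy sequencer front-end: aliased requests to a line are tracked, but every
// request is still forwarded to the cache controller rather than being
// coalesced and completed locally in zero time.
struct ToySequencer
{
    void makeRequest(Addr lineAddr, ToyRequest req)
    {
        pending[lineAddr].push_back(req);
        issueToController(lineAddr, req);  // never complete aliases locally
    }

    // Stand-in for enqueueing into the controller's mandatory queue.
    void issueToController(Addr lineAddr, ToyRequest req)
    {
        issued.push_back({lineAddr, req.id});
    }

    std::unordered_map<Addr, std::deque<ToyRequest>> pending;
    std::deque<std::pair<Addr, int>> issued;
};
```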


- Joel Hestness


On May 26, 2015, 7:42 p.m., Tony Gutierrez wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://reviews.gem5.org/r/2787/
> -----------------------------------------------------------
> 
> (Updated May 26, 2015, 7:42 p.m.)
> 
> 
> Review request for Default.
> 
> 
> Repository: gem5
> 
> 
> Description
> -------
> 
> Changeset 10817:d010e6a8e783
> ---------------------------
> ruby: Fixed pipeline squashes caused by aliased requests
> 
> This patch was created by Bihn Pham during his internship at AMD.
> 
> This patch fixes a very significant performance bug when using the O3 CPU 
> model
> and Ruby. The issue was Ruby returned false when it received a request to the
> same address that already has an outstanding request or when the memory is
> blocked. As a result, O3 unnecessary squashed the pipeline and re-executed
> instructions. This fix merges readRequestTable and writeRequestTable in
> Sequencer into a single request table that keeps track of all requests and
> allows multiple outstanding requests to the same address. This prevents O3
> from squashing the pipeline.
> 
> 
> Diffs
> -----
> 
>   src/mem/ruby/system/Sequencer.hh df2aa91dba5b0f0baa351039f0802baad9ed8f1d 
>   src/mem/ruby/system/Sequencer.cc df2aa91dba5b0f0baa351039f0802baad9ed8f1d 
> 
> Diff: http://reviews.gem5.org/r/2787/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Tony Gutierrez
> 
>

_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev
