> On 2011-10-27 22:35:21, Brad Beckmann wrote:
> > Thanks for the heads up on this patch.  I'm glad you found the time to dive 
> > into it.
> > 
> > 
> > 
> > I'm confused that the comment mentions a "list of ports", but I don't see a
> > list of ports in the code, and I'm not sure how it would even be used.
> > 
> > The two questions you pose are good ones.  Hopefully someone who 
> > understands the O3 LSQ can answer the first, and I would suggest creating a 
> > new directed test that can manipulate the enqueue latency on the mandatory 
> > queue to create the necessary test situations. 
> > 
> > Also, I have a couple high-level comments right now:
> > 
> > 
> > 
> > - Ruby doesn't implement any particular memory model.  It just implements 
> > the cache coherence protocol, and more specifically invalidation based 
> > protocols.  The protocol, in combination with the core model, results in 
> > the memory model.
> > 
> > 
> > - I don't think it is sufficient to just forward those probes that hit 
> > valid copies to the O3 model.  What about replacements of blocks that have 
> > serviced a speculative load?  Instead, my thought would be to forward all 
> > probes to the O3 LSQ and think of cpu-controlled policies to filter out 
> > unnecessary probes.
> 
> Nilay Vaish wrote:
>     Hi Brad, thanks for the response.
>     
>     * A list of ports has been added to RubyPort.hh; a port is added to
>       the list whenever a new M5Port is created.
>     
>     * As long as the core waits for an ack from the memory system for every
>       store before issuing the next one, I can understand that the memory
>       model is independent of how the memory system is implemented. But
>       suppose the caches are multi-ported. Then will the core only use one
>       of the ports for stores and wait for an ack? The current LSQ
>       implementation uses as many ports as available. In this case, would
>       the memory system not need to ensure the order in which the stores
>       are performed?
>     
>     * I think the current implementation handles blocks whose coherence
>       permissions were fetched speculatively. If the cache loses permissions
>       on such a block, then it will forward the probe to the CPU. If the
>       cache later receives another probe for this block, I don't think that
>       the CPU will have any instruction using the value from that block.
>     
>     * For testing, Prof. Wood suggested having something similar to TSOtool.
> 
> Brad Beckmann wrote:
>     Hmm...I'm now even more confused.  I have not looked at the O3 LSQ, but 
> it sounds like from your description that one particular instantiation of the 
> LSQ will use N ports, not just a single port to the L1D.  So does N equal the 
> number of simultaneous loads and stores that can be issued per cycle, or is N 
> equal to the number of outstanding loads and stores supported by the LSQ?  Or 
> does it equal something completely different?
>     
>     Stores to different cache blocks can be issued to the memory system 
> out-of-order and in parallel.  Ruby already supports such functionality.  The 
> key is the store buffer must be drained in-order.  It is up to the store 
> buffer's functionality to get that right.  Ruby can assist by providing 
> interfaces for checking permission state and forwarding probes upstream, but 
> it is up to the LSQ/store buffer to act appropriately and retry requests when 
> necessary.  I don't believe Ruby needs any fundamental changes to support 
> x86-TSO.  Instead, Ruby just needs to provide more information back to the 
> LSQ.
>     
>     Earlier I didn't notice that you also squash speculation on replacements, 
> in addition to probes.  Yeah, I think those changes take care of correctly 
> squashing speculative loads.  However, as I mentioned above, I still think we 
> need to figure out how to provide the necessary information to allow stores 
> to be issued in parallel, while still retiring in-order.
>     
>     Implementing something similar to TSOtool would be great.  However, I 
> think there is benefit in doing some quick tests using a DirectedTester before 
> creating something like TSOtool.
>     
>     
>

Brad,

My understanding is that the LSQ can issue at most N loads and stores to
the memory system in each cycle.

For parallel stores, it seems that the core should have permissions for
these cache blocks all at the same time. Even if Ruby fetches coherence
permissions out-of-order, it would still have to ensure, for SC or TSO,
that stores that happen logically later in time become visible only
after all the earlier ones are visible to the rest of the system. As of
now, I disagree with the statement that
          ''Stores to different cache blocks can be issued to the
            memory system out-of-order and in parallel.''
Unless we have some kind of guarantee on the order in which these stores
become visible to the rest of the system, I don't see how we can separate
the memory system's behavior from the consistency model.
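To make the ordering I am worried about concrete, here is a minimal
message-passing litmus test, sketched in plain C++ rather than gem5 code
(the variable names and the use of std::thread are only for illustration).
Under SC or TSO the outcome r1 == 1 && r2 == 0 is forbidden, even though
the two stores in core0() target different cache blocks:

    #include <cstdio>
    #include <thread>

    volatile int data = 0;   // cache block A
    volatile int flag = 0;   // cache block B

    void core0() {
        data = 1;            // St A
        flag = 1;            // St B: may be issued in parallel with St A,
    }                        //       but must not become visible first

    void core1() {
        int r1 = flag;       // Ld B
        int r2 = data;       // Ld A
        if (r1 == 1 && r2 == 0)
            std::printf("forbidden under SC/TSO\n");
    }

    int main() {
        std::thread t0(core0), t1(core1);
        t0.join();
        t1.join();
        return 0;
    }

If the memory system lets St B become visible to other cores while St A is
still pending, a consumer like core1() can observe the forbidden outcome.
That is why I think either the memory system or the store-buffer drain has
to enforce the visibility order of the two stores.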

I was thinking of writing a tester that reads in a trace of memory operations
performed by a multi-processor system, along with the times at which they are
performed. We could then check the observed load values against the expected
load values. I think the underlying assumption is that everything behaves in a
deterministic fashion. What do you think?
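Roughly, I am imagining something like the sketch below (plain C++, not gem5
code; the one-record-per-line trace format "time cpu op addr value" is
hypothetical, and it assumes fully deterministic replay, so the last store to
an address is the value every later load of that address should return):

    #include <cstdint>
    #include <fstream>
    #include <iostream>
    #include <map>
    #include <string>

    int main(int argc, char **argv) {
        if (argc < 2) {
            std::cerr << "usage: checker <trace-file>\n";
            return 1;
        }
        std::ifstream trace(argv[1]);
        std::map<uint64_t, uint64_t> mem;   // last value stored per address
        uint64_t time, cpu, addr, value;
        std::string op;
        while (trace >> time >> cpu >> op >> addr >> value) {
            if (op == "ST") {
                mem[addr] = value;          // record the store's value
            } else if (op == "LD") {
                uint64_t expected = mem.count(addr) ? mem[addr] : 0;
                if (value != expected)      // value the trace says was loaded
                    std::cout << "mismatch at t=" << time << " cpu=" << cpu
                              << " addr=" << addr << ": saw " << value
                              << ", expected " << expected << "\n";
            }
        }
        return 0;
    }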


- Nilay


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.m5sim.org/r/894/#review1620
-----------------------------------------------------------


On 2011-10-17 23:50:47, Nilay Vaish wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://reviews.m5sim.org/r/894/
> -----------------------------------------------------------
> 
> (Updated 2011-10-17 23:50:47)
> 
> 
> Review request for Default.
> 
> 
> Summary
> -------
> 
> This patch implements the functionality for forwarding invalidations
> and replacements from the L1 cache of the Ruby memory system to the O3
> CPU. The implementation adds a list of ports to RubyPort. Whenever a
> replacement or an invalidation is performed, the L1 cache forwards it to
> all the ports, which I believe is the LSQ in the case of the O3 CPU. Those
> who understand the O3 LSQ should take a close look at the implementation
> and figure out (at least qualitatively) if something is missing or
> erroneous.
> 
> This patch only modifies the MESI CMP directory protocol. I will modify other
> protocols once we sort out the major issues surrounding this patch.
> 
> My understanding is that this should ensure an SC execution, as
> long as Ruby can support SC. But I think Ruby does not support any 
> memory model currently. A couple of issues that need discussion --
> 
> * Can this get into a deadlock? A CPU may not be able to proceed if
>   a particular cache block is repeatedly invalidated before the CPU
>   can retire the actual load/store instruction. How do we ensure that
>   at least one instruction is retired before an invalidation/replacement
>   is processed?
> 
> * How do we test this implementation? Is it possible to implement some of the
>   tests that we regularly come across in papers on consistency models? Or
>   those present in manuals from AMD and Intel? I have tested that Ruby
>   forwards the invalidations, but not the part where the LSQ needs to act on
>   them.
> 
> 
> Diffs
> -----
> 
>   build_opts/ALPHA_SE_MESI_CMP_directory 92ba80d63abc 
>   configs/example/se.py 92ba80d63abc 
>   configs/ruby/MESI_CMP_directory.py 92ba80d63abc 
>   src/mem/protocol/MESI_CMP_directory-L1cache.sm 92ba80d63abc 
>   src/mem/protocol/RubySlicc_Types.sm 92ba80d63abc 
>   src/mem/ruby/system/RubyPort.hh 92ba80d63abc 
>   src/mem/ruby/system/RubyPort.cc 92ba80d63abc 
>   src/mem/ruby/system/Sequencer.hh 92ba80d63abc 
>   src/mem/ruby/system/Sequencer.cc 92ba80d63abc 
> 
> Diff: http://reviews.m5sim.org/r/894/diff
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Nilay
> 
>

_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev
