Hi Dan,

I want to follow up on your email from a couple of weeks ago and then suggest another area of near-term development. To answer your first question below: I'm not sure that quiescing Ruby is the "right way" to implement functional accesses, but I do believe it is the easiest way given the current requirements for functional accesses. Similarly, to answer your second question: I think the easiest way to fast-forward Ruby events is to use the current cache trace methodology, though the best way may be very different. If we add fast-forward support to Ruby, we need to make sure we don't overly complicate the coherence specification, since that is already the most complicated part of the system.
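For concreteness, the quiesce-then-access idea can be sketched with a toy model. All names here are illustrative and are not the gem5/Ruby API:

```python
# Toy model of "quiesce, then access": drain all outstanding timing
# requests before servicing a functional access, so the access only ever
# sees blocks in a stable (non-transient) state. Illustrative only --
# this is not the gem5/Ruby API.
from collections import deque

class MiniRuby:
    def __init__(self):
        self.pending = deque()   # outstanding timing requests
        self.block = 0           # a single "cache block" value

    def issue_timing(self, value):
        self.pending.append(value)

    def drain(self):
        # advance simulated events until no request is outstanding
        while self.pending:
            self.block = self.pending.popleft()  # request completes

    def functional_read(self):
        self.drain()             # quiesce first...
        assert not self.pending  # ...then access in isolation
        return self.block

ruby = MiniRuby()
ruby.issue_timing(42)
ruby.issue_timing(7)
print(ruby.functional_read())    # drains both requests, prints 7
```

The cost of this simplicity is exactly the drawback discussed later in the thread: every functional access perturbs the timing of the simulated system.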
While functional accesses and fast-forwarding are significant development tasks, one relatively easy near-term task that I failed to mention in my previous email is a set of straightforward performance optimizations that could be made to Ruby. Specifically, Steve recently pointed out that Ruby's simulation time could be noticeably reduced if it used M5's assertion and debug mechanisms, so that when one compiles m5.fast, those assertions and debug statements are compiled out. Steve also noted that CacheMemory's use of a hash map to find a tag within a cache set seems to consume a considerable amount of time for certain cache sizes. I believe Derek recently added that optimization, so maybe he can elaborate on when it is beneficial. It seems we can do some further optimization there. Do you think anyone at Wisconsin could look into these performance optimizations? If so, I would be happy to set up a meeting to discuss them in more detail.

Thanks,

Brad

From: [email protected] [mailto:[email protected]] On Behalf Of Dan Gibson
Sent: Tuesday, August 24, 2010 9:43 AM
To: M5 Developer List
Subject: Re: [m5-dev] Next Tasks for GEM5

Dear all,

Having chewed on this for a week, I'd like to invite some discussion. It seems to me that the most general problem outlined by Brad is massaging Ruby into accepting various flavors of non-timing accesses. In particular, a working Ruby functional access could:

- Aid in cache warm-up
- Help deal with devices
- Maybe other useful things too (I'm no M5 expert)

The question is: how do we go about implementing these accesses? I get the impression that functional and timing accesses will occasionally intermix in the memory system, and that 'the right thing' should happen when they mix. This presents a problem for an arbitrary protocol, as a functional access can/will occur at exactly the wrong time -- e.g., when a block is in a blocking transient state.
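A toy illustration of that "exactly the wrong time" hazard, with purely illustrative names:

```python
# Toy illustration of the hazard above: a functional write applied while
# a block is mid-transition is silently lost when the in-flight timing
# response finally arrives. Names are illustrative only, not gem5's.
class Block:
    def __init__(self):
        self.transient = False   # is a timing request in flight?
        self.data = 0
        self.inflight = None     # value the pending response will install

b = Block()
b.transient, b.inflight = True, 5    # a timing request is outstanding

b.data = 42                          # naive functional write ignores state

# ...later, the timing response arrives and installs its payload
b.data, b.transient = b.inflight, False

print(b.data)   # prints 5 -- the functional write was lost
```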
The simplest solution seems to be the most obvious as well: quiesce the memory system for each non-timing access, then handle the functional access in isolation. This means that each functional access becomes a timing transient instead of a correctness problem. Unfortunately, the architecture of Ruby requires events to 'run' in order to service requests. If functional accesses are to be useful for warming caches, they have to affect coherence and permissions state in an approximately correct manner. One way or another, this means fast-forwarding Ruby.

This is the area around which I'd like to invite discussion:

- Is quiescing Ruby the right way to implement functional access?
- What is the best way to go about fast-forwarding Ruby events?

Regards,
Dan

On Wed, Aug 18, 2010 at 1:18 PM, Beckmann, Brad <[email protected]> wrote:

Hi All,

Yesterday a few of us from AMD and Wisconsin met and discussed the next tasks for GEM5. Specifically, there are a couple (possibly more?) new graduate students at Wisconsin who are starting to ramp up on the simulator. While we spent some time discussing short-term projects for the new graduate students, the majority of the time was focused on the remaining steps necessary before Ruby can supply data to M5 CPUs while using warmed-up cache traces. Below is a summary of our meeting as well as a discussion of the possible directions we can take. I'm sending this summary out so that others can comment and provide feedback.

Please let me know if you have any questions,

Brad

Short-term projects for the new Wisconsin graduate students

- Incorporate work-completed/transaction-completed metrics into Simulate.py
- Include randomization support in the memory system to simulate multiple execution paths

Tasks required before Ruby can supply data to M5 CPUs while using warmed-up cache traces
- Add support for cache flushes within the protocols
  o This mechanism is required by certain x86 instructions and memory types.
  o Furthermore, it could be leveraged to create checkpoints that include both valid main memory data and a cache warmup trace with valid data. (More on this topic below.)
- Provide support for allowing certain simobjects to be scheduled on the event queue without advancing sim_ticks
  o In order to run a cache warmup trace through Ruby, Ruby requests need to be executed and Ruby simobjects need to be scheduled. However, at the end of warmup, sim_ticks and the rest of the simulator state need to be consistent with the loaded checkpoint.
    * Currently, we (at AMD) have an internal patch that achieves this functionality by leveraging the fact that Ruby objects still use the Ruby event queue API. During the warmup phase, the Ruby event queue detaches from the M5 event queue and instead uses the old Ruby event queue implementation to schedule events. Once warmup is complete, the Ruby event queue reattaches to the M5 event queue. This is obviously not the real way we want to do this, because eventually we want all Ruby events to use the M5 event queue API directly. That is why I don't plan to check this patch in to the public tree.
  o One possible solution would be to identify which events can be scheduled during cache warmup and assume all other events can only be scheduled during actual execution.
    * I'm interested to know how complicated others believe a solution like that would be.
- Once the two tasks above are complete, we should be able to create cache warmup traces with data and also provide valid data in main memory.
  o The motivation for providing valid data in both the cache trace and main memory is that we maintain the flexibility of allowing each protocol to define its own policies for handling dirty data.
  o The specific mechanism would be to record the cache trace using Ruby's current CacheRecorder (I've already revitalized this code in GEM5) and then use the cache flush mechanism to flush dirty data to main memory before checkpointing memory.
- Add functional access support to Ruby.
  o One possible way to add functional access support to Ruby is to quiesce or drain outstanding Ruby requests when a dynamic functional access is initiated, and actually perform the access after Ruby has been drained of any outstanding requests.
    * Then, if all cache blocks are in a base state, Ruby can use the protocol-independent AccessPermissions on the cache and data blocks to determine which blocks should be read and written. This would only require additional set-state operations in the directory .sm files to set the AccessPermissions for a block. The cache .sm files already do this, so the impact on the protocols would be minimal.
    * However, the obvious disadvantage of this approach is that every functional access will perturb the timing of the system.
  o Another possible approach is to add atomic access support to Ruby and use it for functional accesses.
    * Basically, instead of using events triggered by the receipt of timed messages to transition between base states, use function calls.
    * I'm not sure how to make this work, and I fear it would restrict how protocols are defined. Furthermore, it seems it would be extremely complicated to get these atomic accesses to work while timing accesses are active in the system and cache blocks are already in transient states. Overall, I suspect this is not easily feasible, but I could be wrong.
  o A third option is to restrict functional writes to initialization only and make dynamic functional reads best effort, with no guarantee of correctness.
    * I believe Steve has already discussed such a possibility with Nate and Ali.
    * Functional writes at initialization are trivial to support in Ruby, since all blocks are in a steady state. I actually already have an internal patch that provides some of this support. If functional reads are relaxed to be best effort, then Ruby will almost always succeed in reading valid data using AccessPermissions without quiescing the system, and only rarely fail because a block is in a transient state.

Other outstanding tasks to keep in mind

- Merging stat files
  o We've discussed this in several previous emails, but I just wanted to reiterate it here.
- Allow I/O requests/responses and interrupts to flow through the Ruby network.
  o Currently these are simply routed to a classic M5 bus, but they should be included in the Ruby network.
  o It will take some effort to make this work, but it shouldn't be too hard.

_______________________________________________
m5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/m5-dev

--
http://www.cs.wisc.edu/~gibson
[esc]:wq!
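The permission-based best-effort read described in the thread might look roughly like the sketch below. The permission names and structure are illustrative, not Ruby's actual code:

```python
# Sketch of a best-effort functional read: succeed only when the block's
# access permission is a stable readable state; report failure (rather
# than guess) when the block is invalid or transient. Permission names
# and structure are illustrative, not Ruby's actual code.
from enum import Enum

class AccessPermission(Enum):
    INVALID = 0
    READ_ONLY = 1
    READ_WRITE = 2
    BUSY = 3          # a transient state: a request is in flight

def functional_read(perm, data):
    """Return (success, value); fail on invalid/transient blocks."""
    if perm in (AccessPermission.READ_ONLY, AccessPermission.READ_WRITE):
        return True, data
    return False, None   # best effort: no quiescing, just report failure

print(functional_read(AccessPermission.READ_WRITE, 123))  # (True, 123)
print(functional_read(AccessPermission.BUSY, 99))         # (False, None)
```

Because most blocks are in a stable state at any given instant, this read succeeds almost always and fails only when it races with an in-flight transition, which matches the trade-off described above.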
