Hi Dan,

I want to follow up on your email from a couple weeks ago and then suggest 
another area of near-term development.  To answer your first question below, 
I'm not sure if quiescing Ruby is the "right way" to implement functional 
accesses, but I do believe it is the easiest way given the current requirements 
for functional accesses.  Similarly, to answer your second question, I think 
the easiest way to fast-forward Ruby events is to use the current cache trace 
methodology; however, the best way may be very different.  If we add 
fast-forward support to Ruby, we need to make sure we don't overly complicate 
the coherence specification since that is already the most complicated part of 
the system.

While functional accesses and fast-forwarding are significant development 
tasks, one relatively easy near-term task that I failed to mention in my 
previous email is a set of straightforward performance optimizations that 
could be made to Ruby.  Specifically, Steve recently pointed out that Ruby 
simulation time could be noticeably reduced if Ruby used M5's assertion and 
debug mechanisms; then, when one compiles m5.fast, those assertions and 
debug statements are compiled out.  Steve also noted that CacheMemory's use 
of a hash map to find a tag in a cache set seems to consume a considerable 
amount of time for certain cache sizes.  I believe Derek recently added that 
optimization, so maybe he can elaborate on when it is beneficial.  It seems 
we can do some further optimization there.

Do you think anyone at Wisconsin can look into these performance optimizations? 
If so, I would be happy to set up a meeting and discuss them in more detail.

Thanks,

Brad


From: [email protected] [mailto:[email protected]] On Behalf Of 
Dan Gibson
Sent: Tuesday, August 24, 2010 9:43 AM
To: M5 Developer List
Subject: Re: [m5-dev] Next Tasks for GEM5

Dear all,
Having chewed on this for a week, I'd like to invite some discussion. It seems 
to me that the most general problem outlined by Brad is massaging Ruby into 
accepting various flavors of non-timing accesses.

In particular, a working Ruby functional access could:
- Aid in cache warm-up
- Help deal with devices
- Maybe other useful things too (I'm no M5 expert)

The question is: How do we go about implementing these accesses? I get the 
impression that functional and timing accesses will occasionally intermix in 
the memory system, and that 'the right thing' should happen when 
functional/timing accesses mix. This presents a problem for an arbitrary 
protocol, as a functional access can/will occur at exactly the wrong time -- 
e.g., when a block is in a blocking transient state.

The simplest solution seems to be the most obvious as well: quiesce the memory 
system for each non-timing access, then handle the functional access in 
isolation. This means that each functional access becomes a timing transient 
instead of a correctness problem.
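As a rough illustration of that quiesce-then-access idea, here is a minimal 
Python sketch.  The names (RubySystemSketch, drain, functional_read) and the 
request model are mine for illustration only, not actual Ruby or M5 interfaces:

```python
# Toy model of "quiesce, then access": drain all in-flight requests so every
# block reaches a stable state, then service the functional access in
# isolation.  All names here are hypothetical, not gem5 APIs.
class RubySystemSketch:
    def __init__(self):
        self.outstanding_requests = []  # pending completion callbacks
        self.memory = {}                # addr -> data, stands in for caches/memory

    def drain(self):
        # Run pending work until nothing is in flight; afterward no block
        # is stuck in a transient state.
        while self.outstanding_requests:
            self.outstanding_requests.pop(0)()

    def functional_read(self, addr):
        self.drain()                    # quiesce first ...
        return self.memory.get(addr)    # ... then access in isolation
```

The cost, as noted above, is that every functional access becomes a timing 
perturbation rather than a correctness problem.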

Unfortunately, the architecture of Ruby requires events to 'run' in order to 
service requests. If functional accesses are to be useful for warming caches, 
they have to affect coherence and permissions state in an approximately correct 
manner. One way or another, this means fast-forwarding Ruby.

This is the area around which I'd like to invite discussion.
- Is quiescing Ruby the right way to implement functional access?
- What is the best way to go about fast-forwarding Ruby events?

Regards,
Dan
On Wed, Aug 18, 2010 at 1:18 PM, Beckmann, Brad 
<[email protected]<mailto:[email protected]>> wrote:
Hi All,

Yesterday a few of us from AMD and Wisconsin met and discussed the next tasks 
for GEM5.  Specifically, there are a couple (possibly more?) new graduate 
students at Wisconsin who are starting to ramp up on the simulator.  While we 
spent some time discussing short-term projects for the new graduate students, 
the majority of the time was focused on the remaining steps necessary before 
Ruby can supply data to M5 cpus while using warmed-up cache traces.  Below is 
a summary of our meeting as well as a discussion of the possible directions we 
can take.  I'm sending this summary out so that others can comment and provide 
feedback.

Please let me know if you have any questions,

Brad


Short-term projects for the new Wisconsin graduate students

-          Incorporate work completed/transaction completed metrics to 
Simulate.py

-          Include randomization support into the memory system to simulate 
multiple execution paths

Tasks required before Ruby can supply data to M5 cpus while using warmed-up 
cache traces.

-          Add support for cache flushes within the protocols

o   This mechanism is required by certain x86 instructions and memory types

o   Furthermore it could be leveraged to create checkpoints that include both 
valid main memory data as well as a cache warmup trace with valid data.  (More 
on this topic below)

-          Provide support for allowing certain simobjects to be scheduled on 
the event queue without advancing sim_ticks

o   In order to run a cache warmup trace through Ruby, Ruby requests need to be 
executed and Ruby simobjects need to be scheduled.  However, at the end of 
warmup, sim_ticks and the rest of the simulator state need to be consistent 
with the loaded checkpoint.

*  Currently, we (at AMD) have an internal patch that achieves this 
functionality by leveraging the fact that Ruby objects still use the Ruby 
eventqueue API.  During this warmup phase, the Ruby eventqueue detaches from 
the M5 event queue and instead uses the old ruby event queue implementation to 
schedule events.  Once the warmup is complete, the Ruby eventqueue reattaches 
to the M5 event queue.  This is obviously not the way we want to do this long 
term, because eventually we want all Ruby events to use the M5 event queue 
API directly.  That is why I don't plan to check this current patch into the 
public tree.

o   One possible solution would be to identify which events can be scheduled 
during cache warmup and assume all other events can only be scheduled during 
actual execution.

*  I'm interested to know how complicated others believe such a solution 
would be.
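To make the detach/reattach idea above concrete, here is a minimal Python 
sketch of an event queue that runs warmup events without advancing global 
time.  WarmupEventQueue, global_tick, and the rest are hypothetical stand-ins, 
not the actual Ruby or M5 event queue APIs:

```python
# Hypothetical model of the warmup trick described above: during warmup,
# events still execute (so caches get warmed), but global time does not
# advance, keeping the simulator consistent with the loaded checkpoint.
class WarmupEventQueue:
    def __init__(self):
        self.global_tick = 0      # stands in for M5's sim_ticks
        self.warming_up = False   # True while replaying a cache trace
        self.events = []          # pending (delay, callback) pairs

    def schedule(self, delay, callback):
        self.events.append((delay, callback))

    def run(self):
        # Process all pending events in delay order; only real (non-warmup)
        # simulation is allowed to advance the global tick count.
        for delay, callback in sorted(self.events, key=lambda e: e[0]):
            if not self.warming_up:
                self.global_tick += delay
            callback()
        self.events = []
```

After warmup completes, global_tick is unchanged, so sim_ticks and the rest 
of the state remain consistent with the checkpoint.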

-          Once the two above tasks are complete, we should be able to create 
cache warmup traces with data and also provide valid data in main memory.

o   The motivation for providing valid data in both the cache trace and main 
memory is that we maintain the flexibility of allowing each protocol to 
define its own policy for handling dirty data.

o   The specific mechanism would be to record the cache trace using Ruby's 
current CacheRecorder (I've already revitalized this code in GEM5) and then use 
the cache flush mechanism to flush dirty data to main memory before 
checkpointing memory.

-          Add functional access support to Ruby.

o   One possible way to add functional access support to Ruby is to quiesce or 
drain outstanding Ruby requests when a dynamic functional access is initiated 
and actually perform the access after Ruby has been drained of any outstanding 
requests.

*  Therefore, if all cache blocks are in a base state, Ruby can use the 
protocol-independent AccessPermissions on the cache and data blocks to 
determine which blocks should be read and written.  This would only require 
additional set-state operations in the directory .sm files to set the 
AccessPermissions for a block.  The cache .sm files already do this, so the 
impact on the protocols would be minimal.

*  However, the obvious disadvantage to this approach is that every functional 
access will perturb the timing of the system.
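A toy sketch of how permission-directed functional accesses could work once 
the system is drained.  The permission names, the caches-as-dicts model, and 
the helper functions below are simplified assumptions, not Ruby's actual 
AccessPermissions interface:

```python
# Simplified stand-ins for Ruby-style access permissions; once every block
# is in a base state, the permission alone tells us which copies to touch.
READ_ONLY, READ_WRITE, INVALID = "ReadOnly", "ReadWrite", "Invalid"

def functional_read(caches, addr):
    # Any cache holding the block with read permission can supply the data.
    for cache in caches:
        perm, data = cache.get(addr, (INVALID, None))
        if perm in (READ_ONLY, READ_WRITE):
            return data
    return None  # a real system would fall back to main memory here

def functional_write(caches, addr, value):
    # Update every valid copy so all cached copies remain consistent.
    for cache in caches:
        perm, _ = cache.get(addr, (INVALID, None))
        if perm != INVALID:
            cache[addr] = (perm, value)
```

Each cache here is just a dict mapping an address to a (permission, data) 
pair, purely for illustration.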

o   Another possible approach is to add atomic access support to Ruby and 
utilize this for functional accesses.

*  Basically, instead of using events triggered by the receipt of timed 
messages to transition between base states, use function calls.

*  I'm not sure how to make this work, and I fear that it will restrict how 
protocols are defined.  Furthermore, it seems it would be extremely complicated 
to get these atomic accesses to work while timing accesses are active in the 
system and cache blocks are already in some sort of transient state.  Overall, 
I suspect this is not easily feasible, but I could be wrong.

o   A third option is to restrict functional writes to only initialization and 
allow dynamic functional reads to be only a best effort and not guaranteed to 
be correct.

*  I believe Steve has already discussed such a possibility with Nate and Ali.

*  Functional writes at initialization are trivial to support in Ruby since 
all blocks are in a steady state.  I actually already have an internal patch 
that provides some of this support.  If functional reads are relaxed to be a 
best effort, then Ruby will almost always succeed in reading valid data using 
AccessPermissions without quiescing the system, and will only rarely fail 
because a block is in a transient state.
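A minimal sketch of such a best-effort functional read.  The state names and 
the caches-as-dicts model are simplified assumptions, not Ruby's actual 
structures:

```python
# Best-effort functional read, per the third option above: returns (ok, data),
# where ok is False when the block is caught in a transient state.  The
# stable-state set below is a simplified illustration, not any real protocol.
BASE_STATES = {"M", "O", "E", "S"}  # stable states holding valid data

def try_functional_read(caches, addr):
    for cache in caches:
        state, data = cache.get(addr, ("I", None))
        if state in BASE_STATES:
            return True, data   # valid stable copy: the read succeeds
        if state != "I":
            return False, None  # transient state: rare best-effort failure
    return False, None          # no cached copy; would fall back to memory
```

The caller either retries later or simply accepts that this particular read 
was not guaranteed to be correct.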

Other outstanding tasks to keep in mind

-          Merging stat files

o   We've discussed this in several previous emails, but I just wanted to 
reiterate it here.

-          Allow I/O requests/responses and interrupts to flow through the Ruby 
network.

o   Currently these are simply routed to a classic M5 bus, but these should be 
included in the Ruby network.

o   It will take some effort to make this work, but it shouldn't be too hard.


_______________________________________________
m5-dev mailing list
[email protected]<mailto:[email protected]>
http://m5sim.org/mailman/listinfo/m5-dev



--
http://www.cs.wisc.edu/~gibson [esc]:wq!
