I am using the latest stable version. All the results I have shown
are for a completely vanilla codebase running the fs.py example script.
I tried adding an extra drain to the Simulation.py script as part of
the memory mode switch, but that had no impact. Then I added tracing
of interrupts, as you suggested, as well as some other messages. Here
is the section of the trace that matches what I gave before:
2349997653500: system.switch_cpus0: Processor is now idle
2349997653500: system.switch_cpus1: Processor is now idle
2349997653500: system.switch_cpus2: Processor is now idle
2349997653500: system.switch_cpus3: Processor is now running
2349997653500: system.switch_cpus2: Resume
2349997653500: system.switch_cpus3: Resume
2349997653500: system.switch_cpus3: Constructing a new FetchEvent
2349997653500: system.switch_cpus0: Resume
2349997653500: system.switch_cpus1: Resume
2350585937500: system.cpu0.interrupts: Interrupt 22:0 posted
2350585937500: system.switch_cpus0: Suspended Processor awoke
2350585937500: system.switch_cpus0: ActivateContext 0 (1 cycles)
2350585937500: system.cpu1.interrupts: Interrupt 22:0 posted
2350585937500: system.switch_cpus1: Suspended Processor awoke
2350585937500: system.switch_cpus1: ActivateContext 0 (1 cycles)
2350585937500: system.cpu2.interrupts: Interrupt 22:0 posted
2350585937500: system.switch_cpus2: Suspended Processor awoke
2350585937500: system.switch_cpus2: ActivateContext 0 (1 cycles)
2350585937500: system.cpu3.interrupts: Interrupt 22:0 posted
2350585938000: system.switch_cpus2: Fetch
2350585938000: system.switch_cpus1: Fetch
2350585938000: system.switch_cpus0: Fetch
2350585968000: system.switch_cpus2: Complete ICache Fetch
2350585968000: system.switch_cpus2: Fetch
2350585970000: system.switch_cpus1: Complete ICache Fetch
2350585970000: system.switch_cpus1: Fetch
2350585973000: system.switch_cpus0: Complete ICache Fetch
One can see that cpu3.interrupts is posting the same interrupt as all
the others, but it appears that switch_cpus3 never responds to it. The
lack of the "Suspended Processor awoke" for switch_cpus3 indicates
that its thread is not suspended. The "Processor is now running"
message occurs in takeOverFrom when an active thread context is found
but the previous state was not Running. Following this, the
"Constructing a new FetchEvent" occurs in resume when it is detected
that the state is not idle. Despite creating the new FetchEvent, note
that there is never a "...switch_cpus3: Fetch" message. The only cause
of this that I can imagine is that the event is not explicitly
scheduled.
Looking in EventWrapper inside eventq.hh, the old implementation
(using the old event queue system) automatically called schedule in
the constructor while the new version does not. Inserting a schedule
call right after the "... new FetchEvent(..)" line makes the problem
go away. Looking in cpu/base.cc, the only other use of the
EventWrapper in the entire cpu subtree does have an explicit schedule
call. Looking at other instances of EventWrapper shows that their
behavior has changed as well. I also noted that most other uses of
EventWrapper do not continually create and destroy the event.
I have put both of these changes into a small patch that resolves this
issue, which I have sent to the developers list. Let me know what you
think of it. I am uncertain whether calling deschedule right before
always calling schedule is necessary, but I left it in to be safe.
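
To illustrate the failure mode outside of M5, here is a minimal sketch
of a toy event queue. It is not the real EventWrapper or event queue
code: ToyEvent, ToyQueue, reschedule and fetchCount are hypothetical
names made up for this example. It only shows that an event whose
constructor does not schedule it never fires, and that a
"deschedule if scheduled, then schedule" guard is safe either way.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <utility>

// Toy model only. None of these names are taken from the M5 sources.
using Tick = std::uint64_t;

class ToyEvent {
  public:
    // Constructing the event does NOT place it on any queue.
    explicit ToyEvent(std::function<void()> cb) : callback(std::move(cb)) {}
    bool scheduled() const { return isScheduled; }

  private:
    friend class ToyQueue;
    std::function<void()> callback;
    bool isScheduled = false;
    Tick when = 0;
};

class ToyQueue {
  public:
    void schedule(ToyEvent *ev, Tick when) {
        assert(!ev->isScheduled);  // scheduling twice would be a bug
        ev->isScheduled = true;
        ev->when = when;
        pending.emplace(when, ev);
    }

    void deschedule(ToyEvent *ev) {
        assert(ev->isScheduled);
        auto range = pending.equal_range(ev->when);
        for (auto it = range.first; it != range.second; ++it) {
            if (it->second == ev) {
                pending.erase(it);
                break;
            }
        }
        ev->isScheduled = false;
    }

    // The defensive pattern: deschedule only if needed, then schedule.
    void reschedule(ToyEvent *ev, Tick when) {
        if (ev->scheduled())
            deschedule(ev);
        schedule(ev, when);
    }

    // Fire every pending event in tick order.
    void runAll() {
        while (!pending.empty()) {
            auto it = pending.begin();
            ToyEvent *ev = it->second;
            pending.erase(it);
            ev->isScheduled = false;
            ev->callback();
        }
    }

  private:
    std::multimap<Tick, ToyEvent *> pending;
};

// Returns how many times the stand-in "fetch" callback ran.
int fetchCount(bool explicitSchedule) {
    ToyQueue queue;
    int fetches = 0;
    ToyEvent fetchEvent([&fetches] { ++fetches; });
    if (explicitSchedule)
        queue.reschedule(&fetchEvent, 100);
    queue.runAll();
    return fetches;
}
```

In this toy, fetchCount(false) returns 0 (the event is constructed but
never runs, like the missing "switch_cpus3: Fetch"), and
fetchCount(true) returns 1. Checking scheduled() before descheduling is
what makes the guard harmless when the event was never scheduled.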
- Clint Smullen
On Oct 27, 2008, at 2:41 PM, Ali Saidi wrote:
> Hi Clint,
>
> I was hoping someone else would respond, but seeing as they haven't
> I'll give it a try. The Resume and Activate messages are completely
> different. Resume is the CPU getting a resume() (which is the opposite
> of a drain() call). Before any change can be made to the M5 object
> hierarchy, the memory system must be drained of all requests. Drain()
> instructs all objects to not issue new requests to the memory system,
> and only process responses such that all current outstanding requests
> can be completed. Resume() is the opposite, and it allows objects to
> resume issuing requests.
>
> There is another pair of functions: suspend() and activate().
> Suspend() (which ends up calling suspendContext()) suspends one
> processor context in response to a quiesce pseudo-instruction that we
> insert into the kernel to skip idle time instead of busy waiting in a
> spin loop. Activate() is the reverse of this. An activate() can happen
> either because of the time passed to the quiesceNs() pseudo
> instruction expiring, or because of an interrupt (such as the periodic
> timer interrupt) occurring.
>
> So, drain()/resume() and suspend()/activate() are different things. If
> the system is idle and you switch with a command line option (-F), I
> would expect that all the cpus would be idle, because they're sitting
> at the prompt and only waking up for timer interrupts every once in a
> while. On the other hand, if you execute m5 switchcpu at the prompt I
> would expect that one cpu would be active to execute the pseudo
> instruction that switches cpus. In that case you shouldn't see an
> Activate Context because it should already be active (unless there was
> an intervening Suspend Context).
>
> There still could be a problem, but it might not be exactly where you
> think it is. You're running the stable repository? I know there were
> some interrupt changes to the development repository, but I don't
> think those made it into stable yet. Some trace flags for the
> interrupt system might shed a bit of light on the problem.
>
> Ali