Re: [m5-users] Also having a problem with switch cpus

Clint Smullen Mon, 27 Oct 2008 15:24:55 -0700

I am using the latest stable version. All the results I have shown
are for a completely vanilla codebase running the fs.py example script.

I tried adding an extra drain to the Simulation.py script as part of
the memory mode switch, but that had no effect. Then I added tracing
of interrupts, as you suggested, as well as some other messages. Here
is the section of the trace that matches what I gave before:

2349997653500: system.switch_cpus0: Processor is now idle
2349997653500: system.switch_cpus1: Processor is now idle
2349997653500: system.switch_cpus2: Processor is now idle
2349997653500: system.switch_cpus3: Processor is now running
2349997653500: system.switch_cpus2: Resume
2349997653500: system.switch_cpus3: Resume
2349997653500: system.switch_cpus3: Constructing a new FetchEvent
2349997653500: system.switch_cpus0: Resume
2349997653500: system.switch_cpus1: Resume
2350585937500: system.cpu0.interrupts: Interrupt 22:0 posted
2350585937500: system.switch_cpus0: Suspended Processor awoke
2350585937500: system.switch_cpus0: ActivateContext 0 (1 cycles)
2350585937500: system.cpu1.interrupts: Interrupt 22:0 posted
2350585937500: system.switch_cpus1: Suspended Processor awoke
2350585937500: system.switch_cpus1: ActivateContext 0 (1 cycles)
2350585937500: system.cpu2.interrupts: Interrupt 22:0 posted
2350585937500: system.switch_cpus2: Suspended Processor awoke
2350585937500: system.switch_cpus2: ActivateContext 0 (1 cycles)
2350585937500: system.cpu3.interrupts: Interrupt 22:0 posted
2350585938000: system.switch_cpus2: Fetch
2350585938000: system.switch_cpus1: Fetch
2350585938000: system.switch_cpus0: Fetch
2350585968000: system.switch_cpus2: Complete ICache Fetch
2350585968000: system.switch_cpus2: Fetch
2350585970000: system.switch_cpus1: Complete ICache Fetch
2350585970000: system.switch_cpus1: Fetch
2350585973000: system.switch_cpus0: Complete ICache Fetch

One can see that cpu3.interrupts posts the same interrupt as all
the others, but switch_cpus3 never appears to respond to it. The
absence of a "Suspended Processor awoke" message for switch_cpus3
indicates that its thread is not suspended. The "Processor is now
running" message is printed in takeOverFrom when an active thread
context is found but the previous state was not Running. After that,
the "Constructing a new FetchEvent" message is printed in resume when
it detects that the state is not idle. Despite the new FetchEvent
being created, there is never a "...switch_cpus3: Fetch" message. The
only explanation I can think of is that the event is never explicitly
scheduled.

In EventWrapper inside eventq.hh, the old implementation (built on
the old event queue system) automatically called schedule in the
constructor, while the new version does not. Inserting a schedule
call right after the "... new FetchEvent(..)" line makes the problem
go away. In cpu/base.cc, the only other use of EventWrapper in the
entire cpu subtree does make an explicit schedule call. Other
instances of EventWrapper show that their behavior has changed as
well. I also noticed that most other uses of EventWrapper do not
repeatedly create and destroy the event.

I have put both of these changes into a small patch that resolves this
issue and have sent it to the developers list. Let me know what you
think of it. I am not sure whether calling deschedule right before the
unconditional schedule call is necessary, but I left it in to be safe.

        - Clint Smullen

On Oct 27, 2008, at 2:41 PM, Ali Saidi wrote:

> Hi Clint,
>
> I was hoping someone else would respond, but seeing as they haven't,
> I'll give it a try. The Resume and Activate messages are completely
> different. Resume is the CPU getting a resume() (which is the opposite
> of a drain() call). Before any change can be made to the M5 object
> hierarchy, the memory system must be drained of all requests. Drain()
> instructs all objects to not issue new requests to the memory system,
> and only process responses such that all current outstanding requests
> can be completed. Resume() is the opposite, and it allows objects to
> resume issuing requests.
>
> There is another pair of functions: suspend() and activate().
> Suspend() (which ends up calling suspendContext()) suspends one
> processor context in response to a quiesce pseudo-instruction that we
> insert into the kernel to skip idle time instead of busy waiting in a
> spin loop. Activate() is the reverse of this. An activate() can happen
> either because of the time passed to the quiesceNs() pseudo
> instruction expiring, or because of an interrupt (such as the periodic
> timer interrupt) occurring.
>
> So, drain()/resume() and suspend()/activate() are different things. If
> the system is idle and you switch with a command line option (-F), I
> would expect that all the cpus would be idle, because they're sitting
> at the prompt and only waking up for timer interrupts every once in a
> while. On the other hand, if you execute m5 switchcpu at the prompt I
> would expect that one cpu would be active to execute the pseudo
> instruction that switches cpus. In that case you shouldn't see an
> Activate Context because it should already be active (unless there was
> an intervening Suspend Context).
>
> There still could be a problem, but it might not be exactly where you
> think it is. You're running the stable repository? I know there were
> some interrupt changes to the development repository, but I don't
> think those made it into stable yet. Some trace flags for the
> interrupt system might shed a bit of light on the problem.
>
> Ali

_______________________________________________
m5-users mailing list
[email protected]
http://m5sim.org/cgi-bin/mailman/listinfo/m5-users
