Hi Clint,

I was hoping someone else would respond, but seeing as they haven't I'll give it a try. The Resume and Activate messages are completely different. Resume is the CPU getting a resume() (which is the opposite of a drain() call). Before any change can be made to the M5 object hierarchy, the memory system must be drained of all requests. Drain() instructs all objects to not issue new requests to the memory system, and only process responses such that all current outstanding requests can be completed. Resume() is the opposite, and it allows objects to resume issuing requests.
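To make the drain()/resume() contract concrete, here is a rough toy model in Python. It is only a sketch of the behaviour described above, not M5 code: the class ToyMemObject and its methods issue_request(), receive_response() and is_drained() are hypothetical names made up for this illustration.

```python
class ToyMemObject:
    """Hypothetical stand-in for an object that talks to the memory system."""

    def __init__(self, name):
        self.name = name
        self.draining = False   # set by drain(), cleared by resume()
        self.outstanding = 0    # requests issued but not yet answered

    def issue_request(self):
        """Try to send a new request; refused while draining."""
        if self.draining:
            return False
        self.outstanding += 1
        return True

    def receive_response(self):
        """Responses are always processed, even while draining."""
        if self.outstanding == 0:
            raise RuntimeError("response without an outstanding request")
        self.outstanding -= 1

    def drain(self):
        """Stop issuing new requests; return how many are still in flight."""
        self.draining = True
        return self.outstanding

    def is_drained(self):
        """True once draining was requested and nothing is in flight."""
        return self.draining and self.outstanding == 0

    def resume(self):
        """Opposite of drain(): new requests may be issued again."""
        self.draining = False


if __name__ == "__main__":
    cpu = ToyMemObject("switch_cpus0")
    cpu.issue_request()
    print("in flight at drain:", cpu.drain())
    print("new request accepted while draining:", cpu.issue_request())
    cpu.receive_response()
    print("drained:", cpu.is_drained())
    cpu.resume()
    print("new request accepted after resume:", cpu.issue_request())
```

The point of the sketch is that a drained object is quiet but not asleep: once resume() is called it can issue requests again, with no activate() involved.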
There is another pair of functions: suspend() and activate(). Suspend() (which ends up calling suspendContext()) suspends one processor context in response to a quiesce pseudo-instruction that we insert into the kernel to skip idle time instead of busy waiting in a spin loop. Activate() is the reverse of this. An activate() can happen either because the time passed to the quiesceNs() pseudo-instruction expires, or because an interrupt (such as the periodic timer interrupt) occurs.

So, drain()/resume() and suspend()/activate() are different things. If the system is idle and you switch with a command line option (-F), I would expect that all the CPUs would be idle, because they're sitting at the prompt and only waking up for timer interrupts every once in a while. On the other hand, if you execute m5 switchcpu at the prompt, I would expect that one CPU would be active to execute the pseudo-instruction that switches CPUs. In that case you shouldn't see an Activate Context because it should already be active (unless there was an intervening Suspend Context).

There still could be a problem, but it might not be exactly where you think it is. You're running the stable repository? I know there were some interrupt changes to the development repository, but I don't think those made it into stable yet. Some trace flags for the interrupt system might shed a bit of light on the problem.

Ali

On Oct 26, 2008, at 2:40 PM, Clint Smullen wrote:

> I do not see any assertion errors, and I am encountering this problem
> even with a vanilla codebase pulled from the stable repository and
> using the 2.0b3 files straight off the website, but it is perhaps
> related to the "switch cpus problem" message.
>
> When switching from the atomic to timing processors, the simulator
> instantly becomes stuck.
> It appears to occur with any number of CPUs
> (I've tried one, two, and four), and the stats file (after killing it)
> shows the same symptomatic behavior: one or more of the switch cpus
> have executed no instructions and no cycles have elapsed, but all
> other processors have continued to make progress. However, it is not
> consistent which processors get stuck, though it is only one or two
> for a four CPU setup.
>
> I created traces using the SimpleCPU flag, and what I see is that the
> all the CPUs show that they are started with "Resume", but the CPUs
> which get stuck never have an ActivateContext message. An example
> section from a trace file is shown below where only switch_cpus3 is
> stuck. No other messages pertaining to switch_cpus3 ever appear in the
> trace, and the stats file shows no instructions or cycles for that
> processor.
>
> 4084372420500: system.switch_cpus2: Resume
> 4084372420500: system.switch_cpus3: Resume
> 4084372420500: system.switch_cpus0: Resume
> 4084372420500: system.switch_cpus1: Resume
> 4084960937500: system.switch_cpus0: ActivateContext 0 (1 cycles)
> 4084960937500: system.switch_cpus1: ActivateContext 0 (1 cycles)
> 4084960937500: system.switch_cpus2: ActivateContext 0 (1 cycles)
> 4084960938000: system.switch_cpus2: Fetch
> 4084960938000: system.switch_cpus1: Fetch
> 4084960938000: system.switch_cpus0: Fetch
> 4084960939000: system.switch_cpus0: Complete ICache Fetch
> 4084960939000: system.switch_cpus0: Fetch
> 4084960939000: system.switch_cpus1: Complete ICache Fetch
> 4084960939000: system.switch_cpus1: Fetch
> 4084960939000: system.switch_cpus2: Complete ICache Fetch
> 4084960939000: system.switch_cpus2: Fetch
> 4084960940000: system.switch_cpus2: Complete ICache Fetch
>
>
> I've not worked much with the CPU side of the M5 codebase, so I've not
> attempted to find what is wrong. All I know is that it did not occur
> with the original 2.0b6 version of the stable codebase, nor with the
> 2.0b5-era versions.
> The only significant change I know of that dropped
> into the stable repository since then is the new event queue handling.
> Suggestions for things to look at would be appreciated.
>
> Here is an example of how I am running the example FS script (I've
> also tried m5.fast and m5.debug, they both give the same,
> non-deterministic results):
>
> ~/m5-vanilla/build/ALPHA_FS/m5.opt fs.py -n 4 -F 10000000000 --caches -t
>
> I use "m5 switchcpu" on the terminal after startup is finished to
> switch the CPUs over, though it also occurs automatically if one
> lowers the fast-forward instruction count to a much smaller value. If
> I specify to switch to O3 cpus, I do not have any problems.
>
> Thanks,
> - Clint Smullen
> _______________________________________________
> m5-users mailing list
> [email protected]
> http://m5sim.org/cgi-bin/mailman/listinfo/m5-users
