Hi Nilay,

Sorry for not replying earlier, I've been pretty busy fixing up bugs in the x86 frontend to make SPEC CPU2006 actually work on x86. About 20% of the benchmarks in my setup execute unsupported instructions and don't verify correctly (some even terminate long before they are supposed to).

My take on the current state of things is that there is probably a bug if the second CPU is still executing microcode, since no CPU in a drained system should ever be executing microcode. The patch as it stands breaks the assumptions we make about a drained system, so I'm still opposed to committing it.
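
To make the assumption concrete, here is a minimal sketch of the invariant. It is not the actual gem5 code; CpuSketch and isDrainedSketch are made-up names, and the real drain check may look at more state than the micro-PC.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for a CPU; not a gem5 class.
struct CpuSketch {
    enum class Status { Idle, Running };
    Status status = Status::Idle;
    std::uint64_t pc = 0;
    std::uint32_t microPC = 0;  // 0 means "at a macro-instruction boundary"
};

// Assumption illustrated: a CPU counts as drained only when it sits on an
// instruction boundary. Its status (Idle or Running) does not change that.
inline bool isDrainedSketch(const CpuSketch &cpu) {
    return cpu.microPC == 0;
}
```

Under this model an Idle CPU that stopped with microPC != 0 is still not drained, which is the case the patch would start accepting.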

I did some investigation of my own, and it seems like the second CPU is held at pc 0xfffffff0.0 (i.e., not executing microcode early in the boot). Could you try to find out why the second CPU in your system is stuck in microcode? I suspect that the fact that it is stuck there is actually a bug.

We should probably clean up the whole CPU state mess once and for all at some point. I'm still not exactly sure, despite my implementing my own CPU model, what is expected to happen in a CPU when the different methods controlling state transitions are called. Not to mention where the methods are called from and why. We really need to document that.

//Andreas



On 06/14/2013 01:16 AM, Nilay Vaish wrote:
At the very outset, let me state that there are things that I do not
have a complete understanding of. So whatever I am stating may just be
how I feel things work after observing the code execution.

1. I think a processor in Idle state means that there is no PC
associated with it. If you want to, I think you can assume the PC to be
0 in Idle state.

2. From what I observed, it seems that the processor that boots up first
controls the booting of other processors. In the two processor system
that I tested with, I observed that processor 1 executed 4-5
instructions from the microcode rom. Then it stopped with the micropc !=
0 and did not execute anything for a considerable amount of time. Then,
it started executing instructions at some point in time. In the
intervening time, when it was not executing instructions, the cpu type
was switched at least 2-3 times. I had to make those changes to the code
so as to prevent the system from getting into a deadlock, as the micropc
for processor 1 was != 0 and it was not executing anything.

If we require that processor 1 cannot stop with micropc != 0, and if it
is true that processor 1 is dependent on processor 0, I think we can get
into a deadlock while trying to drain the system. This is because
processor 1 will not execute anything since it is waiting for processor
0 to do something, while processor 0 will drain itself and stop
executing instructions.
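
The deadlock scenario described above can be sketched as a toy model. This is only an illustration of the hypothesis, not gem5 code: CpuModel, waitsForPeer, and drainCompletes are invented names, and the premise that processor 1 depends on processor 0 is itself an assumption.

```cpp
#include <cassert>

// Hypothetical model of one CPU during a drain; not a gem5 type.
struct CpuModel {
    unsigned microPC;   // non-zero: stopped part-way through a macro-op
    bool waitsForPeer;  // true: cannot advance until the other CPU acts
};

// Returns true if both CPUs reach an instruction boundary (microPC == 0)
// within maxTicks, which is what a drain would require under this model.
bool drainCompletes(CpuModel cpu0, CpuModel cpu1, int maxTicks) {
    for (int tick = 0; tick < maxTicks; ++tick) {
        // cpu0 runs to an instruction boundary and then stops for the drain.
        cpu0.microPC = 0;
        // cpu0 is now stopped, so nothing ever clears cpu1.waitsForPeer.
        // cpu1 finishes its macro-op only if it was not waiting on cpu0.
        if (!cpu1.waitsForPeer) {
            cpu1.microPC = 0;
        }
        if (cpu0.microPC == 0 && cpu1.microPC == 0) {
            return true;
        }
    }
    return false;  // the drain never completes: the deadlock case
}
```

With cpu1 parked at a non-zero microPC and waiting on cpu0, drainCompletes() returns false for any tick budget; with waitsForPeer false it returns true on the first tick.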

--
Nilay


On Tue, 11 Jun 2013, Andreas Sandberg wrote:


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/1904/#review4415
-----------------------------------------------------------


I get the feeling that this will completely break all the assumptions
we make about a drained system not executing microcode. You need to
change that before committing, otherwise all KVM-related work will break.


src/cpu/o3/commit_impl.hh
<http://reviews.gem5.org/r/1904/#comment4140>

   This breaks all the assumptions we make about a drained system. It
doesn't matter whether it is Idle or not: as long as it is executing
microcode, it isn't drained and will break things.



src/cpu/simple/atomic.cc
<http://reviews.gem5.org/r/1904/#comment4137>

   I'm not sure if this is actually correct. Isn't it possible for a
CPU to be Idle and in microcode or something else that will break the
assumptions we have about drained CPUs?



src/cpu/simple/atomic.cc
<http://reviews.gem5.org/r/1904/#comment4138>

   isDrained() should always be true, independent of the state, when
we reach this code. Could you move it to a separate assert to make
sure we get better error messages if it explodes?



src/cpu/simple/timing.cc
<http://reviews.gem5.org/r/1904/#comment4139>

   Same as for the atomic CPU. Should probably be something like this:

   assert(_status == Running || _status == Idle);
   assert(isDrained());



- Andreas Sandberg


On June 7, 2013, 3:48 p.m., Nilay Vaish wrote:

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/1904/
-----------------------------------------------------------

(Updated June 7, 2013, 3:48 p.m.)


Review request for Default.


Description
-------

Changeset 9748:5d7be2fc04c7
---------------------------
cpu: some changes to switching

1. The fetch stage, when it takes over from another cpu, initializes its
status variable to Running. When an o3 cpu is about to switch to another
cpu, the fetch stage checks that its status is Idle. Now suppose
there are two processors in the system. The operating system has just
started and it is running on cpu0. Then, cpu1 would not actually be
doing anything. When trying to switch to another cpu, cpu1 gets stuck
because there is nothing going on that will move it from Running to Idle.

I think we should have the fetchStatus be initialized to Idle state.
The o3 cpu will activate some context at some point in the future. When it does
that, it calls the function wakeFromQuiesce() on the fetch stage. This
function in turn changes the fetchStatus to Running. As of now, the
function only does this for thread 0. I am proposing that we
pass the thread id and do it for that particular thread.

2. The TimingSimpleCPU incorrectly tested its status when switching out.

3. The commit stage should check for its status being Idle when
testing whether it needs to drain itself.

4. The atomic cpu should test its status variable before checking any
other variables for deciding on draining.
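
As an illustration of item 1, the per-thread wake-up could look roughly like the sketch below. This is a simplified stand-in and not the code in fetch_impl.hh: FetchSketch and statusOf are made-up names, and only the idea of starting in Idle and passing a thread id to wakeFromQuiesce() comes from the description above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

enum class FetchStatus { Idle, Running };

// Hypothetical fetch stage holding one status per hardware thread.
template <std::size_t NumThreads>
struct FetchSketch {
    std::array<FetchStatus, NumThreads> fetchStatus;

    // Every thread starts in Idle rather than Running, so a CPU that
    // never activates a context has nothing to wait for on switch-out.
    FetchSketch() { fetchStatus.fill(FetchStatus::Idle); }

    // Wake only the thread whose context was activated, instead of
    // always touching thread 0. at() throws on an out-of-range tid.
    void wakeFromQuiesce(std::size_t tid) {
        fetchStatus.at(tid) = FetchStatus::Running;
    }

    FetchStatus statusOf(std::size_t tid) const {
        return fetchStatus.at(tid);
    }
};
```

For example, after FetchSketch<2> fetch; fetch.wakeFromQuiesce(1); thread 1 is Running while thread 0 stays Idle.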


Diffs
-----

  src/cpu/o3/commit_impl.hh ea26ba576891
  src/cpu/o3/cpu.cc ea26ba576891
  src/cpu/o3/fetch.hh ea26ba576891
  src/cpu/o3/fetch_impl.hh ea26ba576891
  src/cpu/simple/atomic.cc ea26ba576891
  src/cpu/simple/timing.cc ea26ba576891

Diff: http://reviews.gem5.org/r/1904/diff/


Testing
-------


Thanks,

Nilay Vaish





_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev
