[gem5-users] Issues while Draining the CPUs

pushkar nandkar Mon, 14 Apr 2014 17:17:19 -0700

Hi All,

I am facing three issues right now and have not been able to find a
solution or workaround for any of them. Maybe with your help I can.


*1. CPUs do not get drained.*
Command line :  build/ALPHA_MESI_CMP_directory/gem5.opt --debug-flag=Drain
--debug-file=trace.out -d m5out/OutPutDir configs/example/ruby_fs.py -n 4
--cpu-type=detailed --restore-with-cpu=timing
--checkpoint-dir=parsec/bodytrack/simsmall/roi-chk_4CPU -r 1 --caches
--l1i_size=64kB --l1i_assoc=2 --l1d_size=64kB --l1d_assoc=2 --l2cache
--l2_size=1MB --l2_assoc=8 --mem-size=1024MB --prog-interval=100Hz

Once the simulation starts, it enters doSimLoop and never comes out.
There is no exit event.
This is what I see in the output:
*2417238392000: Event_196: system.cpu3 progress event, total committed:13,
progress insts committed: 13, IPC: 06.5e-07*
*2417238392000: Event_195: system.cpu2 progress event, total committed:1,
progress insts committed: 1, IPC: 0005e-08*
*2417238392000: Event_194: system.cpu1 progress event, total committed:12,
progress insts committed: 12, IPC: 0006e-07*
*2417238392000: Event_193: system.cpu0 progress event, total committed:1,
progress insts committed: 1, IPC: 0005e-08*
*2417238392000: Event_192: system.switch_cpus3 progress event, total
committed:0, progress insts committed: 0, IPC: 00000000*
*2417238392000: Event_191: system.switch_cpus2 progress event, total
committed:0, progress insts committed: 0, IPC: 00000000*
*2417238392000: Event_190: system.switch_cpus1 progress event, total
committed:0, progress insts committed: 0, IPC: 00000000*
*2417238392000: Event_189: system.switch_cpus0 progress event, total
committed:0, progress insts committed: 0, IPC: 00000000*
*2427238392000: Event_189: system.switch_cpus0 progress event, total
committed:0, progress insts committed: 0, IPC: 00000000*
*2427238392000: Event_190: system.switch_cpus1 progress event, total
committed:0, progress insts committed: 0, IPC: 00000000*
*2427238392000: Event_191: system.switch_cpus2 progress event, total
committed:0, progress insts committed: 0, IPC: 00000000*
*2427238392000: Event_192: system.switch_cpus3 progress event, total
committed:0, progress insts committed: 0, IPC: 00000000*
*2427238392000: Event_193: system.cpu0 progress event, total committed:1,
progress insts committed: 0, IPC: 00000000*
*2427238392000: Event_194: system.cpu1 progress event, total committed:12,
progress insts committed: 0, IPC: 00000000*
*2427238392000: Event_195: system.cpu2 progress event, total committed:1,
progress insts committed: 0, IPC: 00000000*
*2427238392000: Event_196: system.cpu3 progress event, total committed:13,
progress insts committed: 0, IPC: 00000000*
*2437238392000: Event_196: system.cpu3 progress event, total committed:13,
progress insts committed: 0, IPC: 00000000*
*2437238392000: Event_195: system.cpu2 progress event, total committed:1,
progress insts committed: 0, IPC: 00000000*
*2437238392000: Event_194: system.cpu1 progress event, total committed:12,
progress insts committed: 0, IPC: 00000000*
*2437238392000: Event_193: system.cpu0 progress event, total committed:1,
progress insts committed: 0, IPC: 00000000*

There is no progress at all on the switched CPUs.
I debugged this using the Drain debug flag.
As the trace file below shows, cpu1 is the only one of the four simulated
CPUs that never finishes draining.

*      0: system.tsunami.io.rtc: Real-time clock set to Thu Jan  1 00:00:00
2009*
*2407238402000: system.ruby.l1_cntrl0.sequencer: RubyPort not drained*
*2407238402000: system.ruby.l1_cntrl2.sequencer: RubyPort not drained*
*2407238402000: system.cpu0: Requesting drain:
(0xfffffc0000319980=>0xfffffc0000319984)*
*2407238402000: system.cpu1: No need to drain.*
*2407238402000: system.cpu2: Requesting drain: (0x1200f82ec=>0x1200f82f0)*
*2407238402000: system.cpu3: No need to drain.*
*2408203163500: system.ruby.l1_cntrl2.sequencer: Drain count: 0*
*2408203163500: system.ruby.l1_cntrl2.sequencer: RubyPort done draining,
signaling drain done*
*2408203164000: system.cpu2: tryCompleteDrain: (0x1200f7ed4=>0x1200f7ed8)*
*2408203164000: system.cpu2: CPU done draining, processing drain event*
*2408203171000: system.ruby.l1_cntrl0.sequencer: Drain count: 0*
*2408203171000: system.ruby.l1_cntrl0.sequencer: RubyPort done draining,
signaling drain done*
*2408203209000: system.cpu0: tryCompleteDrain:
(0xfffffc0000319984=>0xfffffc0000319988)*
*2408203209000: system.cpu0: CPU done draining, processing drain event*
*2408203209000: system.ruby.l1_cntrl1.sequencer: RubyPort not drained*
*2408203209000: system.ruby.l1_cntrl3.sequencer: RubyPort not drained*
*2408203209000: system.cpu0: No need to drain.*
*2408203209000: system.cpu1: Requesting drain: (0x4139=>0x413d)*
*2408203209000: system.cpu2: No need to drain.*
*2408203209000: system.cpu3: Requesting drain: (0x4139=>0x413d)*
*2408203228000: system.ruby.l1_cntrl3.sequencer: Drain count: 0*
*2408203228000: system.ruby.l1_cntrl3.sequencer: RubyPort done draining,
signaling drain done*
*2408203228500: system.cpu3: tryCompleteDrain: (0x413d=>0x4141)*
*2408203228500: system.cpu3: CPU done draining, processing drain event*
*2417238392000: Event_196: system.cpu3 progress event, total committed:13,
progress insts committed: 13, IPC: 06.5e-07*
....(as above)

When I kill the run, I get the following error:
gem5.opt: build/ALPHA_MESI_CMP_directory/python/swig/drain_wrap.cc:3233:
void cleanupDrainManager(DrainManager*): Assertion
`drain_manager->getCount() == 0' failed.

To debug further, I used gdb with a breakpoint at the start of
cleanupDrainManager in build/ALPHA_MESI_CMP_directory/python/swig/drain_wrap.cc.
On the first cleanup call after tick 2408203209000,
drain_manager->getCount() returns 0 and the function returns normally.
After that, cleanupDrainManager is never called again: when I continue, the
simulation goes back into the same loop and never comes out.

*(gdb) c*
*Continuing.*

*Breakpoint 4, cleanupDrainManager (drain_manager=0xd4e9a50)*
*    at build/ALPHA_MESI_CMP_directory/python/swig/drain_wrap.cc:3232*
*3232    assert(drain_manager);*
*(gdb) s*
*3233    assert(drain_manager->getCount() == 0);*
*(gdb) p drain_manager->getCount()*
*$1 = 0*
*(gdb) s*
*DrainManager::getCount (this=0xd4e9a50)*
*    at build/ALPHA_MESI_CMP_directory/sim/drain.hh:78*
*78    unsigned int getCount() const { return _count; }*
*(gdb) *
*cleanupDrainManager (drain_manager=0xd4e9a50)*
*    at build/ALPHA_MESI_CMP_directory/python/swig/drain_wrap.cc:3234*
*3234    delete drain_manager;*
*(gdb) *
*DrainManager::~DrainManager (this=0xd4e9a50, __in_chrg=<optimized out>)*
*    at build/ALPHA_MESI_CMP_directory/sim/drain.cc:50*
*50 }*
*(gdb) n*
*cleanupDrainManager (drain_manager=0xd4e9a50)*
*    at build/ALPHA_MESI_CMP_directory/python/swig/drain_wrap.cc:3235*
*3235 }*
*(gdb) s*
*_wrap_cleanupDrainManager (args=0x3804190)*
*    at build/ALPHA_MESI_CMP_directory/python/swig/drain_wrap.cc:3524*
*3524  resultobj = SWIG_Py_Void();*
*(gdb) n*
*3525  return resultobj;*
*(gdb) *
*3528 }*
*(gdb) *
*0x00007ffff775c5d5 in PyEval_EvalFrameEx () from
/usr/lib/libpython2.7.so.1.0*

*(gdb) c*
*Continuing.*
*info: Entering event queue @ 2408203209000.  Starting simulation...*
(it is now in the loop and does not come out)

The -maxtick flag does not stop the simulation either.

I am running PARSEC benchmarks, and this does not happen for every benchmark.
For example, bodytrack shows this behavior, whereas canneal runs fine.



*2. Assertion failure*

--debug-flags=Drain --trace-file=trace.out -d m5out/OutputDIR
configs/example/ruby_fs.py --ruby -n 4 --repeat-switch=20
--repeat-time-simple=10000000000 --repeat-time-detailed=1000000000
--cpu-type=detailed --restore-with-cpu=timing
--checkpoint-dir=parsec/swaptions/simmedium/multi-chk -r 6 --caches
--l1i_size=64kB --l1i_assoc=2 --l1d-cachebank --l1d-data-write-latency=1
--l1d_size=64kB --l1d_assoc=2 --l2cache --l2_size=4MB --l2_assoc=8
--mem-size=1024MB --prog-interval=10Hz

Here I get the following assertion failure:

*gem5.debug: build/ALPHA_MESI_CMP_directory/cpu/o3/inst_queue_impl.hh:443:
void InstructionQueue<Impl>::drainSanityCheck() const [with Impl =
O3CPUImpl]: Assertion `instsToExecute.empty()' failed *

Using gdb, I confirmed that instsToExecute is indeed not empty just before
the assertion fires:

*(gdb) plist instsToExecute DynInstPtr*
*elem[0]: $129 = {*
*  data = 0x4cf0e00*
*}*
*elem[1]: $130 = {*
*  data = 0x4d24100*
*}*
*List size = 2 *
*(gdb) p name()*
*$131 = "system.repeat_switch_cpus1.iq"*
*(gdb) backtrace*
*#0  InstructionQueue<O3CPUImpl>::drainSanityCheck (this=0x354e660)*
*    at build/ALPHA_MESI_CMP_directory/cpu/o3/inst_queue_impl.hh:443*
*#1  0x000000000097ea7f in DefaultIEW<O3CPUImpl>::drainSanityCheck (*
*    this=0x354e2c0) at
build/ALPHA_MESI_CMP_directory/cpu/o3/iew_impl.hh:396*
*#2  0x000000000091592f in FullO3CPU<O3CPUImpl>::drainSanityCheck (*
*    this=0x354d410) at build/ALPHA_MESI_CMP_directory/cpu/o3/cpu.cc:1187*
*#3  0x0000000000929cd4 in FullO3CPU<O3CPUImpl>::drain (this=0x354d410, *
*    drain_manager=0x2eb00f0)*
*    at build/ALPHA_MESI_CMP_directory/cpu/o3/cpu.cc:1157*
*#4  0x0000000000b912bd in _wrap_Drainable_drain (args=0x28aaa70)*
*    at build/ALPHA_MESI_CMP_directory/python/swig/drain_wrap.cc:3142*
*#5  0x0000003958ed55c6 in PyEval_EvalFrameEx ()*
*   from /usr/lib64/libpython2.6.so.1.0*
*#6  0x0000003958ed7657 in PyEval_EvalCodeEx ()*
*   from /usr/lib64/libpython2.6.so.1.0*
*#7  0x0000003958ed5aa4 in PyEval_EvalFrameEx ()*
*   from /usr/lib64/libpython2.6.so.1.0*
*#8  0x0000003958e60917 in ?? () from /usr/lib64/libpython2.6.so.1.0*
*#9  0x0000003958e42b4b in PyIter_Next () from
/usr/lib64/libpython2.6.so.1.0*
*#10 0x0000003958ecb036 in ?? () from /usr/lib64/libpython2.6.so.1.0*
*#11 0x0000003958ed59e4 in PyEval_EvalFrameEx ()*
*   from /usr/lib64/libpython2.6.so.1.0*
*#12 0x0000003958ed7657 in PyEval_EvalCodeEx ()*
....

In cpu/o3/iew_impl.hh, executeInsts() takes the instructions to execute
from the instsToExecute list. However, the number of instructions it
executes is taken from the issue stage, *fromIssue->size*
(cpu/o3/iew_impl.hh:1221), instead of the number of instructions actually
waiting in instsToExecute. Therefore the instsToExecute list is not fully
emptied, and the assertion fires.

In cpu/o3/inst_queue_impl.hh there are three push_back calls on the
instsToExecute list but only one pop, in the code that fetches the next
instruction for execution.


*3. GDB*
In gdb, once execution reaches any of the *_wrap.cc files or the
python/swig directory, stepping never returns to where it left off, or to
where other functions are called, unless there is a breakpoint somewhere in
the code. Instead, it shows the following:

*Single stepping until exit from function PyEval_EvalFrameEx,*
*which has no line number information.*
*0x00007ffff771c6b5 in PyEval_EvalCodeEx () from
/usr/lib/libpython2.7.so.1.0*
(gdb)
Single stepping until exit from function PyEval_EvalCodeEx,
which has no line number information.
0x00007ffff775c650 in PyEval_EvalFrameEx () from
/usr/lib/libpython2.7.so.1.0

The simulation keeps running in the background, and there is no way to get
back into the simulator code.

Could anyone explain this behavior, or point me to some useful documentation?


Thanks,
-Pushkar

_______________________________________________
gem5-users mailing list
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
