Hi,

For the second issue: after the drain, I simply check whether all the instructions in the instsToExecute list are squashed, and if they are, I clear the list. If not, I exit the simulation. (Since the CPU is drained and InstList is empty, all entries in the instsToExecute list must be squashed.)
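A minimal, standalone sketch of that check is below. It is not the actual gem5 patch: MockInst, MockInstPtr and clearIfAllSquashed are hypothetical names made up for illustration, and a plain std::list stands in for the real instsToExecute list in the instruction queue.

```cpp
#include <cassert>
#include <list>
#include <memory>

// Hypothetical stand-in for a dynamic instruction; only the squashed
// flag matters for this sketch. Not a gem5 type.
struct MockInst {
    bool squashed = false;
    bool isSquashed() const { return squashed; }
};
using MockInstPtr = std::shared_ptr<MockInst>;

// Hypothetical helper: if every entry in the list is squashed, clear
// the list and return true. If any entry is still live, leave the list
// untouched and return false so the caller can stop the simulation.
bool clearIfAllSquashed(std::list<MockInstPtr> &insts_to_execute)
{
    for (const auto &inst : insts_to_execute) {
        assert(inst && "list entries must be non-null");
        if (!inst->isSquashed())
            return false;
    }
    insts_to_execute.clear();
    return true;
}
```

In the real code the caller would run this after the drain is signalled and before the drain sanity check, and exit the simulation when it returns false.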
For the first issue, I am not familiar with how Ruby handles drain. I am sure the senior members of this group will know the answer to your questions.

Thanks
Srini

On 04/15/14, pushkar nandkar wrote:
> Hi,
> Srini, thanks for that. Do you have any workaround for the second issue?
>
> About the first problem, I debugged a bit further. However, I am not able to
> think of what can be done next to debug further.
>
> I checked the source code and activated some debug
> flags (SimpleCPU, Activity, O3CPU, Quiesce, Drain).
> I have attached trace files for Bodytrack and Canneal (which runs well). The
> difference can be clearly seen.
>
> In the trace file I can see that the suspended processor is woken up for CPU1 and
> CPU3 (TimingSimpleCPU::wakeup()). Meanwhile, CPU2 and CPU0 are being drained.
> There are no wakeup calls for CPU2 and CPU0.
> This happens for both benchmarks, and therefore I don't think the
> issue is with wakeup/quiesce.
>
> Whenever a drain is called, it schedules a fetch event (TimingSimpleCPU::drain()).
> During a fetch event, it calls sendFetch(), which calls
> icachePort.sendTimingReq().
> TimingSimpleCPU::IcachePort::recvTimingResp() fetches the packet using
> TimingSimpleCPU::completeIfetch(), which calls advanceInst(), which in turn calls
> tryCompleteDrain().
>
> This function checks whether the drain manager is null and
> drains the CPU if it is not null.
>
> For Canneal, this goes well (see trace file).
>
> However, for Bodytrack, the drain starts for CPU1 and CPU3 quite late compared to
> Canneal (see trace file). There are many fetch events before the draining
> actually starts.
> When TimingSimpleCPU::drain() is finally called for CPU1, the fetch event is
> not scheduled, since _status = BaseSimpleCPU::DcacheWaitResponse (see
> TimingSimpleCPU::drain()).
>
> The same goes for CPU3. However, what I could find is that the events
> L1Cache_Controller::wakeup and PacketQueue::processSendEvent() drained the
> CPU3_L1 Controller and CPU3, respectively.
> However, the same does not occur for CPU1.
> Hence CPU1 never gets drained.
>
> The simulator gets stuck in doSimLoop and never comes out. It keeps
> spinning in the loop with no progress.
>
> Any help with debugging further would be great!
>
> Thanks
> -Pushkar
>
> On Mon, Apr 14, 2014 at 7:27 PM, Srinivasan Narayanamoorthy
> <[email protected]> wrote:
> > Hi
> >
> > 1) For the first problem, I suspect that the thread context is suspended and
> > for some reason never wakes up. You can look for any quiesce()
> > instruction that is not accompanied by a corresponding wakeup().
> >
> > 2) The basic problem here is that a drain is signalled while a pipelined op
> > whose issue latency is > 1 (for example, a multiply) is in the execute
> > pipeline. When the corresponding FUCompletion is processed, the
> > instsToExecute list is populated, and hence the drainSanityCheck fails.
> > I am currently checking whether all the instructions in the list are squashed
> > after the drain is signalled, and clearing the list.
> >
> > Thanks
> > Srini
> >
> > On 04/14/14, pushkar nandkar wrote:
> > > Hi All,
> > >
> > > There are three issues I am facing right now, and I could not figure out a
> > > solution or workaround for them. Maybe with your help I can.
> > >
> > > 1. CPU does not get drained.
> > > Command line: build/ALPHA_MESI_CMP_directory/gem5.opt --debug-flag=Drain
> > > --debug-file=trace.out -d m5out/OutPutDir configs/example/ruby_fs.py -n 4
> > > --cpu-type=detailed --restore-with-cpu=timing
> > > --checkpoint-dir=parsec/bodytrack/simsmall/roi-chk_4CPU -r 1 --caches
> > > --l1i_size=64kB --l1i_assoc=2 --l1d_size=64kB --l1d_assoc=2 --l2cache
> > > --l2_size=1MB --l2_assoc=8 --mem-size=1024MB --prog-interval=100Hz
> > >
> > > After execution, the simulation goes into doSimLoop and never comes out.
> > > There is no exit event.
> > > This is what I can see in the output:
> > > 2417238392000: Event_196: system.cpu3 progress event, total committed:13, progress insts committed: 13, IPC: 06.5e-07
> > > 2417238392000: Event_195: system.cpu2 progress event, total committed:1, progress insts committed: 1, IPC: 0005e-08
> > > 2417238392000: Event_194: system.cpu1 progress event, total committed:12, progress insts committed: 12, IPC: 0006e-07
> > > 2417238392000: Event_193: system.cpu0 progress event, total committed:1, progress insts committed: 1, IPC: 0005e-08
> > > 2417238392000: Event_192: system.switch_cpus3 progress event, total committed:0, progress insts committed: 0, IPC: 00000000
> > > 2417238392000: Event_191: system.switch_cpus2 progress event, total committed:0, progress insts committed: 0, IPC: 00000000
> > > 2417238392000: Event_190: system.switch_cpus1 progress event, total committed:0, progress insts committed: 0, IPC: 00000000
> > > 2417238392000: Event_189: system.switch_cpus0 progress event, total committed:0, progress insts committed: 0, IPC: 00000000
> > > 2427238392000: Event_189: system.switch_cpus0 progress event, total committed:0, progress insts committed: 0, IPC: 00000000
> > > 2427238392000: Event_190: system.switch_cpus1 progress event, total committed:0, progress insts committed: 0, IPC: 00000000
> > > 2427238392000: Event_191: system.switch_cpus2 progress event, total committed:0, progress insts committed: 0, IPC: 00000000
> > > 2427238392000: Event_192: system.switch_cpus3 progress event, total committed:0, progress insts committed: 0, IPC: 00000000
> > > 2427238392000: Event_193: system.cpu0 progress event, total committed:1, progress insts committed: 0, IPC: 00000000
> > > 2427238392000: Event_194: system.cpu1 progress event, total committed:12, progress insts committed: 0, IPC: 00000000
> > > 2427238392000: Event_195: system.cpu2 progress event, total committed:1, progress insts committed: 0, IPC: 00000000
> > > 2427238392000: Event_196: system.cpu3 progress event, total committed:13, progress insts committed: 0, IPC: 00000000
> > > 2437238392000: Event_196: system.cpu3 progress event, total committed:13, progress insts committed: 0, IPC: 00000000
> > > 2437238392000: Event_195: system.cpu2 progress event, total committed:1, progress insts committed: 0, IPC: 00000000
> > > 2437238392000: Event_194: system.cpu1 progress event, total committed:12, progress insts committed: 0, IPC: 00000000
> > > 2437238392000: Event_193: system.cpu0 progress event, total committed:1, progress insts committed: 0, IPC: 00000000
> > >
> > > There is no progress at all for the switched CPUs.
> > > I debugged using the Drain flag.
> > > As the trace file below shows, CPU1 never gets drained out of the 4
> > > simulated.
> > >
> > > 0: system.tsunami.io.rtc: Real-time clock set to Thu Jan 1 00:00:00 2009
> > > 2407238402000: system.ruby.l1_cntrl0.sequencer: RubyPort not drained
> > > 2407238402000: system.ruby.l1_cntrl2.sequencer: RubyPort not drained
> > > 2407238402000: system.cpu0: Requesting drain: (0xfffffc0000319980=>0xfffffc0000319984)
> > > 2407238402000: system.cpu1: No need to drain.
> > > 2407238402000: system.cpu2: Requesting drain: (0x1200f82ec=>0x1200f82f0)
> > > 2407238402000: system.cpu3: No need to drain.
> > > 2408203163500: system.ruby.l1_cntrl2.sequencer: Drain count: 0
> > > 2408203163500: system.ruby.l1_cntrl2.sequencer: RubyPort done draining, signaling drain done
> > > 2408203164000: system.cpu2: tryCompleteDrain: (0x1200f7ed4=>0x1200f7ed8)
> > > 2408203164000: system.cpu2: CPU done draining, processing drain event
> > > 2408203171000: system.ruby.l1_cntrl0.sequencer: Drain count: 0
> > > 2408203171000: system.ruby.l1_cntrl0.sequencer: RubyPort done draining, signaling drain done
> > > 2408203209000: system.cpu0: tryCompleteDrain: (0xfffffc0000319984=>0xfffffc0000319988)
> > > 2408203209000: system.cpu0: CPU done draining, processing drain event
> > > 2408203209000: system.ruby.l1_cntrl1.sequencer: RubyPort not drained
> > > 2408203209000: system.ruby.l1_cntrl3.sequencer: RubyPort not drained
> > > 2408203209000: system.cpu0: No need to drain.
> > > 2408203209000: system.cpu1: Requesting drain: (0x4139=>0x413d)
> > > 2408203209000: system.cpu2: No need to drain.
> > > 2408203209000: system.cpu3: Requesting drain: (0x4139=>0x413d)
> > > 2408203228000: system.ruby.l1_cntrl3.sequencer: Drain count: 0
> > > 2408203228000: system.ruby.l1_cntrl3.sequencer: RubyPort done draining, signaling drain done
> > > 2408203228500: system.cpu3: tryCompleteDrain: (0x413d=>0x4141)
> > > 2408203228500: system.cpu3: CPU done draining, processing drain event
> > > 2417238392000: Event_196: system.cpu3 progress event, total committed:13, progress insts committed: 13, IPC: 06.5e-07
> > >
> > > ....(as above)
> > >
> > > When I kill the run, I get the following error:
> > >
> > > gem5.opt: build/ALPHA_MESI_CMP_directory/python/swig/drain_wrap.cc:3233:
> > > void cleanupDrainManager(DrainManager*): Assertion
> > > `drain_manager->getCount() == 0' failed.
> > >
> > > To debug further, I used gdb to set a breakpoint at the start of
> > > cleanupDrainManager in
> > > build/ALPHA_MESI_CMP_directory/python/swig/drain_wrap.cc.
> > > However, on the first cleanup call after 2408203209000, it returns from the
> > > function normally: drain_manager->getCount() returns 0, and
> > > cleanupDrainManager is never called again. After continuing, it goes into
> > > the same simulation loop and never comes out.
> > >
> > > (gdb) c
> > > Continuing.
> > >
> > > Breakpoint 4, cleanupDrainManager (drain_manager=0xd4e9a50)
> > >     at build/ALPHA_MESI_CMP_directory/python/swig/drain_wrap.cc:3232
> > > 3232 assert(drain_manager);
> > > (gdb) s
> > > 3233 assert(drain_manager->getCount() == 0);
> > > (gdb) p drain_manager->getCount()
> > > $1 = 0
> > > (gdb) s
> > > DrainManager::getCount (this=0xd4e9a50)
> > >     at build/ALPHA_MESI_CMP_directory/sim/drain.hh:78
> > > 78 unsigned int getCount() const { return _count; }
> > > (gdb)
> > > cleanupDrainManager (drain_manager=0xd4e9a50)
> > >     at build/ALPHA_MESI_CMP_directory/python/swig/drain_wrap.cc:3234
> > > 3234 delete drain_manager;
> > > (gdb)
> > > DrainManager::~DrainManager (this=0xd4e9a50, __in_chrg=<optimized out>)
> > >     at build/ALPHA_MESI_CMP_directory/sim/drain.cc:50
> > > 50 }
> > > (gdb) n
> > > cleanupDrainManager (drain_manager=0xd4e9a50)
> > >     at build/ALPHA_MESI_CMP_directory/python/swig/drain_wrap.cc:3235
> > > 3235 }
> > > (gdb) s
> > > _wrap_cleanupDrainManager (args=0x3804190)
> > >     at build/ALPHA_MESI_CMP_directory/python/swig/drain_wrap.cc:3524
> > > 3524 resultobj = SWIG_Py_Void();
> > > (gdb) n
> > > 3525 return resultobj;
> > > (gdb)
> > > 3528 }
> > > (gdb)
> > > 0x00007ffff775c5d5 in PyEval_EvalFrameEx () from /usr/lib/libpython2.7.so.1.0
> > >
> > > (gdb) c
> > > Continuing.
> > > info: Entering event queue @ 2408203209000. Starting simulation...
> > >
> > > (It is now in the loop and does not come out.)
> > >
> > > The -maxtick flag doesn't work to stop the simulation.
> > >
> > > I am running PARSEC benchmarks. This does not happen for every benchmark.
> > > For example, Bodytrack shows this behavior, whereas Canneal runs fine.
> > >
> > > 2. Assertion fails
> > >
> > > --debug-flags=Drain --trace-file=trace.out -d m5out/OutputDIR
> > > configs/example/ruby_fs.py --ruby -n 4 --repeat-switch=20
> > > --repeat-time-simple=10000000000 --repeat-time-detailed=1000000000
> > > --cpu-type=detailed --restore-with-cpu=timing
> > > --checkpoint-dir=parsec/swaptions/simmedium/multi-chk -r 6 --caches
> > > --l1i_size=64kB --l1i_assoc=2 --l1d-cachebank --l1d-data-write-latency=1
> > > --l1d_size=64kB --l1d_assoc=2 --l2cache --l2_size=4MB --l2_assoc=8
> > > --mem-size=1024MB --prog-interval=10Hz
> > >
> > > Here I get the following assertion:
> > >
> > > gem5.debug: build/ALPHA_MESI_CMP_directory/cpu/o3/inst_queue_impl.hh:443:
> > > void InstructionQueue<Impl>::drainSanityCheck() const [with Impl =
> > > O3CPUImpl]: Assertion `instsToExecute.empty()' failed
> > >
> > > Using gdb, it indeed shows that instsToExecute is not empty just
> > > before the assertion occurs:
> > >
> > > (gdb) plist instsToExecute DynInstPtr
> > > elem[0]: $129 = {
> > >   data = 0x4cf0e00
> > > }
> > > elem[1]: $130 = {
> > >   data = 0x4d24100
> > > }
> > > List size = 2
> > > (gdb) p name()
> > > $131 = "system.repeat_switch_cpus1.iq"
> > > (gdb) backtrace
> > > #0 InstructionQueue<O3CPUImpl>::drainSanityCheck (this=0x354e660)
> > >     at build/ALPHA_MESI_CMP_directory/cpu/o3/inst_queue_impl.hh:443
> > > #1 0x000000000097ea7f in DefaultIEW<O3CPUImpl>::drainSanityCheck (
> > >     this=0x354e2c0) at build/ALPHA_MESI_CMP_directory/cpu/o3/iew_impl.hh:396
> > > #2 0x000000000091592f in FullO3CPU<O3CPUImpl>::drainSanityCheck (
> > >     this=0x354d410) at build/ALPHA_MESI_CMP_directory/cpu/o3/cpu.cc:1187
> > > #3 0x0000000000929cd4 in FullO3CPU<O3CPUImpl>::drain (this=0x354d410,
> > >     drain_manager=0x2eb00f0)
> > >     at build/ALPHA_MESI_CMP_directory/cpu/o3/cpu.cc:1157
> > > #4 0x0000000000b912bd in _wrap_Drainable_drain (args=0x28aaa70)
> > >     at build/ALPHA_MESI_CMP_directory/python/swig/drain_wrap.cc:3142
> > > #5 0x0000003958ed55c6 in PyEval_EvalFrameEx ()
> > >     from /usr/lib64/libpython2.6.so.1.0
> > > #6 0x0000003958ed7657 in PyEval_EvalCodeEx ()
> > >     from /usr/lib64/libpython2.6.so.1.0
> > > #7 0x0000003958ed5aa4 in PyEval_EvalFrameEx ()
> > >     from /usr/lib64/libpython2.6.so.1.0
> > > #8 0x0000003958e60917 in ?? () from /usr/lib64/libpython2.6.so.1.0
> > > #9 0x0000003958e42b4b in PyIter_Next () from /usr/lib64/libpython2.6.so.1.0
> > > #10 0x0000003958ecb036 in ?? () from /usr/lib64/libpython2.6.so.1.0
> > > #11 0x0000003958ed59e4 in PyEval_EvalFrameEx ()
> > >     from /usr/lib64/libpython2.6.so.1.0
> > > #12 0x0000003958ed7657 in PyEval_EvalCodeEx ()
> > >
> > > ....
> > >
> > > In cpu/o3/iew_impl.hh, in executeInsts(), the instructions to execute
> > > are taken from the instsToExecute list.
> > > However, the number of instructions it wants to execute is taken from
> > > the issue stage, fromIssue->size (cpu/o3/iew_impl.hh:1221), instead of
> > > the number of instructions waiting to be executed. Therefore the
> > > instsToExecute list does not become empty, and the assertion fires.
> > >
> > > In cpu/o3/inst_queue_impl.hh there are three push-backs to the
> > > instsToExecute list and only one pop, which happens when it wants to get
> > > the instruction for execution.
> > >
> > > 3. GDB
> > > Using gdb, once the run gets into any of the *_wrap.cc files or into the
> > > python/swig directory, it does not come back to where it left off or to
> > > where other functions are called, unless there is a breakpoint somewhere
> > > in the code. It shows the following:
> > >
> > > Single stepping until exit from function PyEval_EvalFrameEx,
> > > which has no line number information.
> > > 0x00007ffff771c6b5 in PyEval_EvalCodeEx () from /usr/lib/libpython2.7.so.1.0
> > >
> > > (gdb)
> > > Single stepping until exit from function PyEval_EvalCodeEx,
> > > which has no line number information.
> > > 0x00007ffff775c650 in PyEval_EvalFrameEx () from /usr/lib/libpython2.7.so.1.0
> > >
> > > The simulation continues in the background and there is no way to get
> > > inside the simulator code.
> > >
> > > Could anyone explain this behavior, or guide me to some useful documents?
> > >
> > > Thanks,
> > >
> > > -Pushkar
> > >
> > _______________________________________________
> > gem5-users mailing list
> > [email protected]
> > http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
