Hi Rio,

It looks like you're setting a maximum number of instructions for a thread to execute, and the problem occurs when the comInstEventQueue is checked to see whether that number of instructions has been reached. My guess is that the event is created, put on the queue, and deleted, but somehow never removed from the queue between the fast-forwarding and the warmup. Could you run valgrind with --track-origins=yes and see if it provides any more detail?
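To make the guess above concrete, here is a minimal sketch of the ordering I have in mind. It is not gem5 code: ToyEvent, ToyEventQueue and retire() are hypothetical stand-ins, and the real Event/EventQueue classes may behave differently. The only point it shows is that an event has to be taken off the queue before it is freed; otherwise the queue keeps a dangling pointer, and a later service pass would read freed memory, which is the kind of access Valgrind reports as "Invalid read".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical stand-in for an event; not gem5's Event class.
struct ToyEvent {
    bool scheduled = false;   // true while the queue holds a pointer to us
    std::uint64_t when = 0;   // key under which the event was queued
};

// Hypothetical stand-in for an event queue that stores raw pointers.
class ToyEventQueue {
  public:
    void schedule(ToyEvent *ev, std::uint64_t when) {
        assert(!ev->scheduled);
        ev->when = when;
        ev->scheduled = true;
        pending.emplace(when, ev);
    }

    // Remove exactly this event, leaving others with the same key alone.
    void deschedule(ToyEvent *ev) {
        assert(ev->scheduled);
        auto range = pending.equal_range(ev->when);
        for (auto it = range.first; it != range.second; ++it) {
            if (it->second == ev) {
                pending.erase(it);
                break;
            }
        }
        ev->scheduled = false;
    }

    std::size_t size() const { return pending.size(); }

  private:
    std::multimap<std::uint64_t, ToyEvent *> pending;
};

// Safe teardown: drop the queue's pointer first, then free the event.
// Calling `delete ev` alone would leave a dangling pointer in the queue.
void retire(ToyEventQueue &q, ToyEvent *ev) {
    if (ev->scheduled)
        q.deschedule(ev);
    delete ev;
}
```

If something like retire() is skipped on the path that switches from fast-forwarding to warmup, the stale entry would still be visited the next time the queue is serviced.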
Thanks,
Ali

On Sep 28, 2012, at 7:18 PM, Rio Xiangyu Dong wrote:

> Hi there,
>
> I use gem5 to run large-scale batches of simulations. I have started to see
> intermittent segmentation faults after the SE/FS merge. Simulations stopped
> by a segmentation fault usually go through one or two rounds of
> re-simulation.
>
> Debugging this segmentation fault is extremely hard for me, especially
> since running under gdb seems to hide it. So I believe it is some kind of
> "Invalid read" problem. Running the binary under valgrind seems to validate
> my guess: Valgrind reported 3 "Invalid read" errors.
>
> My command line is:
>
>   valgrind --log-file=report5 --suppressions=valgrind-python.supp \
>     build/ARM/gem5.opt configs/run/run.py --bench=spec.gcc --caches \
>     --cpu-type=detailed -F 1000 -W 1000 -s 1 -I 10000
>
> (run.py is my customized se.py script, which automatically loads the
> benchmark; nothing else is in it.)
>
> ==17033== Invalid read of size 2
> ==17033==    at 0x4B9774: Flags<unsigned short>::isSet(unsigned short) const (flags.hh:63)
> ==17033==    by 0x1AD5CE2: Event::isExitEvent() const (eventq.hh:291)
> ==17033==    by 0x1B28013: EventQueue::serviceOne() (eventq.cc:205)
> ==17033==    by 0x14DBA72: EventQueue::serviceEvents(unsigned long) (eventq.hh:399)
> ==17033==    by 0x162E34F: BaseSimpleCPU::preExecute() (base.cc:364)
> ==17033==    by 0x161D32A: AtomicSimpleCPU::tick() (atomic.cc:492)
> ==17033==    by 0x1618CFF: AtomicSimpleCPU::TickEvent::process() (atomic.cc:72)
> ==17033==    by 0x1B28007: EventQueue::serviceOne() (eventq.cc:204)
> ==17033==    by 0x1B74180: simulate(unsigned long) (simulate.cc:85)
> ==17033==    by 0x1AD48D2: _wrap_simulate__SWIG_1 (event_wrap.cc:4605)
> ==17033==    by 0x1AD49AF: _wrap_simulate (event_wrap.cc:4628)
> ==17033==    by 0x4F26F9E: PyEval_EvalFrameEx (ceval.c:4060)
>
> ==17033== Invalid read of size 2
> ==17033==    at 0x4B9774: Flags<unsigned short>::isSet(unsigned short) const (flags.hh:63)
> ==17033==    by 0x1B28079: EventQueue::serviceOne() (eventq.cc:213)
> ==17033==    by 0x14DBA72: EventQueue::serviceEvents(unsigned long) (eventq.hh:399)
> ==17033==    by 0x162E34F: BaseSimpleCPU::preExecute() (base.cc:364)
> ==17033==    by 0x161D32A: AtomicSimpleCPU::tick() (atomic.cc:492)
> ==17033==    by 0x1618CFF: AtomicSimpleCPU::TickEvent::process() (atomic.cc:72)
> ==17033==    by 0x1B28007: EventQueue::serviceOne() (eventq.cc:204)
> ==17033==    by 0x1B74180: simulate(unsigned long) (simulate.cc:85)
> ==17033==    by 0x1AD48D2: _wrap_simulate__SWIG_1 (event_wrap.cc:4605)
> ==17033==    by 0x1AD49AF: _wrap_simulate (event_wrap.cc:4628)
> ==17033==    by 0x4F26F9E: PyEval_EvalFrameEx (ceval.c:4060)
> ==17033==    by 0x4F28EF2: PyEval_EvalCodeEx (ceval.c:3000)
>
> ==17033== Invalid read of size 2
> ==17033==    at 0x4B9774: Flags<unsigned short>::isSet(unsigned short) const (flags.hh:63)
> ==17033==    by 0x1B28079: EventQueue::serviceOne() (eventq.cc:213)
> ==17033==    by 0x14DBA72: EventQueue::serviceEvents(unsigned long) (eventq.hh:399)
> ==17033==    by 0x14F7A14: FullO3CPU<O3CPUImpl>::instDone(short, RefCountingPtr<BaseO3DynInst<O3CPUImpl> >&) (cpu.cc:1497)
> ==17033==    by 0x14D6BA0: DefaultCommit<O3CPUImpl>::updateComInstStats(RefCountingPtr<BaseO3DynInst<O3CPUImpl> >&) (commit_impl.hh:1344)
> ==17033==    by 0x14D28BF: DefaultCommit<O3CPUImpl>::commitHead(RefCountingPtr<BaseO3DynInst<O3CPUImpl> >&, unsigned int) (commit_impl.hh:1179)
> ==17033==    by 0x14CDB1D: DefaultCommit<O3CPUImpl>::commitInsts() (commit_impl.hh:982)
> ==17033==    by 0x14C7DAA: DefaultCommit<O3CPUImpl>::commit() (commit_impl.hh:878)
> ==17033==    by 0x14C4A28: DefaultCommit<O3CPUImpl>::tick() (commit_impl.hh:660)
> ==17033==    by 0x14E9AF8: FullO3CPU<O3CPUImpl>::tick() (cpu.cc:601)
> ==17033==    by 0x14FC5F5: FullO3CPU<O3CPUImpl>::TickEvent::process() (cpu.cc:139)
> ==17033==    by 0x1B28007: EventQueue::serviceOne() (eventq.cc:204)
>
> It seems to me that there are some read-after-delete accesses in
> serviceOne() or process(). Can anyone take a look into it?
>
> BTW, I always follow the latest updates to the gem5 repo; my local revision
> is 9267.
>
> Thank you!
>
> Best,
> Xiangyu
>
> _______________________________________________
> gem5-dev mailing list
> [email protected]
> http://m5sim.org/mailman/listinfo/gem5-dev
