Hi there,
I use gem5 to run large batches of simulations. Since the SE/FS merge I have started to see intermittent segmentation faults. A simulation that is stopped by a segmentation fault usually completes after one or two re-runs.

Debugging this is extremely hard for me, especially because running under gdb seems to hide the segmentation fault. So I believe it is some kind of "invalid read" problem. Running the binary under Valgrind appears to support my guess: it reported 3 "Invalid read" errors.

My command line is:

    valgrind --log-file=report5 --suppressions=valgrind-python.supp build/ARM/gem5.opt configs/run/run.py --bench=spec.gcc --caches --cpu-type=detailed -F 1000 -W 1000 -s 1 -I 10000

(run.py is my customized se.py script that automatically loads the benchmark; it does nothing else.)

==17033== Invalid read of size 2
==17033==    at 0x4B9774: Flags<unsigned short>::isSet(unsigned short) const (flags.hh:63)
==17033==    by 0x1AD5CE2: Event::isExitEvent() const (eventq.hh:291)
==17033==    by 0x1B28013: EventQueue::serviceOne() (eventq.cc:205)
==17033==    by 0x14DBA72: EventQueue::serviceEvents(unsigned long) (eventq.hh:399)
==17033==    by 0x162E34F: BaseSimpleCPU::preExecute() (base.cc:364)
==17033==    by 0x161D32A: AtomicSimpleCPU::tick() (atomic.cc:492)
==17033==    by 0x1618CFF: AtomicSimpleCPU::TickEvent::process() (atomic.cc:72)
==17033==    by 0x1B28007: EventQueue::serviceOne() (eventq.cc:204)
==17033==    by 0x1B74180: simulate(unsigned long) (simulate.cc:85)
==17033==    by 0x1AD48D2: _wrap_simulate__SWIG_1 (event_wrap.cc:4605)
==17033==    by 0x1AD49AF: _wrap_simulate (event_wrap.cc:4628)
==17033==    by 0x4F26F9E: PyEval_EvalFrameEx (ceval.c:4060)

==17033== Invalid read of size 2
==17033==    at 0x4B9774: Flags<unsigned short>::isSet(unsigned short) const (flags.hh:63)
==17033==    by 0x1B28079: EventQueue::serviceOne() (eventq.cc:213)
==17033==    by 0x14DBA72: EventQueue::serviceEvents(unsigned long) (eventq.hh:399)
==17033==    by 0x162E34F: BaseSimpleCPU::preExecute() (base.cc:364)
==17033==    by 0x161D32A: AtomicSimpleCPU::tick() (atomic.cc:492)
==17033==    by 0x1618CFF: AtomicSimpleCPU::TickEvent::process() (atomic.cc:72)
==17033==    by 0x1B28007: EventQueue::serviceOne() (eventq.cc:204)
==17033==    by 0x1B74180: simulate(unsigned long) (simulate.cc:85)
==17033==    by 0x1AD48D2: _wrap_simulate__SWIG_1 (event_wrap.cc:4605)
==17033==    by 0x1AD49AF: _wrap_simulate (event_wrap.cc:4628)
==17033==    by 0x4F26F9E: PyEval_EvalFrameEx (ceval.c:4060)
==17033==    by 0x4F28EF2: PyEval_EvalCodeEx (ceval.c:3000)

==17033== Invalid read of size 2
==17033==    at 0x4B9774: Flags<unsigned short>::isSet(unsigned short) const (flags.hh:63)
==17033==    by 0x1B28079: EventQueue::serviceOne() (eventq.cc:213)
==17033==    by 0x14DBA72: EventQueue::serviceEvents(unsigned long) (eventq.hh:399)
==17033==    by 0x14F7A14: FullO3CPU<O3CPUImpl>::instDone(short, RefCountingPtr<BaseO3DynInst<O3CPUImpl> >&) (cpu.cc:1497)
==17033==    by 0x14D6BA0: DefaultCommit<O3CPUImpl>::updateComInstStats(RefCountingPtr<BaseO3DynInst<O3CPUImpl> >&) (commit_impl.hh:1344)
==17033==    by 0x14D28BF: DefaultCommit<O3CPUImpl>::commitHead(RefCountingPtr<BaseO3DynInst<O3CPUImpl> >&, unsigned int) (commit_impl.hh:1179)
==17033==    by 0x14CDB1D: DefaultCommit<O3CPUImpl>::commitInsts() (commit_impl.hh:982)
==17033==    by 0x14C7DAA: DefaultCommit<O3CPUImpl>::commit() (commit_impl.hh:878)
==17033==    by 0x14C4A28: DefaultCommit<O3CPUImpl>::tick() (commit_impl.hh:660)
==17033==    by 0x14E9AF8: FullO3CPU<O3CPUImpl>::tick() (cpu.cc:601)
==17033==    by 0x14FC5F5: FullO3CPU<O3CPUImpl>::TickEvent::process() (cpu.cc:139)
==17033==    by 0x1B28007: EventQueue::serviceOne() (eventq.cc:204)

In all three reports the invalid read is on an event's flags inside EventQueue::serviceOne() (eventq.cc:205 or eventq.cc:213), just after the process() call at eventq.cc:204. That suggests to me that there are some read-after-delete accesses in serviceOne() or process(). Can anyone take a look at it?

BTW, I always keep up with the latest gem5 repository; my local revision is 9267.

Thank you!

Best,
Xiangyu
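
P.S. To make the suspected pattern concrete, here is a minimal toy sketch of what I mean by read-after-delete. It is not gem5 code: ToyEvent and ToyQueue are made-up names, and the assumption that an event may delete itself inside process() is mine; I have not confirmed that this is what happens in eventq.cc. The sketch only shows the safe ordering, where the queue copies the flags it needs before calling process().

```cpp
#include <cassert>
#include <cstdint>
#include <queue>

// Toy model only: hypothetical names, not the gem5 Event/EventQueue classes.
struct ToyEvent {
    enum : std::uint16_t { AutoDelete = 0x1, IsExit = 0x2 };

    std::uint16_t flags;
    int *processed;  // counter bumped each time an event is processed

    ToyEvent(std::uint16_t f, int *counter) : flags(f), processed(counter) {}

    bool isSet(std::uint16_t f) const { return (flags & f) != 0; }

    // Assumption for illustration: an AutoDelete event frees itself here,
    // so the pointer may be dangling once process() returns.
    void process() {
        ++*processed;
        if (isSet(AutoDelete))
            delete this;
    }
};

struct ToyQueue {
    std::queue<ToyEvent *> pending;

    void schedule(ToyEvent *e) { pending.push(e); }

    // Services one event; returns true if it was an exit event.
    bool serviceOne() {
        assert(!pending.empty());
        ToyEvent *e = pending.front();
        pending.pop();

        // Suspected unsafe order: e->process(); followed by e->isSet(...),
        // which would read the flags of an already freed object.
        // Safe order: copy everything needed before process() runs.
        const bool isExit = e->isSet(ToyEvent::IsExit);
        const bool selfDeletes = e->isSet(ToyEvent::AutoDelete);

        e->process();

        // In this toy, the queue frees events that do not free themselves.
        if (!selfDeletes)
            delete e;
        return isExit;
    }
};
```

Scheduling one AutoDelete event and one IsExit event and calling serviceOne() twice exercises both paths. If the flag reads were moved after process(), I would expect Valgrind (or a build with -fsanitize=address) to flag the access for the AutoDelete event.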
