I couldn't deal with it right away and then forgot about it. It's still broken to the best of my knowledge.
Gabe

nathan binkert wrote:
> Whatever happened with this? I just lost track.
>
> Nate
>
>> It's broader than tracing and not caused by the tracing mechanism
>> itself, but I think it will only show up with tracing. The pointer to
>> the trace data will be NULL otherwise and the instruction won't
>> attempt to use it. Nothing else that currently exists is, to my
>> knowledge, in a position to be affected by this.
>>
>> Gabe
>>
>> Quoting nathan binkert <[email protected]>:
>>
>>> Does this problem really have anything to do with tracing, or is it
>>> just more apparent with it?
>>>
>>> On Sat, Apr 4, 2009 at 1:49 PM, Gabe Black <[email protected]> wrote:
>>>
>>>> Oooooooooooh. I see what's broken. This is a result of my changes to
>>>> allow delaying translation. What happens is that Stl_c goes into
>>>> initiateAcc. That function calls write on the CPU, which calls into
>>>> the TLB, which calls the translation callback, which recognizes a
>>>> failed store conditional and completes the instruction's execution
>>>> with completeAcc and cleans up. The call stack then collapses back to
>>>> initiateAcc, which is still waiting to finish and which then tries to
>>>> call a member function on traceData, which was deleted during the
>>>> cleanup. The problem here is not fundamentally complicated, but the
>>>> mechanisms involved are.
>>>>
>>>> One solution would be to record the fact that we're still in
>>>> initiateAcc, and if we are, wait for the call stack to collapse back
>>>> down to initiateAcc's caller before calling into completeAcc. That
>>>> more closely matches the semantics an instruction would expect, I
>>>> think, where the initiateAcc/completeAcc pair are called sequentially.
>>>>
>>>> One other concern this raises is that the code in the simple timing
>>>> CPU is not very simple.
>>>> One thing that would help would be to relocate some of the special
>>>> cases, like failed store conditionals or memory-mapped registers,
>>>> into different bodies of code, or at least out of the midst of
>>>> everything else going on. I haven't thought about this in any depth,
>>>> but I'll try to put together a flow chart sort of thing to explain
>>>> what happens to memory instructions as they execute. That would be
>>>> good for the sake of documentation, and also so we have something
>>>> concrete to talk about.
>>>>
>>>> Gabe
>>>>
>>>> Gabe Black wrote:
>>>>
>>>>> The segfault for me happens in malloc called by the new operator in
>>>>> exetrace.hh on line 84. That says to me that the most likely culprit
>>>>> is heap corruption, which will be very obnoxious to track down. I've
>>>>> started up a run of valgrind just in case it can catch something bad
>>>>> happening sometime in the next n hours.
>>>>>
>>>>> Gabe
>>>>>
>>>>> Gabe Black wrote:
>>>>>
>>>>>> Oh wow. It did happen eventually. I'll see if I can figure out
>>>>>> what's going on.
>>>>>>
>>>>>> Gabe
>>>>>>
>>>>>> Gabe Black wrote:
>>>>>>
>>>>>>> I tried that command line and I haven't seen any segfault yet.
>>>>>>> I'll let it run and see if anything happens. What version of the
>>>>>>> code are you using?
>>>>>>>
>>>>>>> Gabe
>>>>>>>
>>>>>>> Geoffrey Blake wrote:
>>>>>>>
>>>>>>>> I've added a couple of edits, but nothing major, i.e., added
>>>>>>>> statistics to the bus model, and some extra latency randomization
>>>>>>>> to cache misses to get better averages of parallel code runs.
>>>>>>>> None of this is tied to the trace-flags mechanism that I can
>>>>>>>> determine.
>>>>>>>>
>>>>>>>> I did run the code through valgrind, but ridiculously enough, the
>>>>>>>> segfault disappears. I'll keep digging in my spare time.
>>>>>>>> The "Exec" trace flags work fine (billions of instructions, no
>>>>>>>> problems) with an old version of m5 that is somewhere between
>>>>>>>> beta4 and beta5 of the stable releases. Now I can trace maybe a
>>>>>>>> few thousand instructions before M5 segfaults.
>>>>>>>>
>>>>>>>> Here is a stripped-down command line that exposes the bug with
>>>>>>>> the fewest variables to consider, in case someone out there wants
>>>>>>>> to try to duplicate the segfaults I'm seeing (it could be a
>>>>>>>> product of my build setup, so I'd appreciate it if someone could
>>>>>>>> verify independently):
>>>>>>>>
>>>>>>>> % m5.opt --trace-flags="ExecEnable" fs.py -b MutexTest -t -n 1 > /dev/null
>>>>>>>>
>>>>>>>> Geoff
>>>>>>>>
>>>>>>>> *From:* [email protected] [mailto:[email protected]]
>>>>>>>> *On Behalf Of* Korey Sewell
>>>>>>>> *Sent:* Friday, April 03, 2009 9:56 AM
>>>>>>>> *To:* M5 Developer List
>>>>>>>> *Subject:* Re: [m5-dev] Memory corruption in m5 dev repository
>>>>>>>> when using --trace-flags="ExecEnable"
>>>>>>>>
>>>>>>>> I would echo Gabe's sentiments. I've been suspicious of the
>>>>>>>> trace-flags causing memory corruption for a while now, but every
>>>>>>>> time I dig into it there's some small error of my own propagating
>>>>>>>> through that finally surfaces.
>>>>>>>>
>>>>>>>> In the big picture, I suspect that the trace-flags just
>>>>>>>> exacerbate any kind of memory-corruption issue, since you are
>>>>>>>> accessing things at such a heavy rate.
>>>>>>>>
>>>>>>>> In terms of debugging, is there any code you edited that is
>>>>>>>> triggered when you use "ExecEnable" rather than just "Exec"?
>>>>>>>>
>>>>>>>> Also, if you can turn valgrind on for maybe the first
>>>>>>>> thousand/million cycles with ExecEnable, you'll probably find
>>>>>>>> something.
>>>>>>>> On Thu, Apr 2, 2009 at 7:28 PM, Gabriel Michael Black
>>>>>>>> <[email protected] <mailto:[email protected]>> wrote:
>>>>>>>>
>>>>>>>> Does this happen when you start tracing sooner? I'd suggest
>>>>>>>> valgrind, especially if you can make the segfault happen quickly.
>>>>>>>> If you wait for your simulation to get to 1400000000000 ticks in
>>>>>>>> valgrind, you may die before you see the result. There's a
>>>>>>>> suppression file in util which should cut down on the noise.
>>>>>>>>
>>>>>>>> Gabe
>>>>>>>>
>>>>>>>> Quoting Geoffrey Blake <[email protected] <mailto:[email protected]>>:
>>>>>>>>
>>>>>>>>> I stumbled upon what appears to be a memory corruption bug in
>>>>>>>>> the current M5 repository. If on the command line I enter:
>>>>>>>>>
>>>>>>>>> % ./build/ALPHA_FS/m5.opt -trace-flags="ExecEnable"
>>>>>>>>> -trace-start=1400000000000 fs.py -b <benchmark> -t -n <cpus>
>>>>>>>>> <more parameters>
>>>>>>>>>
>>>>>>>>> the simulator will error with a segmentation fault, or
>>>>>>>>> occasionally an assert, not long after starting to trace
>>>>>>>>> instructions.
>>>>>>>>>
>>>>>>>>> I have run this through gdb with m5.debug and see the same
>>>>>>>>> errors; the problem is that the stack trace showing the cause of
>>>>>>>>> the segfault or assert changes depending on the inputs to the
>>>>>>>>> simulator. So I have not been able to pinpoint this bug, which
>>>>>>>>> appears to be a subtle memory corruption somewhere in the code.
>>>>>>>>> This error does not happen for other trace flags, such
>>>>>>>>> as the "Cache" trace flag.
>>>>>>>>> It appears linked solely to the instruction tracing mechanism.
>>>>>>>>> Has anybody else seen this bug?
>>>>>>>>>
>>>>>>>>> I'm using an up-to-date repository I pulled from m5sim.org
>>>>>>>>> <http://m5sim.org> this morning.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Geoff
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> m5-dev mailing list
>>>>>>>> [email protected] <mailto:[email protected]>
>>>>>>>> http://m5sim.org/mailman/listinfo/m5-dev
>>>>>>>>
>>>>>>>> --
>>>>>>>> ----------
>>>>>>>> Korey L Sewell
>>>>>>>> Graduate Student - PhD Candidate
>>>>>>>> Computer Science & Engineering
>>>>>>>> University of Michigan
