Does this problem really have anything to do with tracing, or is it just more apparent with it?
On Sat, Apr 4, 2009 at 1:49 PM, Gabe Black <[email protected]> wrote:
> Oooooooooooh. I see what's broken. This is a result of my changes to
> allow delaying translation. What happens is that Stl_c goes into
> initiateAcc. That function calls write on the CPU, which calls into the
> TLB, which calls the translation callback, which recognizes a failed
> store conditional and completes the instruction's execution with
> completeAcc and cleans up. The call stack then collapses back to
> initiateAcc, which is still waiting to finish and which then tries to
> call a member function on traceData, which was deleted during the
> cleanup. The problem here is not fundamentally complicated, but the
> mechanisms involved are. One solution would be to record the fact that
> we're still in initiateAcc and, if we are, wait for the call stack to
> collapse back down to initiateAcc's caller before calling into
> completeAcc. I think that more closely matches the semantics an
> instruction would expect, where the initiateAcc/completeAcc pair are
> called sequentially.
>
> One other concern this raises is that the code in the simple timing CPU
> is not very simple. One thing that would help would be to relocate some
> of the special cases, like failed store conditionals or memory-mapped
> registers, into separate bodies of code, or at least out of the midst
> of everything else going on. I haven't thought about this in any depth,
> but I'll try to put together a flow chart of sorts to explain what
> happens to memory instructions as they execute. That would be good for
> the sake of documentation, and also so we have something concrete to
> talk about.
>
> Gabe
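To make the failure mode concrete, here is a minimal, self-contained sketch of the reentrancy hazard described above, together with the deferred-completion fix Gabe proposes. The names (SimpleTimingCPU, TraceData, initiateAcc, completeAcc) mirror those in the thread, but this is a hypothetical stand-in, not actual m5 source, and the guard flags (inInitiateAcc, deferredComplete) are only one possible shape of the fix:

    // Hypothetical stand-ins for the m5 types discussed above; a sketch
    // of the hazard and one possible fix, not real m5 code.
    #include <cstdio>
    #include <functional>

    struct TraceData {
        void setAddr(long addr) { std::printf("trace addr %#lx\n", addr); }
    };

    struct SimpleTimingCPU {
        TraceData *traceData = nullptr;
        bool inInitiateAcc = false;    // records that initiateAcc is on the stack
        bool deferredComplete = false; // completion requested while still inside

        // Stand-in for the CPU write path. A failed store conditional is
        // recognized during translation, so the callback runs synchronously,
        // before write() returns and while initiateAcc is still on the stack.
        void write(long addr, const std::function<void()> &onTranslation) {
            onTranslation();
        }

        void completeAcc() {
            std::printf("completeAcc: cleaning up\n");
            delete traceData;   // cleanup frees the trace record
            traceData = nullptr;
        }

        // Stl_c's initiateAcc path.
        void initiateAcc(long addr) {
            traceData = new TraceData;
            inInitiateAcc = true;
            write(addr, [this] {
                // Translation callback: failed store conditional detected.
                if (inInitiateAcc)
                    deferredComplete = true; // wait for the stack to collapse
                else
                    completeAcc();
            });
            inInitiateAcc = false;
            // Without the guard, completeAcc would already have deleted
            // traceData inside the callback, making this a use-after-free;
            // with the guard, traceData is still alive here.
            if (traceData)
                traceData->setAddr(addr);
            if (deferredComplete) {
                deferredComplete = false;
                completeAcc();  // now runs after initiateAcc has finished
            }
        }
    };

    int main() {
        SimpleTimingCPU cpu;
        cpu.initiateAcc(0x1000);
        return 0;
    }

The key property is that completeAcc never runs while initiateAcc is still on the stack, so traceData stays valid for the remainder of initiateAcc, matching the sequential initiateAcc/completeAcc semantics Gabe describes.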
> Gabe Black wrote:
>> The segfault for me happens in malloc, called by the new operator in
>> exetrace.hh on line 84. That says to me that the most likely culprit
>> is heap corruption, which will be very obnoxious to track down. I've
>> started up a run of valgrind just in case it can catch something bad
>> happening sometime in the next n hours.
>>
>> Gabe
>>
>> Gabe Black wrote:
>>> Oh wow. It did happen eventually. I'll see if I can figure out
>>> what's going on.
>>>
>>> Gabe
>>>
>>> Gabe Black wrote:
>>>> I tried that command line and I haven't seen any segfault yet. I'll
>>>> let it run and see if anything happens. What version of the code
>>>> are you using?
>>>>
>>>> Gabe
>>>>
>>>> Geoffrey Blake wrote:
>>>>> I've added a couple of edits, but nothing major, i.e., added
>>>>> statistics to the bus model and some extra latency randomization
>>>>> to cache misses to get better averages of parallel code runs. None
>>>>> of this is tied to the trace-flags mechanism that I can determine.
>>>>>
>>>>> I did run the code through valgrind, but ridiculously enough, the
>>>>> segfault disappears. I'll keep digging in my spare time.
>>>>>
>>>>> The "Exec" trace flags work fine (billions of instructions, no
>>>>> problems) with an old version of m5 that is somewhere between
>>>>> beta4 and beta5 of the stable releases. Now I can trace maybe a
>>>>> few thousand instructions before M5 segfaults.
>>>>>
>>>>> Here is a stripped command line that exposes the bug with the
>>>>> fewest variables to consider, in case someone out there wants to
>>>>> try to duplicate the segfaults I'm seeing (it could be a product
>>>>> of my build setup, so I'd appreciate it if someone could verify
>>>>> independently):
>>>>>
>>>>> % m5.opt --trace-flags="ExecEnable" fs.py -b MutexTest -t -n 1 > /dev/null
>>>>>
>>>>> Geoff
>>>>>
>>>>> From: Korey Sewell
>>>>> Sent: Friday, April 03, 2009 9:56 AM
>>>>> To: M5 Developer List
>>>>> Subject: Re: [m5-dev] Memory corruption in m5 dev repository when
>>>>> using --trace-flags="ExecEnable"
>>>>>
>>>>> I would echo Gabe's sentiments. I've been suspicious of the trace
>>>>> flags causing memory corruption for a while now, but every time I
>>>>> dig into it, some small error of my own turns out to be the cause.
>>>>>
>>>>> In the big picture, I suspect that the trace flags just exacerbate
>>>>> any kind of memory-corruption issue, since you are accessing
>>>>> things at such a heavy rate.
>>>>>
>>>>> In terms of debugging, is there any code that you edited that is
>>>>> tagged when you use "ExecEnable" rather than just "Exec"?
>>>>>
>>>>> Also, if you can turn valgrind on for maybe the first
>>>>> thousand/million cycles with ExecEnable, you'll probably find
>>>>> something.
>>>>>
>>>>> On Thu, Apr 2, 2009 at 7:28 PM, Gabriel Michael Black
>>>>> <[email protected]> wrote:
>>>>>
>>>>> Does this happen when you start tracing sooner? I'd suggest
>>>>> valgrind, especially if you can make the segfault happen quickly.
>>>>> If you wait for your simulation to get to 1400000000000 ticks in
>>>>> valgrind, you may die before you see the result. There's a
>>>>> suppression file in util which should cut down on the noise.
>>>>>
>>>>> Gabe
>>>>>
>>>>> Quoting Geoffrey Blake <[email protected]>:
>>>>>
>>>>>> I stumbled upon what appears to be a memory corruption bug in the
>>>>>> current M5 repository. If on the command line I enter:
>>>>>>
>>>>>> % ./build/ALPHA_FS/m5.opt --trace-flags="ExecEnable"
>>>>>> --trace-start=1400000000000 fs.py -b <benchmark> -t -n <cpus>
>>>>>> <more parameters>
>>>>>>
>>>>>> the simulator will error with a segmentation fault, or
>>>>>> occasionally an assert, not long after starting to trace
>>>>>> instructions.
>>>>>>
>>>>>> I have run this through gdb with m5.debug and see the same
>>>>>> errors; the problem is that the stack trace showing the cause of
>>>>>> the segfault or assert changes depending on the inputs to the
>>>>>> simulator. So I have not been able to pinpoint this bug, which
>>>>>> appears to be a subtle memory corruption somewhere in the code.
>>>>>> This error does not happen for other trace flags such as the
>>>>>> "Cache" trace flag. It appears linked solely to the instruction
>>>>>> tracing mechanism. Has anybody else seen this bug?
>>>>>>
>>>>>> I'm using an up-to-date repository I pulled from m5sim.org this
>>>>>> morning.
>>>>>>
>>>>>> Thanks,
>>>>>> Geoff
>>>>>
>>>>> --
>>>>> ----------
>>>>> Korey L Sewell
>>>>> Graduate Student - PhD Candidate
>>>>> Computer Science & Engineering
>>>>> University of Michigan
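One footnote on Gabe's diagnosis upthread: a segfault inside malloc, under an innocent-looking new (the exetrace.hh allocation), is the classic signature of heap corruption committed earlier by unrelated code, which is also why the crash site moves around between runs. Below is a deliberately broken, purely illustrative sketch (hypothetical code, nothing to do with m5's actual bug) of how that happens:

    // Deliberately broken code (undefined behavior), for illustration only.
    // The out-of-bounds write corrupts the allocator's bookkeeping; the
    // process then often survives until a later, unrelated allocation,
    // which is where the segfault or abort finally surfaces. Valgrind,
    // by contrast, reports the error at the strcpy itself.
    #include <cstring>
    #include <string>

    int main() {
        char *buf = new char[8];
        std::strcpy(buf, "sixteen chars!!!"); // writes past the end of buf
        // An innocent allocation, like the new in exetrace.hh, may be the
        // first place the allocator notices the damage and crashes.
        std::string *s = new std::string("innocent allocation");
        delete s;
        delete[] buf;
        return 0;
    }

This also suggests why the segfault can vanish under valgrind, as Geoffrey observed: valgrind substitutes its own allocator with redzones around each block, so corrupted glibc metadata is no longer in play, and the tool reports the overrun directly instead.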
_______________________________________________
m5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/m5-dev
