Whatever happened with this? I just lost track.

Nate
> It's broader than tracing and not caused by the tracing mechanism
> itself, but I think it will only show up with tracing. The pointer to
> the trace data will be NULL otherwise and the instruction won't
> attempt to use it. Nothing else that exists currently, to my knowledge,
> is in a position to be affected by this.
>
> Gabe
>
> Quoting nathan binkert <[email protected]>:
>
>> Does this problem really have anything to do with tracing, or is it
>> just more apparent with it?
>>
>> On Sat, Apr 4, 2009 at 1:49 PM, Gabe Black <[email protected]> wrote:
>>> Oooooooooooh. I see what's broken. This is a result of my changes to
>>> allow delaying translation. What happens is that Stl_c goes into
>>> initiateAcc. That function calls write on the CPU, which calls into
>>> the TLB, which calls the translation callback, which recognizes a
>>> failed store conditional and completes the instruction's execution
>>> with completeAcc, cleaning up as it goes. The call stack then
>>> collapses back to initiateAcc, which is still waiting to finish and
>>> which then tries to call a member function on traceData, which was
>>> deleted during the cleanup. The problem here is not fundamentally
>>> complicated, but the mechanisms involved are. One solution would be
>>> to record the fact that we're still in initiateAcc, and if we are,
>>> wait for the call stack to collapse back down to initiateAcc's
>>> caller before calling into completeAcc. I think that better matches
>>> the semantics an instruction would expect, where the
>>> initiateAcc/completeAcc pair are called sequentially.
>>>
>>> One other concern this raises is that the code in the simple timing
>>> CPU is not very simple. One thing that would help would be to
>>> relocate some of the special cases, like failed store conditionals
>>> or memory-mapped registers, into different bodies of code, or at
>>> least out of the midst of everything else going on.
>>> I haven't thought about this in any depth, but I'll try to put
>>> together a flow chart sort of thing to explain what happens to
>>> memory instructions as they execute. That would be good for the sake
>>> of documentation, and also so we have something concrete to talk
>>> about.
>>>
>>> Gabe
>>>
>>> Gabe Black wrote:
>>>> The segfault for me happens in malloc, called by the new operator
>>>> in exetrace.hh on line 84. That says to me that the most likely
>>>> culprit is heap corruption, which will be very obnoxious to track
>>>> down. I've started up a run of valgrind just in case it can catch
>>>> something bad happening sometime in the next n hours.
>>>>
>>>> Gabe
>>>>
>>>> Gabe Black wrote:
>>>>> Oh wow. It did happen eventually. I'll see if I can figure out
>>>>> what's going on.
>>>>>
>>>>> Gabe
>>>>>
>>>>> Gabe Black wrote:
>>>>>> I tried that command line and I haven't seen any segfault yet.
>>>>>> I'll let it run and see if anything happens. What version of the
>>>>>> code are you using?
>>>>>>
>>>>>> Gabe
>>>>>>
>>>>>> Geoffrey Blake wrote:
>>>>>>> I've added a couple of edits, but nothing major, i.e. statistics
>>>>>>> for the bus model, and some extra latency randomization on cache
>>>>>>> misses to get better averages over parallel code runs. None of
>>>>>>> this is tied to the trace-flags mechanism that I can determine.
>>>>>>>
>>>>>>> I did run the code through valgrind, but ridiculously enough,
>>>>>>> the segfault disappears. I'll keep digging in my spare time.
>>>>>>>
>>>>>>> The "Exec" trace flags work fine (billions of instructions, no
>>>>>>> problems) with an old version of m5 that is somewhere between
>>>>>>> beta4 and beta5 of the stable releases. Now I can trace maybe a
>>>>>>> few thousand instructions before M5 segfaults.
>>>>>>> Here is a stripped command line that does expose the bug with
>>>>>>> the fewest variables to consider, in case someone out there
>>>>>>> wants to try to duplicate the segfaults I'm seeing (it could be
>>>>>>> a product of my build setup, so I'd appreciate it if someone
>>>>>>> could verify independently):
>>>>>>>
>>>>>>> % m5.opt --trace-flags="ExecEnable" fs.py -b MutexTest -t -n 1 > /dev/null
>>>>>>>
>>>>>>> Geoff
>>>>>>>
>>>>>>> From: [email protected] [mailto:[email protected]] On Behalf Of Korey Sewell
>>>>>>> Sent: Friday, April 03, 2009 9:56 AM
>>>>>>> To: M5 Developer List
>>>>>>> Subject: Re: [m5-dev] Memory corruption in m5 dev repository when using --trace-flags="ExecEnable"
>>>>>>>
>>>>>>> I would echo Gabe's sentiments. I've been suspicious of the
>>>>>>> trace-flags causing memory corruption for a while now, but every
>>>>>>> time I dig into it, it turns out to be some small error of my
>>>>>>> own that finally surfaces.
>>>>>>>
>>>>>>> In the big picture, I suspect that the trace-flags just
>>>>>>> exacerbate any kind of memory-corruption issue, since you are
>>>>>>> accessing things at such a heavy rate.
>>>>>>>
>>>>>>> In terms of debugging, is there any code you edited that is
>>>>>>> exercised when you use "ExecEnable" rather than just "Exec"?
>>>>>>>
>>>>>>> Also, if you can turn valgrind on for maybe the first thousand
>>>>>>> or million cycles with ExecEnable, you'll probably find
>>>>>>> something.
>>>>>>>
>>>>>>> On Thu, Apr 2, 2009 at 7:28 PM, Gabriel Michael Black
>>>>>>> <[email protected]> wrote:
>>>>>>>
>>>>>>> Does this happen when you start tracing sooner? I'd suggest
>>>>>>> valgrind, especially if you can make the segfault happen
>>>>>>> quickly. If you wait for your simulation to get to 1400000000000
>>>>>>> ticks in valgrind, you may die before you see the result.
>>>>>>> There's a suppression file in util which should cut down on the
>>>>>>> noise.
>>>>>>>
>>>>>>> Gabe
>>>>>>>
>>>>>>> Quoting Geoffrey Blake <[email protected]>:
>>>>>>>
>>>>>>>> I stumbled upon what appears to be a memory corruption bug in
>>>>>>>> the current M5 repository. If on the command line I enter:
>>>>>>>>
>>>>>>>> % ./build/ALPHA_FS/m5.opt --trace-flags="ExecEnable"
>>>>>>>> --trace-start=1400000000000 fs.py -b <benchmark> -t -n <cpus>
>>>>>>>> <more parameters>
>>>>>>>>
>>>>>>>> the simulator will error with a segmentation fault, or
>>>>>>>> occasionally an assert, not long after starting to trace
>>>>>>>> instructions.
>>>>>>>>
>>>>>>>> I have run this through gdb with m5.debug and see the same
>>>>>>>> errors; the problem is that the stack trace showing the cause
>>>>>>>> of the segfault or assert changes depending on the inputs to
>>>>>>>> the simulator. So I have not been able to pinpoint this bug,
>>>>>>>> which appears to be a subtle memory corruption somewhere in the
>>>>>>>> code. This error does not happen for other trace flags such as
>>>>>>>> the "Cache" trace flag. It appears linked solely to the
>>>>>>>> instruction tracing mechanism. Has anybody else seen this bug?
>>>>>>>>
>>>>>>>> I'm using an up-to-date repository I pulled from m5sim.org this
>>>>>>>> morning.
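Gabe's valgrind suggestion might look like the following. This is a hypothetical invocation: the exact name of the suppression file under util/ is a guess based on the thread, so adjust the paths to your checkout, and starting the trace from tick 0 (rather than 1400000000000) keeps the valgrind run from taking hours to reach the interesting region:

```shell
# Hypothetical command: suppression-file name and build paths are
# assumptions, not verified against the m5 tree.
valgrind --suppressions=util/valgrind-suppressions \
    ./build/ALPHA_FS/m5.debug --trace-flags="ExecEnable" \
    fs.py -b MutexTest -t -n 1 > /dev/null
```

Running m5.debug rather than m5.opt under valgrind trades speed for line-level information in any use-after-free report.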
>>>>>>>> Thanks,
>>>>>>>> Geoff
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> m5-dev mailing list
>>>>>>> [email protected]
>>>>>>> http://m5sim.org/mailman/listinfo/m5-dev
>>>>>>>
>>>>>>> --
>>>>>>> ----------
>>>>>>> Korey L Sewell
>>>>>>> Graduate Student - PhD Candidate
>>>>>>> Computer Science & Engineering
>>>>>>> University of Michigan
