It's broader than tracing and not caused by the tracing mechanism
itself, but I think it will only show up with tracing: the pointer to
the trace data will be NULL otherwise, and the instruction won't
attempt to use it. Nothing else that currently exists is, to my
knowledge, in a position to be affected by this.

Gabe

Quoting nathan binkert <[email protected]>:

> Does this problem really have anything to do with tracing, or is it
> just more apparent with it?
>
> On Sat, Apr 4, 2009 at 1:49 PM, Gabe Black <[email protected]> wrote:
>> Oooooooooooh. I see what's broken. This is a result of my changes to
>> allow delaying translation. What happens is that Stl_c goes into
>> initiateAcc. That function calls write on the CPU, which calls into
>> the TLB, which calls the translation callback, which recognizes a
>> failed store conditional, completes the instruction's execution with
>> completeAcc, and cleans up. The call stack then collapses back to
>> initiateAcc, which is still waiting to finish and which then tries to
>> call a member function on traceData, which was deleted during the
>> cleanup. The problem here is not fundamentally complicated, but the
>> mechanisms involved are. One solution would be to record the fact
>> that we're still in initiateAcc and, if we are, wait for the call
>> stack to collapse back down to initiateAcc's caller before calling
>> into completeAcc. I think that better matches the semantics an
>> instruction would expect, where the initiateAcc/completeAcc pair are
>> called sequentially.
>>
>> One other concern this raises is that the code in the simple timing CPU
>> is not very simple. One thing that would help would be to try to
>> relocate some of the special cases, like failed store conditionals or
>> memory mapped registers, into different bodies of code or at least out
>> of the midst of everything else going on. I haven't thought about this
>> in any depth, but I'll try to put together a flow chart sort of thing to
>> explain what happens to memory instructions as they execute. That would
>> be good for the sake of documentation and also so we have something
>> concrete to talk about.
>>
>> Gabe
>>
>> Gabe Black wrote:
>>> The segfault for me happens in malloc called by the new operator in
>>> exetrace.hh on line 84. That says to me that the most likely culprit is
>>> heap corruption which will be very obnoxious to track down. I've started
>>> up a run of valgrind just in case it can catch something bad happening
>>> sometime in the next n hours.
>>>
>>> Gabe
>>>
>>> Gabe Black wrote:
>>>
>>>> Oh wow. It did happen eventually. I'll see if I can figure out what's
>>>> going on.
>>>>
>>>> Gabe
>>>>
>>>> Gabe Black wrote:
>>>>
>>>>
>>>>> I tried that command line and I haven't seen any segfault yet. I'll let
>>>>> it run and see if anything happens. What version of the code are  
>>>>> you using?
>>>>>
>>>>> Gabe
>>>>>
>>>>> Geoffrey Blake wrote:
>>>>>
>>>>>
>>>>>
>>>>>> I’ve added a couple of edits, but nothing major, i.e., statistics
>>>>>> for the bus model and some extra latency randomization on cache
>>>>>> misses to get better averages across parallel code runs. None of
>>>>>> this is tied to the trace-flags mechanism, as far as I can determine.
>>>>>>
>>>>>>
>>>>>>
>>>>>> I did run the code through valgrind, but ridiculously enough, the
>>>>>> segfault disappears. I’ll keep digging in my spare time.
>>>>>>
>>>>>>
>>>>>>
>>>>>> The “Exec” trace flags work fine (billions of instructions, no
>>>>>> problems) with an old version of m5 that is somewhere between beta4
>>>>>> and beta5 of the stable releases. Now I can trace maybe a few thousand
>>>>>> instructions before M5 segfaults.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Here is a stripped command line that exposes the bug with the fewest
>>>>>> variables to consider, in case someone out there wants to try to
>>>>>> duplicate the segfaults I’m seeing (it could be a product of my build
>>>>>> setup, so I’d appreciate it if someone could verify independently):
>>>>>>
>>>>>> % m5.opt --trace-flags="ExecEnable" fs.py -b MutexTest -t -n 1 >
>>>>>> /dev/null
>>>>>>
>>>>>>
>>>>>>
>>>>>> Geoff
>>>>>>
>>>>>>
>>>>>>
>>>>>> *From:* [email protected] [mailto:[email protected]] *On
>>>>>> Behalf Of *Korey Sewell
>>>>>> *Sent:* Friday, April 03, 2009 9:56 AM
>>>>>> *To:* M5 Developer List
>>>>>> *Subject:* Re: [m5-dev] Memory corruption in m5 dev repository when
>>>>>> using --trace-flags="ExecEnable"
>>>>>>
>>>>>>
>>>>>>
>>>>>> I would echo Gabe's sentiments. I've been suspicious of the
>>>>>> trace-flags causing memory corruption for a while now, but every time
>>>>>> I dig into it, it turns out to be some small error of my own that
>>>>>> finally surfaces.
>>>>>>
>>>>>> In the big picture, I suspect that the trace-flags just exacerbate
>>>>>> any existing memory-corruption issues, since you are accessing things
>>>>>> at such a heavy rate.
>>>>>>
>>>>>> In terms of debugging, is there any code that you edited that is
>>>>>> tagged when you use "ExecEnable" rather than just "Exec"?
>>>>>>
>>>>>> Also, if you can run valgrind for maybe the first thousand or million
>>>>>> cycles with ExecEnable, you'll probably find something.
>>>>>>
>>>>>> On Thu, Apr 2, 2009 at 7:28 PM, Gabriel Michael Black
>>>>>> <[email protected]> wrote:
>>>>>>
>>>>>> Does this happen when you start tracing sooner? I'd suggest valgrind,
>>>>>> especially if you can make the segfault happen quickly. If you wait
>>>>>> for your simulation to get to 1400000000000 ticks in valgrind, you may
>>>>>> die before you see the result. There's a suppression file in util
>>>>>> which should cut down on the noise.
>>>>>>
>>>>>> Gabe
>>>>>>
>>>>>>
>>>>>> Quoting Geoffrey Blake <[email protected]>:
>>>>>>
>>>>>>> I stumbled upon what appears to be a memory corruption bug in the
>>>>>>> current M5 repository. If on the command line I enter:
>>>>>>>
>>>>>>> % ./build/ALPHA_FS/m5.opt --trace-flags="ExecEnable"
>>>>>>> --trace-start=1400000000000 fs.py -b <benchmark> -t -n <cpus> <more
>>>>>>> parameters>
>>>>>>>
>>>>>>> the simulator will error with a segmentation fault, or occasionally
>>>>>>> an assert, not long after starting to trace instructions.
>>>>>>>
>>>>>>> I have run this through gdb with m5.debug and see the same errors;
>>>>>>> the problem is that the stack trace showing the cause of the seg fault
>>>>>>> or assert changes depending on the inputs to the simulator, so I have
>>>>>>> not been able to pinpoint this bug, which appears to be a subtle
>>>>>>> memory corruption somewhere in the code. This error does not happen
>>>>>>> for other trace flags such as "Cache"; it appears linked solely to the
>>>>>>> instruction tracing mechanism. Has anybody else seen this bug?
>>>>>>>
>>>>>>> I'm using an up to date repository I pulled from m5sim.org this
>>>>>>> morning.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Geoff
>>>>>>>
>>>>>> _______________________________________________
>>>>>> m5-dev mailing list
>>>>>> [email protected] <mailto:[email protected]>
>>>>>> http://m5sim.org/mailman/listinfo/m5-dev
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> ----------
>>>>>> Korey L Sewell
>>>>>> Graduate Student - PhD Candidate
>>>>>> Computer Science & Engineering
>>>>>> University of Michigan
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>


