Does this problem really have anything to do with tracing, or is it
just more apparent with it?

On Sat, Apr 4, 2009 at 1:49 PM, Gabe Black <[email protected]> wrote:
> Oooooooooooh. I see what's broken. This is a result of my changes to
> allow delaying translation. What happens is that Stl_c goes into
> initiateAcc. That function calls write() on the CPU, which calls into
> the TLB, which calls the translation callback. The callback recognizes
> a failed store conditional, completes the instruction's execution with
> completeAcc, and cleans up. The call stack then collapses back to
> initiateAcc, which is still waiting to finish and which then tries to
> call a member function on traceData, which was deleted during the
> cleanup. The problem here is not fundamentally complicated, but the
> mechanisms involved are. One solution would be to record the fact that
> we're still in initiateAcc, and if we are, wait for the call stack to
> collapse back down to initiateAcc's caller before calling into
> completeAcc. I think that better matches the semantics an instruction
> would expect, where the initiateAcc/completeAcc pair is called
> sequentially.
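>
> Roughly, the shape I have in mind (a minimal sketch, not the actual
> TimingSimpleCPU code; all of the names and the stand-in call chain
> here are made up, just to show the control flow):
>
>     #include <cstdio>
>
>     struct CpuSketch {
>         bool inInitiateAcc = false;      // is initiateAcc still on the stack?
>         bool completionDeferred = false; // did the callback ask for completeAcc?
>
>         void executeMemInst() {
>             inInitiateAcc = true;
>             initiateAcc();               // may re-enter translationCallback()
>             inInitiateAcc = false;
>             // The stack has collapsed back to initiateAcc's caller, so
>             // it's now safe to run the completion the callback deferred.
>             if (completionDeferred) {
>                 completionDeferred = false;
>                 completeAcc();
>             }
>         }
>
>         void initiateAcc() {
>             // Stand-in for the write() -> TLB -> translation-callback
>             // chain; a failed Stl_c fires the callback before we return.
>             translationCallback();
>             // With the deferral in place, traceData would still be alive here.
>         }
>
>         void translationCallback() {
>             if (inInitiateAcc)
>                 completionDeferred = true;  // don't tear down state under our caller
>             else
>                 completeAcc();
>         }
>
>         void completeAcc() {
>             std::puts("completeAcc: finish the access and clean up");
>         }
>     };
>
>     int main() { CpuSketch cpu; cpu.executeMemInst(); return 0; }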
>
> One other concern this raises is that the code in the simple timing CPU
> is not very simple. It would help to relocate some of the special
> cases, like failed store conditionals or memory-mapped registers, into
> their own bodies of code, or at least out of the middle of everything
> else going on. I haven't thought about this in any depth, but I'll try
> to put together a flow-chart sort of thing to explain what happens to
> memory instructions as they execute. That would be good for the sake of
> documentation, and also so we have something concrete to talk about.
>
> Gabe
>
> Gabe Black wrote:
>> The segfault for me happens in malloc, called by the new operator in
>> exetrace.hh on line 84. That says to me that the most likely culprit
>> is heap corruption, which will be very obnoxious to track down. I've
>> started up a run of valgrind just in case it can catch something bad
>> happening sometime in the next n hours.
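>>
>> (In the meantime, a cheaper first check than a full valgrind run,
>> assuming we're on glibc: setting MALLOC_CHECK_ makes malloc do its own
>> consistency checking and abort as soon as it notices a corrupted heap,
>> which should at least move the crash closer to the culprit. For
>> example, using Geoff's stripped reproducer:
>>
>>     % MALLOC_CHECK_=2 ./build/ALPHA_FS/m5.opt --trace-flags="ExecEnable" \
>>           fs.py -b MutexTest -t -n 1 > /dev/null
>>
>> It won't say where the corruption came from, only where it was first
>> noticed.)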
>>
>> Gabe
>>
>> Gabe Black wrote:
>>
>>> Oh wow. It did happen eventually. I'll see if I can figure out what's
>>> going on.
>>>
>>> Gabe
>>>
>>> Gabe Black wrote:
>>>
>>>
>>>> I tried that command line and I haven't seen any segfault yet. I'll let
>>>> it run and see if anything happens. What version of the code are you using?
>>>>
>>>> Gabe
>>>>
>>>> Geoffrey Blake wrote:
>>>>
>>>>
>>>>
>>>>> I've added a couple of edits, but nothing major, i.e., statistics
>>>>> added to the bus model and some extra latency randomization on
>>>>> cache misses to get better averages across parallel code runs. None
>>>>> of this is tied to the trace-flags mechanism, as far as I can
>>>>> determine.
>>>>>
>>>>>
>>>>>
>>>>> I did run the code through valgrind, but ridiculously enough, the
>>>>> segfault disappears. I’ll keep digging in my spare time.
>>>>>
>>>>>
>>>>>
>>>>> The "Exec" trace flags work fine (billions of instructions, no
>>>>> problems) with an old version of m5 that is somewhere between the
>>>>> beta4 and beta5 stable releases. Now I can trace maybe a few
>>>>> thousand instructions before M5 segfaults.
>>>>>
>>>>>
>>>>>
>>>>> Here is a stripped command line that exposes the bug with the
>>>>> fewest variables to consider, in case someone out there wants to
>>>>> try to duplicate the segfaults I'm seeing (it could be a product of
>>>>> my build setup, so I'd appreciate it if someone could verify
>>>>> independently):
>>>>>
>>>>> % m5.opt --trace-flags="ExecEnable" fs.py -b MutexTest -t -n 1 > /dev/null
>>>>>
>>>>>
>>>>>
>>>>> Geoff
>>>>>
>>>>>
>>>>>
>>>>> From: [email protected] [mailto:[email protected]] On Behalf
>>>>> Of Korey Sewell
>>>>> Sent: Friday, April 03, 2009 9:56 AM
>>>>> To: M5 Developer List
>>>>> Subject: Re: [m5-dev] Memory corruption in m5 dev repository when
>>>>> using --trace-flags="ExecEnable"
>>>>>
>>>>>
>>>>>
>>>>> I would echo Gabe's sentiments. I've been suspicious of the
>>>>> trace-flags causing memory corruption for a while now, but every
>>>>> time I dig into it, it turns out to be some small error of my own
>>>>> propagating through that finally surfaces.
>>>>>
>>>>> In the big picture, I suspect that the trace-flags just exacerbate
>>>>> any existing memory-corruption issue, since you are accessing
>>>>> things at such a heavy rate.
>>>>>
>>>>> In terms of debugging, is there any code that you edited that is
>>>>> tagged when you use "ExecEnable" rather than just "Exec"?
>>>>>
>>>>> Also, if you can turn valgrind on for maybe the first thousand or
>>>>> million cycles with ExecEnable, you'll probably find something.
>>>>>
>>>>> On Thu, Apr 2, 2009 at 7:28 PM, Gabriel Michael Black
>>>>> <[email protected]> wrote:
>>>>>
>>>>> Does this happen when you start tracing sooner? I'd suggest valgrind,
>>>>> especially if you can make the segfault happen quickly. If you wait
>>>>> for your simulation to get to 1400000000000 ticks in valgrind, you may
>>>>> die before you see the result. There's a suppression file in util
>>>>> which should cut down on the noise.
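>>>>>
>>>>> (For reference, the invocation would look something like the lines
>>>>> below; I believe the suppression file is util/valgrind-suppressions,
>>>>> but double-check the name in your tree:
>>>>>
>>>>>     % valgrind --suppressions=util/valgrind-suppressions \
>>>>>           ./build/ALPHA_FS/m5.debug --trace-flags="ExecEnable" \
>>>>>           fs.py -b <benchmark> -t -n <cpus>
>>>>>
>>>>> m5.debug rather than m5.opt, since valgrind's stack traces are far
>>>>> more useful with debugging symbols.)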
>>>>>
>>>>> Gabe
>>>>>
>>>>>
>>>>> Quoting Geoffrey Blake <[email protected]>:
>>>>>
>>>>>> I stumbled upon what appears to be a memory corruption bug in the
>>>>>> current M5 repository. If on the command line I enter:
>>>>>>
>>>>>> % ./build/ALPHA_FS/m5.opt --trace-flags="ExecEnable"
>>>>>> --trace-start=1400000000000 fs.py -b <benchmark> -t -n <cpus>
>>>>>> <more parameters>
>>>>>>
>>>>>> the simulator will error with a segmentation fault, or
>>>>>> occasionally an assert, not long after starting to trace
>>>>>> instructions.
>>>>>>
>>>>>> I have run this through gdb with m5.debug and see the same errors;
>>>>>> the problem is that the stack trace showing the cause of the
>>>>>> segfault or assert changes depending on the inputs to the
>>>>>> simulator. So I have not been able to pinpoint this bug, which
>>>>>> appears to be a subtle memory corruption somewhere in the code.
>>>>>> This error does not happen for other trace flags, such as the
>>>>>> "Cache" trace flag; it appears linked solely to the instruction
>>>>>> tracing mechanism. Has anybody else seen this bug?
>>>>>>
>>>>>> I'm using an up-to-date repository I pulled from m5sim.org this
>>>>>> morning.
>>>>>>
>>>>>> Thanks,
>>>>>> Geoff
>>>>>
>>>>> --
>>>>> ----------
>>>>> Korey L Sewell
>>>>> Graduate Student - PhD Candidate
>>>>> Computer Science & Engineering
>>>>> University of Michigan
>>>>>
_______________________________________________
m5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/m5-dev
