OK, I think I found the bug. I used "continue" and "ctrl+c" multiple
times to see if it was stuck in a particular function. The backtrace
shows:
#0 0x00000000004dfee7 in __gnu_cxx::hashtable<std::pair<unsigned long
const, X86ISA::TlbEntry>, unsigned long, __gnu_cxx::hash<unsigned
long>, std::_Select1st<std::pair<unsigned long const,
X86ISA::TlbEntry> >, std::equal_to<unsigned long>,
std::allocator<X86ISA::TlbEntry> >::_M_bkt_num_key (this=0x28b8970,
__key=@0x21d9a7c8, __n=50331653) at
/usr/include/c++/4.4/backward/hashtable.h:590
#1 0x00000000004dfff9 in __gnu_cxx::hashtable<std::pair<unsigned long
const, X86ISA::TlbEntry>, unsigned long, __gnu_cxx::hash<unsigned
long>, std::_Select1st<std::pair<unsigned long const,
X86ISA::TlbEntry> >, std::equal_to<unsigned long>,
std::allocator<X86ISA::TlbEntry> >::_M_bkt_num (this=0x28b8970,
__obj=..., __n=50331653) at /usr/include/c++/4.4/backward/hashtable.h:594
#2 0x00000000004df9c8 in __gnu_cxx::hashtable<std::pair<unsigned long
const, X86ISA::TlbEntry>, unsigned long, __gnu_cxx::hash<unsigned
long>, std::_Select1st<std::pair<unsigned long const,
X86ISA::TlbEntry> >, std::equal_to<unsigned long>,
std::allocator<X86ISA::TlbEntry> >::resize (this=0x28b8970,
__num_elements_hint=25165844) at
/usr/include/c++/4.4/backward/hashtable.h:1001
#3 0x00000000004df100 in __gnu_cxx::hashtable<std::pair<unsigned long
const, X86ISA::TlbEntry>, unsigned long, __gnu_cxx::hash<unsigned
long>, std::_Select1st<std::pair<unsigned long const,
X86ISA::TlbEntry> >, std::equal_to<unsigned long>,
std::allocator<X86ISA::TlbEntry> >::find_or_insert (this=0x28b8970,
__obj=...) at /usr/include/c++/4.4/backward/hashtable.h:789
#4 0x00000000004deaca in __gnu_cxx::hash_map<unsigned long,
X86ISA::TlbEntry, __gnu_cxx::hash<unsigned long>,
std::equal_to<unsigned long>, std::allocator<X86ISA::TlbEntry>
>::operator[] (this=0x28b8970,
__key=@0x7fffffffba80) at /usr/include/c++/4.4/ext/hash_map:217
#5 0x00000000004daa68 in PageTable::map (this=0x28b8970,
vaddr=47015569313792, paddr=103079288832,
size=5548434767986339840, clobber=false) at build/X86/mem/page_table.cc:82
#6 0x000000000074b9c8 in Process::allocateMem (this=0x30be640,
vaddr=46912496128000,
size=5548434871059525632, clobber=false) at build/X86/sim/process.cc:332
#7 0x00000000007aba21 in mmapFunc<X86Linux64> (desc=0x2052fb8, num=9,
p=0x30be640, tc=0x3331210)
at build/X86/sim/syscall_emul.hh:1069
#8 0x000000000073ca11 in SyscallDesc::doSyscall (this=0x2052fb8,
callnum=9, process=0x30be640,
tc=0x3331210) at build/X86/sim/syscall_emul.cc:69
#9 0x00000000007516a0 in LiveProcess::syscall (this=0x30be640,
callnum=9, tc=0x3331210)
at build/X86/sim/process.cc:590
#10 0x0000000000c10ce3 in SimpleThread::syscall (this=0x33305d0, callnum=9)
at build/X86/cpu/simple_thread.hh:384
As you can see, the problem is in the mmapFunc<X86Linux64> syscall,
which allocates memory through Process::allocateMem. Note the size
arguments in frames #5 and #6 (about 5.5e18 bytes), which look bogus.
That is my understanding....
On 4/27/12, Mahmood Naderan <[email protected]> wrote:
> Is this useful?
>
> 339051500: system.cpu + A0 T0 : 0x83d48d.4 : CALL_NEAR_I : wrip , t7, t1 : IntAlu :
> 339052000: system.cpu.icache: ReadReq (ifetch) 452f90 hit
> 339052000: system.cpu + A0 T0 : 0x852f90 : mov r10, rcx
> 339052000: system.cpu + A0 T0 : 0x852f90.0 : MOV_R_R : mov r10, r10, rcx : IntAlu : D=0x0000000000000022
> 339052500: system.cpu.icache: ReadReq (ifetch) 452f90 hit
> 339052500: system.cpu + A0 T0 : 0x852f93 : mov eax, 0x9
> 339052500: system.cpu + A0 T0 : 0x852f93.0 : MOV_R_I : limm eax, 0x9 : IntAlu : D=0x0000000000000009
> 339053000: system.cpu.icache: ReadReq (ifetch) 452f98 hit
> ^C
> Program received signal SIGINT, Interrupt.
> 0x00000000004e0f90 in
> std::__fill_n_a<__gnu_cxx::_Hashtable_node<std::pair<unsigned long
> const, X86ISA::TlbEntry> >**, unsigned long,
> __gnu_cxx::_Hashtable_node<std::pair<unsigned long const,
> X86ISA::TlbEntry> >*> (__first=0x7fff70017000, __n=4065295,
> __value=@0x7fffffffb8d0)
> at /usr/include/c++/4.4/bits/stl_algobase.h:758
> 758 *__first = __tmp;
> (gdb) ^CQuit
> (gdb)
>
>
>
> On 4/27/12, Steve Reinhardt <[email protected]> wrote:
>> Perhaps you could fire off the run under gdb, and use the --debug-break
>> flag to drop in to gdb at the tick where it seems to stop running. If
>> the
>> simulation stops and memory blows up, it's almost like you're stuck in
>> some
>> subtle infinite loop with a memory allocation in it. (You might have to
>> continue just a little past there and hit ctrl-c before it dies to catch
>> it
>> in the middle of this loop.)
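[For anyone following along, Steve's suggestion might look roughly like this; the tick value is taken from the trace later in this thread, so adjust paths and options to your own setup:]

```shell
# Start m5.debug under gdb; --debug-break makes gem5 raise SIGTRAP at the
# given tick, dropping you into gdb right where the run appears to hang.
gdb --args ../build/X86/m5.debug --debug-break=339069500 \
    ../configs/example/se.py -c tonto_base.amd64-m64-gcc44-nn \
    --cpu-type=detailed --caches --l2cache
# (gdb) run
# ...SIGTRAP at the requested tick...
# (gdb) bt        # then "continue" + ctrl-c a few times to find the loop
```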
>>
>> On Fri, Apr 27, 2012 at 11:29 AM, Mahmood Naderan
>> <[email protected]>wrote:
>>
>>> I searched for something similar (stopping the simulation when it
>>> reaches a specific memory usage, to prevent it being killed) but didn't
>>> find such a thing. Do you know of one?
>>>
>>> I also attached gdb. It doesn't show anything useful because in the end
>>> it gets killed.
>>>
>>> On 4/27/12, Gabe Black <[email protected]> wrote:
>>> > Valgrind should tell you where the leaked memory was allocated. You
>>> > may
>>> > have to give it a command line option for that, or stop it before it
>>> > gets itself killed.
>>> >
>>> > Gabe
>>> >
>>> > On 04/27/12 11:10, Steve Reinhardt wrote:
>>> >> Can you attach gdb when it does this, see where it's at, and maybe
>>> >> step through the code a bit to see what it's doing?
>>> >>
>>> >> On Fri, Apr 27, 2012 at 10:54 AM, Mahmood Naderan
>>> >> <[email protected] <mailto:[email protected]>> wrote:
>>> >>
>>> >> That was a guess. As I said, I turned on the debugger to see when
>>> >> it starts eating the memory. As you can see, the last message it
>>> >> prints is:
>>> >> 339069000: system.cpu + A0 T0 : 0x852f93.0 : MOV_R_I : limm eax, 0x9 : IntAlu : D=0x0000000000000009
>>> >> 339069500: system.cpu.icache: set be: moving blk 452f80 to MRU
>>> >> 339069500: system.cpu.icache: ReadReq (ifetch) 452f98 hit
>>> >>
>>> >> Then no message is printed, and I see with the top command that the
>>> >> memory usage goes up and up until it consumes all memory.
>>> >>
>>> >>
>>> >> > On 4/27/12, Nilay Vaish <[email protected]> wrote:
>>> >> > How do you know the instruction at which the memory starts
>>> >> > leaking? What should we conclude from the instruction trace in your
>>> >> > mail? I am unable to arrive at any conclusion from the valgrind
>>> >> > report that you had attached. Apart from the info on uninitialized
>>> >> > values, I did not find any useful output produced by valgrind.
>>> >> >
>>> >> > --
>>> >> > Nilay
>>> >> >
>>> >> > On Fri, 27 Apr 2012, Mahmood Naderan wrote:
>>> >> >
>>> >> >> tonto with the test input uses about 4 GB and runs for about
>>> >> >> 2 seconds on a real machine.
>>> >> >>
>>> >> >> I also used the test input with gem5. However, again after tick
>>> >> >> 300000000, all 30GB of memory is used and then gem5 is killed.
>>> >> >> The same behaviour with the ref input...
>>> >> >>
>>> >> >> I ran the following command:
>>> >> >> valgrind --tool=memcheck --leak-check=full --track-origins=yes
>>> >> >> --suppressions=../util/valgrind-suppressions ../build/X86/m5.debug
>>> >> >> --debug-flags=Cache,ExecAll,Bus,CacheRepl,Context
>>> >> >> --trace-start=339050000 ../configs/example/se.py -c
>>> >> >> tonto_base.amd64-m64-gcc44-nn --cpu-type=detailed -F 5000000
>>> >> >> --maxtick 10000000 --caches --l2cache --prog-interval=100000
>>> >> >>
>>> >> >>
>>> >> >> I also attached the report again. At the instruction where the
>>> >> >> memory leak begins, you can see:
>>> >> >> ...
>>> >> >> 339066000: system.cpu + A0 T0 : 0x83d48d : call 0x15afe
>>> >> >> 339066000: system.cpu + A0 T0 : 0x83d48d.0 : CALL_NEAR_I : limm t1, 0x15afe : IntAlu : D=0x0000000000015afe
>>> >> >> 339066500: system.cpu + A0 T0 : 0x83d48d.1 : CALL_NEAR_I : rdip t7, %ctrl153, : IntAlu : D=0x000000000083d492
>>> >> >> 339067000: system.cpu.dcache: set 9a: moving blk 5aa680 to MRU
>>> >> >> 339067000: system.cpu.dcache: WriteReq 5aa6b8 hit
>>> >> >> 339067000: system.cpu + A0 T0 : 0x83d48d.2 : CALL_NEAR_I : st t7, SS:[rsp + 0xfffffffffffffff8] : MemWrite : D=0x000000000083d492 A=0x7fffffffe6b8
>>> >> >> 339067500: system.cpu + A0 T0 : 0x83d48d.3 : CALL_NEAR_I : subi rsp, rsp, 0x8 : IntAlu : D=0x00007fffffffe6b8
>>> >> >> 339068000: system.cpu + A0 T0 : 0x83d48d.4 : CALL_NEAR_I : wrip , t7, t1 : IntAlu :
>>> >> >> 339068500: system.cpu.icache: set be: moving blk 452f80 to MRU
>>> >> >> 339068500: system.cpu.icache: ReadReq (ifetch) 452f90 hit
>>> >> >> 339068500: system.cpu + A0 T0 : 0x852f90 : mov r10, rcx
>>> >> >> 339068500: system.cpu + A0 T0 : 0x852f90.0 : MOV_R_R : mov r10, r10, rcx : IntAlu : D=0x0000000000000022
>>> >> >> 339069000: system.cpu.icache: set be: moving blk 452f80 to MRU
>>> >> >> 339069000: system.cpu.icache: ReadReq (ifetch) 452f90 hit
>>> >> >> 339069000: system.cpu + A0 T0 : 0x852f93 : mov eax, 0x9
>>> >> >> 339069000: system.cpu + A0 T0 : 0x852f93.0 : MOV_R_I : limm eax, 0x9 : IntAlu : D=0x0000000000000009
>>> >> >> 339069500: system.cpu.icache: set be: moving blk 452f80 to MRU
>>> >> >> 339069500: system.cpu.icache: ReadReq (ifetch) 452f98 hit
>>> >> >>
>>> >> >>
>>> >> >> What is your opinion then?
>>> >> >> Regards,
>>> >> >>
>>> >> >> On 4/27/12, Steve Reinhardt <[email protected]> wrote:
>>> >> >>> Also, if you do run valgrind, use the util/valgrind-suppressions
>>> >> >>> file to suppress spurious reports. Read the valgrind docs to see
>>> >> >>> how this works.
>>> >> >>>
>>> >> >>> Steve
>>> >> >>>
>>> >> > _______________________________________________
>>> >> > gem5-users mailing list
>>> >> > [email protected] <mailto:[email protected]>
>>> >> > http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
>>> >> >
>>> >>
>>> >>
>>> >> --
>>> >> // Naderan *Mahmood;
>>> >>
>>> >>
>>> >>
>>> >
>>> >
>>>
>>>
>>> --
>>> // Naderan *Mahmood;
>>>
>>
>
>
> --
> // Naderan *Mahmood;
>
--
// Naderan *Mahmood;