Re: [m5-users] SPEC06

Elliott Cooper-Balis Sun, 09 Sep 2007 16:35:24 -0700

hey steve,
  i tried both of your suggestions. the latter, i think, gives a good clue: the 
memory address that causes the fault is not referenced at any other point in 
the program. 
  
  here is the result of grep'ing for the address in the execution trace :


 >grep 12022e50 exec.out 
5278458500: system.cpu0 T0 : @__printf_fp+128 : addq       r0,r1,r0        : IntAlu :  D=0x000000012022e508
5278459000: system.cpu0 T0 : @__printf_fp+132 : ldq        r1,0(r0)        : MemRead :  D=0x0000000000000000 A=0x12022e508

these are the 2 instructions right before the fault, and the only 2 places 
where the address is referenced. 

i tried digging around a little more to see if this address in particular was 
causing the problems.  unfortunately, that doesn't appear to be the case.  the 
benchmark we have been discussing is the Perl benchmark in SPEC06.  i ran the 
random number generator benchmark as well (999.specrand) and here is the 
execution output just before its page fault : 

[EMAIL PROTECTED]:~/Development/M5/m5-2.0b3/build/ALPHA_SE$ ./m5.debug 
--trace-flags=Exec,Syscall,SyscallVerbose --trace-start=2000000 
../../configs/example/se.py -c 
benchmarks/999.specrand/exe/specrand_base.amd64-m64-gcc41-nn -o "4 3943"

....

2183000: system.cpu0 T0 : @____strtoll_l_internal+52 : bis        r31,r18,r10     : IntAlu :  D=0x000000000000000a
2183500: system.cpu0 T0 : @____strtoll_l_internal+56 : bis        r31,r20,r11     : IntAlu :  D=0x0000000000000000
2184000: system.cpu0 T0 : @____strtoll_l_internal+60 : ldq        r3,8(r20)       : MemRead :  A=0x8
panic: Page table fault when accessing virtual address 0x8
 @ cycle 2184000
[invoke:build/ALPHA_SE/sim/faults.cc, line 65]
Program aborted at cycle 2184000
Aborted (core dumped)
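
The faulting address can be reproduced from the three trace lines above. This is only a sketch of the arithmetic: `bis r31,r20,r11` produced D=0, and since r31 always reads as zero on Alpha, r20 must have held 0 at that point. The helper name `effective_address` is made up for illustration.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

def effective_address(base_value, displacement):
    """Address used by a displacement-form load such as `ldq r3,8(r20)`:
    base register value plus displacement, wrapped to 64 bits."""
    return (base_value + displacement) & MASK64

# r20 == 0 is inferred from the preceding `bis r31,r20,r11` (D=0).
fault_addr = effective_address(0x0, 8)
print(hex(fault_addr))
```

The result matches the `A=0x8` in the last trace line and the address in the panic message.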

unfortunately, there don't appear to be (at least to me) any similarities 
between the two benchmarks' output.  


elliott

Steve Reinhardt <[EMAIL PROTECTED]> wrote:

It's not obvious, but it does give some clues...

The null pointer is being read from memory address 0x12022e508, so either 
that's a bogus address or the memory location doesn't have the right value (not 
getting initialized or getting clobbered at some point). 

The pointer address is computed by adding the uniq register (put into R0 by 
"call_pal rduniq") and some value (0x28) read from -29160(r29)... I think 
that's the global constant pool.  The uniq reg is used as a pointer to 
thread-local storage.  So basically it's reading the null value out of 
thread-local storage.  It could be that that's a value that the OS is supposed 
to provide but we're not initializing it properly. 

I'd do two more things to try and get some more clues:

- run with just --trace-flags=Syscall (and no --trace-start) to get a complete 
syscall trace, then look at whatever the last few syscalls are, and see what 
they are and how closely they precede the crash 
- run with just --trace-flags=Exec (and no --trace-start) and then pipe the 
trace through "egrep -i '12022e50[89a-f]' " to look at all the other references 
to that memory location... is it ever written, if it's read before is it always 
zero, etc.  This will take a while... 
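
The second step can also be done with a small script instead of egrep, which makes it easier to see the op class and data value of each access. The following is a minimal sketch, not part of M5: the function name `find_accesses` is hypothetical, the field layout (`tick: ... : OpClass :  D=0x... A=0x...`) is assumed from the trace excerpts in this thread, and it assumes each trace record is a single line in the trace file.

```python
import re

# Field patterns taken from the trace excerpts in this thread (assumed layout).
TICK_RE = re.compile(r"^\s*(\d+):")
ADDR_RE = re.compile(r"\bA=0x([0-9a-fA-F]+)")
DATA_RE = re.compile(r"\bD=0x([0-9a-fA-F]+)")
OP_RE = re.compile(r":\s*([A-Za-z]+)\s*:\s+(?:D|A)=0x")


def find_accesses(lines, base, size=8):
    """Collect (tick, op_class, data) for every record whose A= address
    falls in [base, base + size). data is None when no D= field is present."""
    hits = []
    for line in lines:
        tick_match = TICK_RE.search(line)
        addr_match = ADDR_RE.search(line)
        if tick_match is None or addr_match is None:
            continue  # not a memory-access record (IntAlu lines have no A=)
        addr = int(addr_match.group(1), 16)
        if not base <= addr < base + size:
            continue
        op_match = OP_RE.search(line)
        data_match = DATA_RE.search(line)
        hits.append((
            int(tick_match.group(1)),
            op_match.group(1) if op_match else None,
            int(data_match.group(1), 16) if data_match else None,
        ))
    return hits

# Example use on a trace file (file name as in the grep command earlier):
#   with open("exec.out") as trace:
#       for hit in find_accesses(trace, 0x12022E508):
#           print(hit)
```

Matching on the parsed `A=` field rather than on raw text skips lines where the address only shows up as a computed `D=` value, so what remains is the list of actual loads and stores to that quadword.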

Steve

On 9/7/07, Elliott Cooper-Balis <[EMAIL PROTECTED]> wrote:

here is the output.  is there anything obvious that might be broken?


 _______________________________________________
m5-users mailing list
m5-users@m5sim.org
http://m5sim.org/cgi-bin/mailman/listinfo/m5-users

       