Interesting... there's nothing conclusive here, but the symbols on the instructions at tick 172000 show that this address is probably TLS-related too. So the good news is that this could be the same bug or a related one. I think the key thing is to figure out what the Linux TLS structure is supposed to look like.
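For anyone who wants to repeat this kind of address hunt without eyeballing grep output, here is a minimal Python sketch. It assumes the Exec trace record layout seen in the lines quoted in this thread (tick, cpu, thread, @symbol+offset, instruction, op class, then optional D= and A= fields). The names classify, scan and SAMPLE are made up for illustration and are not part of M5. The idea is to separate records where the address only shows up as a value (D=) from records where it is the memory location actually being accessed (A=).

```python
import re

# Assumed record layout, inferred from the trace lines quoted in this thread:
#   <tick>: <cpu> T<n> : @<symbol+off> : <instruction> : <op class> : [D=0x..] [A=0x..]
TRACE_RE = re.compile(
    r"^\s*(?P<tick>\d+):\s+\S+\s+T\d+\s+:\s+@(?P<sym>\S+)\s+:\s+"
    r"(?P<inst>.*?)\s+:\s+(?P<op>\w+)\s+:\s*(?P<fields>.*)$"
)
FIELD_RE = re.compile(r"\b([DA])=(0x[0-9a-fA-F]+)")


def classify(line, addr, size=8):
    """Return (tick, symbol, kind, data) if the record involves addr, else None.

    kind is one of:
      "write"          - MemWrite whose A= falls in [addr, addr + size)
      "read"           - any other record whose A= falls in that window
      "stored-as-data" - MemWrite that stores the value addr somewhere else
      "computed"       - non-store record whose result D= equals addr
    """
    m = TRACE_RE.match(line)
    if m is None:
        return None
    fields = {name: int(value, 16)
              for name, value in FIELD_RE.findall(m.group("fields"))}
    is_store = m.group("op") == "MemWrite"
    accessed = fields.get("A")
    if accessed is not None and addr <= accessed < addr + size:
        kind = "write" if is_store else "read"
    elif fields.get("D") == addr:
        kind = "stored-as-data" if is_store else "computed"
    else:
        return None
    return int(m.group("tick")), m.group("sym"), kind, fields.get("D")


def scan(lines, addr, size=8):
    """Collect every classified hit for addr from an iterable of trace lines."""
    hits = []
    for line in lines:
        hit = classify(line, addr, size)
        if hit is not None:
            hits.append(hit)
    return hits


# Three records copied from the specrand trace quoted in this thread.
SAMPLE = [
    "172000: system.cpu0 T0 : @__libc_setup_tls+304 : addq r10,r13,r16 : IntAlu : D=0x00000001200944f0",
    "172500: system.cpu0 T0 : @__libc_setup_tls+308 : stq r16,16(r9) : MemWrite : D=0x00000001200944f0 A=0x120092050",
    "2175000: system.cpu0 T0 : @__strtol_internal+28 : ldq r20,0(r0) : MemRead : D=0x0000000000000000 A=0x1200944f0",
]

for tick, sym, kind, data in scan(SAMPLE, 0x1200944F0):
    shown = "-" if data is None else hex(data)
    print(f"{tick:>8} {sym:<26} {kind:<15} D={shown}")
```

On these three records the address is never the target of a store: the stq at __libc_setup_tls+308 writes the address itself into 0x120092050, and the only access to 0x1200944f0 is the ldq that reads back zero. The 8-byte window plays the same role as the [0-7] range in the egrep pattern quoted below. For a full trace, pass an open file object to scan instead of SAMPLE.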
One thing that's puzzling me is why all this is coming up now when we've run almost all of spec 2000 without any problems. Anyone else have any ideas?

Steve

On 9/9/07, Elliott Cooper-Balis <[EMAIL PROTECTED]> wrote:
>
> r0 gets set in the instruction right before the load into r20 :
>
> 2174500: system.cpu0 T0 : @__strtol_internal+24 : addq r0,r1,r0 : IntAlu : D=0x00000001200944f0
> 2175000: system.cpu0 T0 : @__strtol_internal+28 : ldq r20,0(r0) : MemRead : D=0x0000000000000000 A=0x1200944f0
>
> and it doesnt look like address 0x1200944f0 gets used as an actual address
> anywhere else but here are all other references to it :
>
> 172000: system.cpu0 T0 : @__libc_setup_tls+304 : addq r10,r13,r16 : IntAlu : D=0x00000001200944f0
> 172500: system.cpu0 T0 : @__libc_setup_tls+308 : stq r16,16(r9) : MemWrite : D=0x00000001200944f0 A=0x120092050
> 180000: system.cpu0 T0 : @memcpy+32 : bis r31,r16,r12 : IntAlu : D=0x00000001200944f0
> 181000: system.cpu0 T0 : @memcpy+40 : bis r31,r16,r9 : IntAlu : D=0x00000001200944f0
> 184000: system.cpu0 T0 : @memcpy+256 : bis r31,r12,r0 : IntAlu : D=0x00000001200944f0
> 2174500: system.cpu0 T0 : @__strtol_internal+24 : addq r0,r1,r0 : IntAlu : D=0x00000001200944f0
> 2175000: system.cpu0 T0 : @__strtol_internal+28 : ldq r20,0(r0) : MemRead : D=0x0000000000000000 A=0x1200944f0
>
> thanks again for all the help and sorry for being such pain in the ass.
>
> *Steve Reinhardt <[EMAIL PROTECTED]>* wrote:
>
> The instruction at tick 2175000 loads r20 from memory location 0x1200944f0
> so the earlier refs are irrelevant. The next questions are where does r0
> get set immediately prior to 2175000 (i.e. does 0x1200944f0 make sense as
> an address) and where else does 0x1200944f0 get accessed...
>
> Steve
>
> On 9/9/07, Elliott Cooper-Balis <[EMAIL PROTECTED]> wrote:
> >
> > here are all the instances of r20 in the specrand benchmark.
> > i'm sorry i can't be of more help in debugging this issue :
> >
> > 4500: system.cpu0 T0 : @_start+36 : ldq r20,-32440(r29) : MemRead : D=0x0000000120000eb8 A=0x1200907a0
> > 15000: system.cpu0 T0 : @__libc_start_main+60 : bis r31,r20,r15 : IntAlu : D=0x0000000120000eb8
> > 293000: system.cpu0 T0 : @__geteuid+20 : bis r31,r20,r0 : IntAlu : D=0x0000000000000064
> > 305500: system.cpu0 T0 : @__getegid+20 : bis r31,r20,r0 : IntAlu : D=0x0000000000000064
> > 2175000: system.cpu0 T0 : @__strtol_internal+28 : ldq r20,0(r0) : MemRead : D=0x0000000000000000 A=0x1200944f0
> > 2183500: system.cpu0 T0 : @____strtoll_l_internal+56 : bis r31,r20,r11 : IntAlu : D=0x0000000000000000
> > 2184000: system.cpu0 T0 : @____strtoll_l_internal+60 : ldq r3,8(r20) : MemRead : A=0x8
> >
> > the last of which being the instruction causing the page fault.
> >
> > elliott
> >
> > *Steve Reinhardt < [EMAIL PROTECTED]>* wrote:
> >
> > Interesting... my guess with perl then is that the Linux kernel is
> > supposed to be initializing some value in the thread-local storage that
> > we're not initializing. Unfortunately the only way to track that down is
> > usually to go reading the kernel source... though if you find a spot where
> > they define a base TLS struct then that should give it to you. Anyone else
> > out there on the list have any experience with this?
> >
> > As far as specrand it's impossible to say what the problem is without
> > going backward further in the trace to see where r20 is coming from. If r20
> > also comes from reading something out of the TLS area then it could well be
> > the same bug.
> >
> > Steve
> >
> > On 9/9/07, Elliott Cooper-Balis < [EMAIL PROTECTED]> wrote:
> > >
> > > hey steve,
> > > i tried both of your suggestions, and the latter of which i think
> > > might give a good clue as the memory address which causes the fault is not
> > > referenced at any other point in the program.
> > >
> > > here is the result of grep'ing for the address in the execution
> > > trace :
> > >
> > > >grep 12022e50 exec.out
> > > 5278458500: system.cpu0 T0 : @__printf_fp+128 : addq r0,r1,r0 : IntAlu : D=0x000000012022e508
> > > 5278459000: system.cpu0 T0 : @__printf_fp+132 : ldq r1,0(r0) : MemRead : D=0x0000000000000000 A=0x12022e508
> > >
> > > which are the 2 instructions right before the fault and the only 2
> > > instances of it being referenced.
> > >
> > > i tried digging around a little more to see if this address in
> > > particular was causing the problems. unfortunately, that doesn't appear to
> > > be the case. the benchmark we have been discussing is the Perl benchmark in
> > > SPEC06. i ran the random number generator benchmark as well (999.specrand)
> > > and here is the execution output just before its page fault :
> > >
> > > [EMAIL PROTECTED]:~/Development/M5/m5-2.0b3/build/ALPHA_SE$ ./m5.debug --trace-flags=Exec,Syscall,SyscallVerbose --trace-start=2000000 ../../configs/example/se.py -c benchmarks/999.specrand/exe/specrand_base.amd64-m64-gcc41-nn -o "4 3943"
> > >
> > > ....
> > >
> > > 2183000: system.cpu0 T0 : @____strtoll_l_internal+52 : bis r31,r18,r10 : IntAlu : D=0x000000000000000a
> > > 2183500: system.cpu0 T0 : @____strtoll_l_internal+56 : bis r31,r20,r11 : IntAlu : D=0x0000000000000000
> > > 2184000: system.cpu0 T0 : @____strtoll_l_internal+60 : ldq r3,8(r20) : MemRead : A=0x8
> > > panic: Page table fault when accessing virtual address 0x8
> > > @ cycle 2184000
> > > [invoke:build/ALPHA_SE/sim/faults.cc, line 65]
> > > Program aborted at cycle 2184000
> > > Aborted (core dumped)
> > >
> > > unfortunately, there doesn't appear to be (at least to me) any
> > > similarities between the two benchmark's output.
> > >
> > > elliott
> > >
> > > *Steve Reinhardt < [EMAIL PROTECTED]>* wrote:
> > >
> > > It's not obvious, but it does give some clues...
> > >
> > > The null pointer is being read from memory address 0x12022e508, so
> > > either that's a bogus address or the memory location doesn't have the right
> > > value (not getting initialized or getting clobbered at some point).
> > >
> > > The pointer address is computed by adding the uniq register (put into
> > > R0 by "call_pal rduniq") and some value (0x28) read from -29160(r29)... I
> > > think that's the global constant pool. The uniq reg is used as a pointer to
> > > thread-local storage. So basically it's reading the null value out of
> > > thread-local storage. It could be that that's a value that the OS is
> > > supposed to provide but we're not initializing it properly.
> > >
> > > I'd do two more things to try and get some more clues:
> > >
> > > - run with just --trace-flags=Syscall (and no --trace-start) to get a
> > >   complete syscall trace, then look at whatever the last few syscalls are, and
> > >   see what they are and how closely they precede the crash
> > > - run with just --trace-flags=Exec (and no --trace-start) and then
> > >   pipe the trace through "egrep -i '12022e50[0-7]' " to look at all the other
> > >   references to that memory location... is it ever written, if it's read
> > >   before is it always zero, etc. This will take a while...
> > >
> > > Steve
> > >
> > > On 9/7/07, Elliott Cooper-Balis < [EMAIL PROTECTED]> wrote:
> > > >
> > > > here is the output. is there anything obvious that might be broken?
> > > >
> > > _______________________________________________
> > > m5-users mailing list
> > > m5-users@m5sim.org
> > > http://m5sim.org/cgi-bin/mailman/listinfo/m5-users
_______________________________________________
m5-users mailing list
m5-users@m5sim.org
http://m5sim.org/cgi-bin/mailman/listinfo/m5-users