Re: [m5-users] SPEC06 - Good News!

Ali Saidi Mon, 10 Sep 2007 18:20:04 -0700

Elliott,

Do you still have one of your TLS-compiled binaries? I'd like to at least put a check in M5 for TLS, and if I had a binary to test with, that would be a lot easier than compiling my own tool chain.


Thanks,
Ali
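As an illustration of what such a TLS check might look at, here is a minimal Python sketch (not M5's actual C++ loader code) that reports whether an ELF64 binary contains a PT_TLS program header. A binary built against a TLS-enabled toolchain normally carries a PT_TLS segment describing its thread-local storage template, so its presence is a reasonable heuristic. The function name `has_tls_segment` is hypothetical, and the sketch only handles 64-bit ELF files (Alpha Linux binaries are ELF64).

```python
import struct
import sys

PT_TLS = 7  # ELF program header type for the thread-local storage template


def has_tls_segment(image):
    """Return True if the ELF64 image (a bytes object) has a PT_TLS program header.

    Sketch only: checks the magic number and class byte, then walks the
    program header table; no other validation is done.
    """
    if image[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if image[4] != 2:  # EI_CLASS: 2 means ELFCLASS64
        raise ValueError("only ELF64 is handled")
    endian = "<" if image[5] == 1 else ">"  # EI_DATA: 1 means little-endian
    # ELF64 header fields after the 16-byte e_ident: type, machine, version,
    # entry, phoff, shoff, flags, ehsize, phentsize, phnum, shentsize, shnum,
    # shstrndx.
    fields = struct.unpack_from(endian + "HHIQQQIHHHHHH", image, 16)
    phoff, phentsize, phnum = fields[4], fields[8], fields[9]
    for i in range(phnum):
        # p_type is the first 32-bit word of each program header entry.
        (p_type,) = struct.unpack_from(endian + "I", image, phoff + i * phentsize)
        if p_type == PT_TLS:
            return True
    return False


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1], "rb") as f:
        print("TLS" if has_tls_segment(f.read()) else "no TLS")
```

Given a benchmark binary path as its only argument, the script prints "TLS" or "no TLS". Doing the check on the program headers rather than on symbol names avoids depending on glibc internals such as `__libc_setup_tls`.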

On Sep 10, 2007, at 7:19 PM, Elliott Cooper-Balis wrote:

Steve,
So after rebuilding the cross-compiler without TLS, the benchmarks worked! Thanks again for all the help and patience.
Elliott

Elliott Cooper-Balis <[EMAIL PROTECTED]> wrote:
Ali,
To be honest, I'm not entirely sure. I just used the default script that came with crosstool (http://www.kegel.com/crosstool/) to build the Alpha versions of gcc/g++/gfortran. When I get home, I will see if there are options for TLS in the script. Thanks,
Elliott

Ali Saidi <[EMAIL PROTECTED]> wrote:
Elliott,
Did you compile your toolchain with or without TLS? While I haven't run SPEC2006, I've always used a non-TLS tool chain. Perhaps that is another way around the problem? Or you could see what values are put in the TLS area and fill them in (see src/arch/alpha/process.* and src/sim/process.*).
Ali


On Sep 10, 2007, at 1:52 AM, Steve Reinhardt wrote:
Interesting... there's nothing conclusive here, but the symbols on the instructions at tick 172000 show that this address is probably TLS-related too. So the good news is that this could be the same bug or a related one. I think the key thing is to figure out what the Linux TLS structure is supposed to look like.
One thing that's puzzling me is why all this is coming up now, when we've run almost all of SPEC2000 without any problems.
Anyone else have any ideas?

Steve

On 9/9/07, Elliott Cooper-Balis <[EMAIL PROTECTED]> wrote:
r0 gets set in the instruction right before the load into r20:
2174500: system.cpu0 T0 : @__strtol_internal+24 : addq r0,r1,r0 : IntAlu : D=0x00000001200944f0
2175000: system.cpu0 T0 : @__strtol_internal+28 : ldq r20,0(r0) : MemRead : D=0x0000000000000000 A=0x1200944f0
It doesn't look like address 0x1200944f0 gets used as an actual address anywhere else, but here are all the other references to it:
172000: system.cpu0 T0 : @__libc_setup_tls+304 : addq r10,r13,r16 : IntAlu : D=0x00000001200944f0
172500: system.cpu0 T0 : @__libc_setup_tls+308 : stq r16,16(r9) : MemWrite : D=0x00000001200944f0 A=0x120092050
180000: system.cpu0 T0 : @memcpy+32 : bis r31,r16,r12 : IntAlu : D=0x00000001200944f0
181000: system.cpu0 T0 : @memcpy+40 : bis r31,r16,r9 : IntAlu : D=0x00000001200944f0
184000: system.cpu0 T0 : @memcpy+256 : bis r31,r12,r0 : IntAlu : D=0x00000001200944f0
2174500: system.cpu0 T0 : @__strtol_internal+24 : addq r0,r1,r0 : IntAlu : D=0x00000001200944f0
2175000: system.cpu0 T0 : @__strtol_internal+28 : ldq r20,0(r0) : MemRead : D=0x0000000000000000 A=0x1200944f0
Thanks again for all the help, and sorry for being such a pain in the ass.
Steve Reinhardt <[EMAIL PROTECTED]> wrote:
The instruction at tick 2175000 loads r20 from memory location 0x1200944f0, so the earlier refs are irrelevant. The next questions are where r0 gets set immediately prior to 2175000 (i.e., does 0x1200944f0 make sense as an address?) and where else 0x1200944f0 gets accessed...
Steve

On 9/9/07, Elliott Cooper-Balis < [EMAIL PROTECTED]> wrote:
Here are all the instances of r20 in the specrand benchmark. I'm sorry I can't be of more help in debugging this issue:
4500: system.cpu0 T0 : @_start+36 : ldq r20,-32440(r29) : MemRead : D=0x0000000120000eb8 A=0x1200907a0
15000: system.cpu0 T0 : @__libc_start_main+60 : bis r31,r20,r15 : IntAlu : D=0x0000000120000eb8
293000: system.cpu0 T0 : @__geteuid+20 : bis r31,r20,r0 : IntAlu : D=0x0000000000000064
305500: system.cpu0 T0 : @__getegid+20 : bis r31,r20,r0 : IntAlu : D=0x0000000000000064
2175000: system.cpu0 T0 : @__strtol_internal+28 : ldq r20,0(r0) : MemRead : D=0x0000000000000000 A=0x1200944f0
2183500: system.cpu0 T0 : @____strtoll_l_internal+56 : bis r31,r20,r11 : IntAlu : D=0x0000000000000000
2184000: system.cpu0 T0 : @____strtoll_l_internal+60 : ldq r3,8(r20) : MemRead : A=0x8
The last of these is the instruction causing the page fault.


elliott

Steve Reinhardt < [EMAIL PROTECTED]> wrote:
Interesting... my guess with Perl, then, is that the Linux kernel is supposed to be initializing some value in the thread-local storage that we're not initializing. Unfortunately the only way to track that down is usually to go reading the kernel source... though if you find a spot where they define a base TLS struct, then that should give it to you. Anyone else out there on the list have any experience with this?
As far as specrand goes, it's impossible to say what the problem is without going further back in the trace to see where r20 is coming from. If r20 also comes from reading something out of the TLS area, then it could well be the same bug.
Steve
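Going backward through an Exec trace by hand is tedious, so here is a Python sketch that finds the last instruction writing a given register before a given tick. It assumes the " : "-separated Exec line layout shown in this thread and a tick-ordered trace (single CPU), and it uses simplified Alpha operand rules: loads write their first operand, stores write no register, `call_pal rduniq` writes r0, and other instructions write their last operand when it is a register. Branches, jumps, and store-conditionals are not modeled, and the names `parse`, `dest_reg`, and `last_writer` are hypothetical.

```python
import re
import sys

REG_RE = re.compile(r"^r\d+$")


def parse(line):
    """Split an Exec trace line into (tick, symbol, disassembly), or None.

    Assumes the layout seen in this thread, e.g.
    "2175000: system.cpu0 T0 : @__strtol_internal+28 : ldq r20,0(r0) : MemRead : ..."
    """
    parts = line.rstrip("\n").split(" : ")
    if len(parts) < 3:
        return None
    try:
        tick = int(parts[0].split(":", 1)[0])
    except ValueError:
        return None
    return tick, parts[1], parts[2]


def dest_reg(disasm):
    """Best-guess destination register of an Alpha instruction (simplified)."""
    fields = disasm.split(None, 1)
    if not fields:
        return None
    op = fields[0]
    operands = [o.strip() for o in fields[1].split(",")] if len(fields) > 1 else []
    if op == "call_pal":
        # rduniq returns the unique (thread pointer) value in r0.
        return "r0" if operands == ["rduniq"] else None
    if not operands or op.startswith("st"):
        return None  # stores and operand-less ops write no integer register here
    if op.startswith("ld"):
        return operands[0]  # ldq/ldl/lda/ldah: destination comes first
    return operands[-1] if REG_RE.match(operands[-1]) else None


def last_writer(lines, reg, before_tick):
    """Return the last (tick, symbol, disasm) writing `reg` before `before_tick`."""
    found = None
    for line in lines:
        rec = parse(line)
        if rec is None:
            continue
        if rec[0] >= before_tick:
            break  # relies on the trace being in tick order
        if dest_reg(rec[2]) == reg:
            found = rec
    return found


if __name__ == "__main__" and len(sys.argv) == 4:
    with open(sys.argv[1]) as f:
        print(last_writer(f, sys.argv[2], int(sys.argv[3])))
```

Applied repeatedly (r20 before 2184000, then r0 before the tick that answer gives, and so on), this walks the same chain that is being followed by hand in this thread.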

On 9/9/07, Elliott Cooper-Balis < [EMAIL PROTECTED]> wrote:
Hey Steve,
I tried both of your suggestions, and I think the latter might give a good clue, as the memory address that causes the fault is not referenced at any other point in the program.
Here is the result of grep'ing for the address in the execution trace:
 >grep 12022e50 exec.out
5278458500: system.cpu0 T0 : @__printf_fp+128 : addq r0,r1,r0 : IntAlu : D=0x000000012022e508
5278459000: system.cpu0 T0 : @__printf_fp+132 : ldq r1,0(r0) : MemRead : D=0x0000000000000000 A=0x12022e508
These are the two instructions right before the fault, and the only two instances of the address being referenced.
I tried digging around a little more to see if this address in particular was causing the problems; unfortunately, that doesn't appear to be the case. The benchmark we have been discussing is the Perl benchmark in SPEC06. I ran the random number generator benchmark as well (999.specrand), and here is the execution output just before its page fault:
[EMAIL PROTECTED]:~/Development/M5/m5-2.0b3/build/ALPHA_SE$ ./m5.debug --trace-flags=Exec,Syscall,SyscallVerbose --trace-start=2000000 ../../configs/example/se.py -c benchmarks/999.specrand/exe/specrand_base.amd64-m64-gcc41-nn -o "4 3943"
....
2183000: system.cpu0 T0 : @____strtoll_l_internal+52 : bis r31,r18,r10 : IntAlu : D=0x000000000000000a
2183500: system.cpu0 T0 : @____strtoll_l_internal+56 : bis r31,r20,r11 : IntAlu : D=0x0000000000000000
2184000: system.cpu0 T0 : @____strtoll_l_internal+60 : ldq r3,8(r20) : MemRead : A=0x8
panic: Page table fault when accessing virtual address 0x8
 @ cycle 2184000
[invoke:build/ALPHA_SE/sim/faults.cc, line 65]
Program aborted at cycle 2184000
Aborted (core dumped)
Unfortunately, there don't appear to be (at least to me) any similarities between the two benchmarks' outputs.
elliott

Steve Reinhardt < [EMAIL PROTECTED]> wrote:
It's not obvious, but it does give some clues...
The null pointer is being read from memory address 0x12022e508, so either that's a bogus address or the memory location doesn't have the right value (it's not getting initialized, or it's getting clobbered at some point).
The pointer address is computed by adding the uniq register (put into R0 by "call_pal rduniq") and some value (0x28) read from -29160(r29)... I think that's the global constant pool. The uniq reg is used as a pointer to thread-local storage. So basically it's reading the null value out of thread-local storage. It could be that that's a value that the OS is supposed to provide but we're not initializing it properly.
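The address arithmetic behind both faults can be checked with a small Python sketch. Alpha memory-format instructions such as `ldq r3,8(r20)` use base register plus sign-extended 16-bit displacement, so with r20 = 0 the load goes to 0x8, matching the specrand panic. For the Perl case, if uniq plus 0x28 equals 0x12022e508, the implied uniq value is 0x12022e4e0 (derived here by subtraction, not read from the trace). The helper names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def sext16(disp):
    """Sign-extend a 16-bit displacement, as Alpha memory-format instructions do."""
    disp &= 0xFFFF
    return disp - 0x10000 if disp & 0x8000 else disp


def effective_address(base, disp):
    """Address used by e.g. ldq ra,disp(rb): rb + sext(disp), wrapped to 64 bits."""
    return (base + sext16(disp)) & MASK64


def tls_slot(uniq, offset):
    """Hypothetical helper mirroring 'addq r0,r1,r0': uniq (r0) plus an offset (r1)."""
    return (uniq + offset) & MASK64


# specrand: r20 was loaded as 0, so "ldq r3,8(r20)" touches address 0x8.
print(hex(effective_address(0, 8)))                            # 0x8
# perl: uniq + 0x28 == 0x12022e508 implies uniq == 0x12022e4e0.
print(hex(tls_slot(0x12022e4e0, 0x28)))                        # 0x12022e508
# "ldq r1,0(r0)" then reads that slot directly.
print(hex(effective_address(tls_slot(0x12022e4e0, 0x28), 0)))  # 0x12022e508
```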
I'd do two more things to try and get some more clues:
- run with just --trace-flags=Syscall (and no --trace-start) to get a complete syscall trace, then look at the last few syscalls to see what they are and how closely they precede the crash
- run with just --trace-flags=Exec (and no --trace-start) and then pipe the trace through "egrep -i '12022e50[0-7]'" to look at all the other references to that memory location... Is it ever written? If it's read before, is it always zero? Etc. This will take a while...
Steve
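As an alternative to a textual egrep, the Exec trace can be filtered numerically for every line that touches the aligned 8-byte quadword containing an address (for 0x12022e508, the bytes 0x12022e508 through 0x12022e50f). The Python sketch below reports whether the address appears as a memory address (A=) or only as a computed data value (D=); the MemRead/MemWrite column in the printed line then shows whether the location is ever written. It assumes the Exec line format shown in this thread, and `quadword_refs` is a hypothetical name.

```python
import re
import sys

# "A=0x..." is the effective address and "D=0x..." the data value in an
# M5 Exec trace line such as
# "5278459000: system.cpu0 T0 : @__printf_fp+132 : ldq r1,0(r0) : MemRead : D=0x0000000000000000 A=0x12022e508"
FIELD_RE = re.compile(r"\b([AD])=0x([0-9a-fA-F]+)")


def quadword_refs(lines, addr):
    """Yield (kind, line) for each trace line whose A= or D= value falls inside
    the aligned 8-byte quadword containing addr; kind is "A" or "D"."""
    lo = addr & ~0x7  # start of the aligned quadword
    hi = lo + 8
    for line in lines:
        fields = dict(FIELD_RE.findall(line))
        for kind in ("A", "D"):  # prefer reporting a real memory access
            if kind in fields and lo <= int(fields[kind], 16) < hi:
                yield kind, line.rstrip("\n")
                break  # report each line once


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as f:
        for kind, line in quadword_refs(f, int(sys.argv[2], 16)):
            print(kind, line)
```

Run with the trace file and the address (e.g. `exec.out 0x12022e508`), it prints one tagged line per reference, which avoids matching neighboring addresses that only share a textual prefix.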

On 9/7/07, Elliott Cooper-Balis < [EMAIL PROTECTED]> wrote:
Here is the output. Is there anything obvious that might be broken?

_______________________________________________
m5-users mailing list
m5-users@m5sim.org
http://m5sim.org/cgi-bin/mailman/listinfo/m5-users