Do you know where the solaris kernel actually is on that disk image? I can't disassemble it if I don't know which file it is :-P. Ali?

Gabe

On 10/25/11 02:30, Gabe Black wrote:
> Ah, ok, I was just being dumb. All the stdf-s and lddf-s are just moving
> memory around, I think. That way you can load/store 64 bits at a time
> and get it done with fewer instructions. I think those instructions
> themselves can be ignored. I'm also surprised that there would be much
> floating point.
>
> I'm currently building binutils for SPARC, so hopefully I can
> disassemble some things and get a better idea of what's going on. It's
> probably going to be really annoying to figure it out.
>
> Gabe
>
> On 10/25/11 00:32, Steve Reinhardt wrote:
>> Hard to tell... there are larger and larger differences after that point
>> that seem to be cascading from this one, but it takes a while before they
>> diverge completely. I put the trace in /tmp/tracediff-8625.out on zizzer if
>> you want to take a look for yourself.
>>
>> It seems odd that the solaris boot would be doing that much FP in any case,
>> but there does seem to be quite a bit of it.
>>
>> Steve
>>
>>
>> On Tue, Oct 25, 2011 at 12:17 AM, Gabe Black <[email protected]> wrote:
>>
>>> An FP rounding error seems very plausible, but I'm not sure how +/- zero
>>> would make any difference. I'm skeptical that our FP implementation in
>>> SPARC is accurate enough to care much about such a small difference,
>>> although it is, of course, entirely possible it cascades from there into
>>> a larger difference which breaks things.
>>>
>>> I've gone back and improved the SPARC disassembly in the past, but it's
>>> still not perfect. The problem is the hierarchy that works for getting
>>> instructions to work doesn't necessarily mirror the one you need to get
>>> accurate disassembly. I think I went with operand position too (src 0 is
>>> for this, dest 0 is for that) and that doesn't always work very well.
>>> That's probably what's going wrong here.
>>>
>>> Is there a point after this where things diverge significantly?
>>> This could be just a blip of noise and the real problem happens a lot later.
>>> It's a *major* pain in the butt to write code that theoretically handles
>>> all the little FP weird cases and gets all the bits right when the host
>>> ISA has different rules for FP than the guest, and it's even harder to
>>> actually get the compiler to generate that code without moving things
>>> around and messing it all up. And glibc's FP support is wrong sometimes!
>>> What fun. I largely think it's farther on, and also partially am holding
>>> out hope we don't have to wade into FP soup.
>>>
>>> Gabe
>>>
>>> On 10/24/11 09:19, Steve Reinhardt wrote:
>>>> Great, thanks a lot. I was able to build with
>>>> 'CC=/usr/bin/gcc-4.4 CXX=/usr/bin/g++-4.4' and get a binary that passes this
>>>> test on the head, so it's definitely the compiler. I also ran tracediff and
>>>> it looks like it's an off-by-one thing with %fp; here's the first error:
>>>>
>>>> -931697720: system.cpu T0 : 0xff1aa5b8 : stdf %fp, [%f29 + -0x20] : MemWrite : D=0x423000000000197a A=0xfeffa280
>>>> +931697720: system.cpu T0 : 0xff1aa5b8 : stdf %fp, [%f29 + -0x20] : MemWrite : D=0x4230000000001979 A=0xfeffa280
>>>>
>>>> (The good gcc-4.4 version is second, so the '1979' is the correct value
>>>> here.)
>>>>
>>>> I ran one more tracediff with '--debug-flag=All --trace-start=931600000' to
>>>> see if anything else turns up sooner, and got this:
>>>>
>>>> @@ -1380553 +1380553 @@
>>>>  931697014: system.cpu.[tid:0]: Reading float reg 3 (3) bits as 0, 0.
>>>>  931697014: system.cpu.[tid:0]: Reading float reg 2 (2) bits as 0x3e300000, 0.171875.
>>>>  931697014: global: FSR read as: 0xc0000000
>>>> -931697014: system.cpu.[tid:0]: Setting float reg 12 (12) bits to 0, 0.
>>>> +931697014: system.cpu.[tid:0]: Setting float reg 12 (12) bits to 0x80000000, -0.
>>>>  931697014: system.cpu.[tid:0]: Setting float reg 13 (13) bits to 0, 0.
>>>>  931697014: global: FSR written with: 0xc0000000
>>>>  931697014: system.cpu + A16 T0 : 0xff1aa434 : fsubd %f31,%f30,%f12 : FloatAdd : D=0x00000000c0000000
>>>> @@ -1380951 +1380951 @@
>>>>  931697038: system.cpu.[tid:0]: Reading float reg 5 (5) bits as 0, 0.
>>>>  931697038: system.cpu.[tid:0]: Reading float reg 4 (4) bits as 0, 0.
>>>>  931697038: system.cpu.[tid:0]: Reading float reg 13 (13) bits as 0, 0.
>>>> -931697038: system.cpu.[tid:0]: Reading float reg 12 (12) bits as 0, 0.
>>>> +931697038: system.cpu.[tid:0]: Reading float reg 12 (12) bits as 0x80000000, -0.
>>>>  931697038: global: FSR read as: 0xc0000000
>>>>  931697038: system.cpu.[tid:0]: Setting float reg 18 (18) bits to 0, 0.
>>>>  931697038: system.cpu.[tid:0]: Setting float reg 19 (19) bits to 0, 0.
>>>> @@ -1381022 +1381022 @@
>>>>  931697042: system.cpu.[tid:0]: Reading float reg 10 (10) bits as 0x41300000, 11.
>>>>  931697042: global: FSR read as: 0xc0000000
>>>>  931697042: system.cpu.[tid:0]: Setting float reg 16 (16) bits to 0x41300000, 11.
>>>> -931697042: system.cpu.[tid:0]: Setting float reg 17 (17) bits to 0xe685, 8.26948e-41.
>>>> +931697042: system.cpu.[tid:0]: Setting float reg 17 (17) bits to 0xe684, 8.26934e-41.
>>>>  931697042: global: FSR written with: 0xc0000000
>>>>  931697042: system.cpu + A16 T0 : 0xff1aa4a4 : faddd %f3,%f2,%f16 : FloatAdd : D=0x00000000c0000000
>>>>  931697042: Event_18: AtomicSimpleCPU tick event scheduled @ 931697043
>>>>
>>>> Could it be some kind of FP rounding error? It's not clear how that would
>>>> end up affecting %fp though. (Actually, looking at this a little closer,
>>>> are we even disassembling that correctly? Seems to me it should be 'stdf
>>>> %f29, [%fp + -0x20]'.)
>>>>
>>>> I won't have time to look into this further anytime soon, but I hope this
>>>> will give someone else (Gabe?) enough to go on to get this figured out.
>>>>
>>>> Thanks,
>>>>
>>>> Steve
>>>>
>>>>
>>>> On Sun, Oct 23, 2011 at 7:50 PM, Ali Saidi <[email protected]> wrote:
>>>>
>>>>> I've installed it.
>>>>>
>>>>> Ali
>>>>>
>>>>> On Oct 23, 2011, at 7:18 PM, Steve Reinhardt wrote:
>>>>>
>>>>>> This makes sense, since the time the regression started failing is
>>>>>> consistent with when gcc was upgraded on zizzer.
>>>>>>
>>>>>> I see there is a gcc-4.4 package available for ubuntu 11.04 (which zizzer is
>>>>>> running)... is there more to it than installing that package and recompiling
>>>>>> to get a workable binary to run tracediff with?
>>>>>>
>>>>>> I'd try myself but I've forgotten my zizzer password (again!) so I can't
>>>>>> sudo. It's tough when you've had the same password for ten years then you
>>>>>> change it but don't use the new one much...
>>>>>>
>>>>>> Steve
>>>>>>
>>>>>> On Sun, Sep 25, 2011 at 1:14 PM, Ali Saidi <[email protected]> wrote:
>>>>>>
>>>>>>> Yes.. What Gabe said. With gcc 4.5 (version zizzer now runs) I cannot find
>>>>>>> a version of the repository that passes sparc boot. I'm pretty sure it's an
>>>>>>> annoying compiler issue, but there are some annoyances in figuring out where
>>>>>>> to look, as Gabe points out. If your stats changes work on everything else,
>>>>>>> I'm happy to see them committed while this issue goes on in the background.
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Ali
>>>>>>>
>>>>>>> Sent from my ARM powered device
>>>>>>>
>>>>>>> On Sep 25, 2011, at 3:06 PM, Gabe Black <[email protected]> wrote:
>>>>>>>> We (Ali and I) have each looked at that before, and we think it depends
>>>>>>>> on the compiler version. Something changes when you have a new enough
>>>>>>>> gcc and then the behavior of SPARC changes. I think the new behavior is
>>>>>>>> broken and the old behavior is correct, but I'd have to look at it
>>>>>>>> again.
>>>>>>>> I haven't looked into it farther than that yet because I'd want
>>>>>>>> to tracediff between versions built with different compilers. Since they
>>>>>>>> would need to find different versions of libraries and can't just run
>>>>>>>> from the same command line, it's logistically annoying.
>>>>>>>>
>>>>>>>> Gabe
>>>>>>>>
>>>>>>>> On 09/25/11 09:52, nathan binkert wrote:
>>>>>>>>> I'm trying to get my python stats changes into the tree, but it
>>>>>>>>> appears that one of the regression tests no longer works (zizzer
>>>>>>>>> agrees with me):
>>>>>>>>>
>>>>>>>>> SPARC_FS/tests/opt/long/80.solaris-boot/sparc/solaris/t1000-simple-atomic
>>>>>>>>> Gabe, I think you're the only one that's been messing with SPARC. Can
>>>>>>>>> you take a look?
>>>>>>>>>
>>>>>>>>> Nate
>>>>>>>>> _______________________________________________
>>>>>>>>> gem5-dev mailing list
>>>>>>>>> [email protected]
>>>>>>>>> http://m5sim.org/mailman/listinfo/gem5-dev
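
For anyone following along, the two kinds of difference in the quoted tracediff output (the +0 vs. -0 result in float reg 12, and the 0x...197a vs. 0x...1979 data written by the stdf) can be checked by decoding the raw bit patterns. The Python sketch below is only an illustration: the helper names are made up here, and treating the stdf data as one IEEE 754 double is an assumption, since that store may simply be moving 64 bits of memory around as suggested above.

```python
import math
import struct


def f32_from_bits(bits: int) -> float:
    """Reinterpret a 32-bit pattern as an IEEE 754 single (hypothetical helper)."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def f64_from_bits(bits: int) -> float:
    """Reinterpret a 64-bit pattern as an IEEE 754 double (hypothetical helper)."""
    return struct.unpack(">d", struct.pack(">Q", bits))[0]


# Float reg 12 in the two traces: bits 0 versus bits 0x80000000.
pos_zero = f32_from_bits(0x00000000)
neg_zero = f32_from_bits(0x80000000)

# Data written by the stdf in the two traces, ASSUMED here to be a double.
bad = f64_from_bits(0x423000000000197A)
good = f64_from_bits(0x4230000000001979)
ulp_gap = 0x423000000000197A - 0x4230000000001979

# The zeros compare equal; only the sign bit tells them apart.
print("zeros compare equal:", pos_zero == neg_zero)
print("sign of second zero:", math.copysign(1.0, neg_zero))

# The two stored patterns are adjacent doubles (one unit in the last place).
print("bit-pattern gap:", ulp_gap)
print("numeric gap:", bad - good)
```

Under that assumption the two stored values are neighbouring doubles, which fits the FP rounding theory better than a genuine off-by-one in %fp.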
