Re: [gem5-dev] Failed SPARC test

Gabe Black Tue, 25 Oct 2011 03:04:04 -0700

Do you know where the solaris kernel actually is on that disk image? I
can't disassemble it if I don't know which file it is :-P. Ali?


Gabe

On 10/25/11 02:30, Gabe Black wrote:
> Ah, ok, I was just being dumb. All the stdf-s and lddf-s are just moving
> memory around, I think. That way you can load/store 64 bits at a time
> and get it done with fewer instructions. I think those instructions
> themselves can be ignored. I'm also surprised that there would be much
> floating point.
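A minimal sketch in Python of the kind of copy loop described above. It assumes a big-endian, SPARC-style layout and a buffer whose length is a multiple of 8, and the function name is made up. Each iteration moves one 64-bit doubleword as a raw integer, the way an lddf/stdf pair moves bits without doing any arithmetic on them, so nothing in the loop itself can round.

```python
import struct


def copy_by_doublewords(src: bytes) -> bytearray:
    """Copy src 8 bytes at a time, as a loop of lddf/stdf pairs would.

    Assumes len(src) is a multiple of 8. Each doubleword is handled as a
    raw big-endian 64-bit integer, so its bit pattern is copied exactly
    and is never interpreted as a floating-point number.
    """
    if len(src) % 8:
        raise ValueError("length must be a multiple of 8")
    dst = bytearray(len(src))
    for offset in range(0, len(src), 8):
        (word,) = struct.unpack_from(">Q", src, offset)  # "load" 64 bits
        struct.pack_into(">Q", dst, offset, word)        # "store" 64 bits
    return dst


buf = bytes(range(64))
assert copy_by_doublewords(buf) == buf  # 8 doubleword moves instead of 64 byte moves
```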
>
> I'm currently building binutils for SPARC, so hopefully I can
> disassemble some things and get a better idea of what's going on. It's
> probably going to be really annoying to figure it out.
>
> Gabe
>
> On 10/25/11 00:32, Steve Reinhardt wrote:
>> Hard to tell... there are larger and larger differences after that point
>> that seem to be cascading from this one, but it takes a while before they
>> diverge completely.  I put the trace in /tmp/tracediff-8625.out on zizzer if
>> you want to take a look for yourself.
>>
>> It seems odd that the solaris boot would be doing that much FP in any case,
>> but there does seem to be quite a bit of it.
>>
>> Steve
>>
>>
>> On Tue, Oct 25, 2011 at 12:17 AM, Gabe Black <[email protected]> wrote:
>>
>>> An FP rounding error seems very plausible, but I'm not sure how +/- zero
>>> would make any difference. I'm skeptical that our FP implementation in
>>> SPARC is accurate enough to care much about such a small difference,
>>> although it is, of course, entirely possible it cascades from there into
>>> a larger difference which breaks things.
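Signed zeros compare equal, so a +0/-0 difference only matters once something looks at the sign. The Python example below illustrates this. Python floats are IEEE 754 doubles on common hosts. The example uses the single-precision patterns that appear in the trace further down this thread: 0x00000000 for +0 and 0x80000000 for -0. The helper name is made up.

```python
import math
import struct


def f32_from_bits(bits: int) -> float:
    """Interpret a 32-bit pattern as an IEEE 754 single-precision value."""
    return struct.unpack(">f", bits.to_bytes(4, "big"))[0]


pos_zero = f32_from_bits(0x00000000)
neg_zero = f32_from_bits(0x80000000)

# An ordinary comparison cannot tell them apart...
print(pos_zero == neg_zero)                                        # True
# ...but the sign bit survives and is visible to some operations.
print(math.copysign(1.0, pos_zero), math.copysign(1.0, neg_zero))  # 1.0 -1.0
print(math.atan2(pos_zero, -1.0), math.atan2(neg_zero, -1.0))      # 3.141592653589793 -3.141592653589793
```

So a -0 that should have been +0 (or vice versa) stays invisible until it reaches an operation like copysign, atan2, or a division by it, and then it can flip a sign or pick a different branch.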
>>>
>>> I've gone back and improved the SPARC disassembly in the past, but it's
>>> still not perfect. The problem is the hierarchy that works for getting
>>> instructions to work doesn't necessarily mirror the one you need to get
>>> accurate disassembly. I think I went with operand position too (src 0 is
>>> for this, dest 0 is for that) and that doesn't always work very well.
>>> That's probably what's going wrong here.
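To make the operand-position problem concrete, here is a hypothetical formatter for register-plus-immediate stores. It is not gem5's code. It chooses the printed form by operand role rather than position: the data register comes first and the integer base register goes inside the brackets. Integer register 30 prints as %fp (%i6) and 14 as %sp (%o6). The FP register number is simply the one shown in the trace. Swapping the two roles produces the 'stdf %fp, [%f29 + -0x20]' seen further down this thread, instead of the 'stdf %f29, [%fp + -0x20]' Steve expected.

```python
def int_reg_name(num: int) -> str:
    """Name of SPARC integer register num (0-31), using the %sp/%fp aliases."""
    if not 0 <= num < 32:
        raise ValueError(f"bad integer register number: {num}")
    aliases = {14: "%sp", 30: "%fp"}  # %o6 and %i6
    if num in aliases:
        return aliases[num]
    bank = "goli"[num // 8]  # %g0-7, %o0-7, %l0-7, %i0-7
    return f"%{bank}{num % 8}"


def disasm_store(mnemonic: str, data_reg: int, base_reg: int, imm: int,
                 fp_data: bool) -> str:
    """Format 'mnemonic data, [base + imm]' for a register+immediate store.

    data_reg is the register being stored (an FP register if fp_data is
    True); base_reg is always an integer register used for the address.
    """
    data = f"%f{data_reg}" if fp_data else int_reg_name(data_reg)
    return f"{mnemonic} {data}, [{int_reg_name(base_reg)} + {imm:#x}]"


print(disasm_store("stdf", 29, 30, -0x20, fp_data=True))
# stdf %f29, [%fp + -0x20]
```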
>>>
>>> Is there a point after this where things diverge significantly? This
>>> could be just a blip of noise and the real problem happens a lot later.
>>> It's a *major* pain in the butt to write code that theoretically handles
>>> all the little FP weird cases and gets all the bits right when the host
>>> ISA has different rules for FP than the guest, and it's even harder to
>>> actually get the compiler to generate that code without moving things
>>> around and messing it all up. And glibc's FP support is wrong sometimes!
>>> What fun. I mostly think the real problem is further on, and I'm also
>>> partly holding out hope we don't have to wade into FP soup.
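Here is one concrete way that "moving things around" can change a result. The same exact value rounds to different neighboring doubles depending on which rounding mode is in effect. If an operation gets evaluated before the guest's mode is installed on the host, or after it is restored, the answer lands on the wrong neighbor. The Python 3.9+ model below does exact arithmetic with fractions and then rounds either to nearest or toward -infinity. It is not gem5's implementation. It assumes values within the finite double range and ignores signed zeros.

```python
import math
from fractions import Fraction


def round_to_double(exact: Fraction, mode: str) -> float:
    """Round an exact rational to a double, either 'nearest' or 'floor'.

    Assumes the value lies within the finite double range; a zero result
    is returned as +0.0 (signed-zero rules are not modeled here).
    """
    # CPython's int / int true division is correctly rounded to nearest-even.
    nearest = exact.numerator / exact.denominator
    if mode == "nearest":
        return nearest
    if mode == "floor":
        # If rounding to nearest went up, step down to the lower neighbor.
        if Fraction(nearest) > exact:
            return math.nextafter(nearest, -math.inf)
        return nearest
    raise ValueError(f"unsupported mode: {mode}")


# 1 + 0.75 ulp(1): nearest rounds up, floor rounds down.
x = Fraction(1) + Fraction(3, 2**54)
print(round_to_double(x, "nearest").hex())  # 0x1.0000000000001p+0
print(round_to_double(x, "floor").hex())    # 0x1.0000000000000p+0
```

The two results differ only in the last mantissa bit, the same size of discrepancy as the stored values in the trace below.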
>>>
>>> Gabe
>>>
>>> On 10/24/11 09:19, Steve Reinhardt wrote:
>>>> Great, thanks a lot.  I was able to build with
>>>> 'CC=/usr/bin/gcc-4.4 CXX=/usr/bin/g++-4.4' and get a binary that passes this
>>>> test on the head, so it's definitely the compiler.  I also ran tracediff and
>>>> it looks like it's an off-by-one thing with %fp; here's the first error:
>>>>
>>>> -931697720: system.cpu T0 : 0xff1aa5b8    :     stdf   %fp, [%f29 + -0x20] : MemWrite :  D=0x423000000000197a A=0xfeffa280
>>>> +931697720: system.cpu T0 : 0xff1aa5b8    :     stdf   %fp, [%f29 + -0x20] : MemWrite :  D=0x4230000000001979 A=0xfeffa280
>>>>
>>>> (The good gcc-4.4 version is second, so the '1979' is the correct value
>>>> here.)
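Reading the two stored doublewords as IEEE 754 doubles shows how small this off-by-one is. The patterns differ only in the lowest mantissa bit, which is one unit in the last place. The exponent field is 0x423, meaning 2**36, so that unit is 2**-16. A quick Python check, with a made-up helper name:

```python
import struct


def f64_from_bits(bits: int) -> float:
    """Interpret a 64-bit pattern as an IEEE 754 double (big-endian)."""
    return struct.unpack(">d", bits.to_bytes(8, "big"))[0]


bad = f64_from_bits(0x423000000000197A)   # first trace line (newer gcc)
good = f64_from_bits(0x4230000000001979)  # second trace line (gcc 4.4)
print(good, bad)
print(bad - good == 2.0 ** -16)  # True: exactly one ulp apart
```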
>>>>
>>>> I ran one more tracediff with '--debug-flag=All --trace-start=931600000' to
>>>> see if anything else turns up sooner, and got this:
>>>>
>>>> @@ -1380553 +1380553 @@
>>>>  931697014: system.cpu.[tid:0]: Reading float reg 3 (3) bits as 0, 0.
>>>>  931697014: system.cpu.[tid:0]: Reading float reg 2 (2) bits as 0x3e300000, 0.171875.
>>>>  931697014: global: FSR read as: 0xc0000000
>>>> -931697014: system.cpu.[tid:0]: Setting float reg 12 (12) bits to 0, 0.
>>>> +931697014: system.cpu.[tid:0]: Setting float reg 12 (12) bits to 0x80000000, -0.
>>>>  931697014: system.cpu.[tid:0]: Setting float reg 13 (13) bits to 0, 0.
>>>>  931697014: global: FSR written with: 0xc0000000
>>>>  931697014: system.cpu + A16 T0 : 0xff1aa434    :       fsubd %f31,%f30,%f12    : FloatAdd :  D=0x00000000c0000000
>>>> @@ -1380951 +1380951 @@
>>>>  931697038: system.cpu.[tid:0]: Reading float reg 5 (5) bits as 0, 0.
>>>>  931697038: system.cpu.[tid:0]: Reading float reg 4 (4) bits as 0, 0.
>>>>  931697038: system.cpu.[tid:0]: Reading float reg 13 (13) bits as 0, 0.
>>>> -931697038: system.cpu.[tid:0]: Reading float reg 12 (12) bits as 0, 0.
>>>> +931697038: system.cpu.[tid:0]: Reading float reg 12 (12) bits as 0x80000000, -0.
>>>>  931697038: global: FSR read as: 0xc0000000
>>>>  931697038: system.cpu.[tid:0]: Setting float reg 18 (18) bits to 0, 0.
>>>>  931697038: system.cpu.[tid:0]: Setting float reg 19 (19) bits to 0, 0.
>>>> @@ -1381022 +1381022 @@
>>>>  931697042: system.cpu.[tid:0]: Reading float reg 10 (10) bits as 0x41300000, 11.
>>>>  931697042: global: FSR read as: 0xc0000000
>>>>  931697042: system.cpu.[tid:0]: Setting float reg 16 (16) bits to 0x41300000, 11.
>>>> -931697042: system.cpu.[tid:0]: Setting float reg 17 (17) bits to 0xe685, 8.26948e-41.
>>>> +931697042: system.cpu.[tid:0]: Setting float reg 17 (17) bits to 0xe684, 8.26934e-41.
>>>>  931697042: global: FSR written with: 0xc0000000
>>>>  931697042: system.cpu + A16 T0 : 0xff1aa4a4    :       faddd %f3,%f2,%f16    : FloatAdd :  D=0x00000000c0000000
>>>>  931697042: Event_18: AtomicSimpleCPU tick event scheduled @ 931697043
>>>>
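The debug lines print each 32-bit FP register as if it held a single-precision value. However, fsubd and faddd write double-precision results into register pairs. The sketch below decodes the logged bits both ways, with made-up helper names. It assumes SPARC's pairing, where the even register holds the high word and the following odd register holds the low word. Viewed that way, the %f12/%f13 pair is -0.0 versus +0.0, and the %f16/%f17 pair differs by one unit in the last place.

```python
import struct


def f32(bits: int) -> float:
    """Single-precision value of a 32-bit register image."""
    return struct.unpack(">f", bits.to_bytes(4, "big"))[0]


def f64_pair(hi: int, lo: int) -> float:
    """Double held in an even/odd register pair (even register = high word)."""
    return struct.unpack(">d", ((hi << 32) | lo).to_bytes(8, "big"))[0]


# Single-precision readings as printed in the trace.
for bits in (0x3E300000, 0x41300000, 0x0000E685, 0x0000E684):
    print(hex(bits), f32(bits))

# The same registers viewed as doubles.
print(f64_pair(0x00000000, 0), f64_pair(0x80000000, 0))  # 0.0 -0.0
a = f64_pair(0x41300000, 0xE685)
b = f64_pair(0x41300000, 0xE684)
print(a - b == 2.0 ** -32)  # True: one ulp apart near 2**20
```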
>>>> Could it be some kind of FP rounding error?  It's not clear how that would
>>>> end up affecting %fp though.  (Actually, looking at this a little closer,
>>>> are we even disassembling that correctly?  Seems to me it should be 'stdf
>>>> %f29, [%fp + -0x20]'.)
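On the rounding question, this sketch assumes the SPARC V9 FSR layout. In that layout, bits 31:30 (RD) select the rounding mode: 0 = nearest, 1 = toward zero, 2 = toward +infinity, 3 = toward -infinity. Under that assumption, the FSR value 0xc0000000 in the trace selects rounding toward -infinity. IEEE 754 specifies that an exact zero difference of two equal values is +0 in every rounding mode except toward -infinity, where it is -0.

Suppose the '+' lines in the second tracediff are again the gcc-4.4 build. Then its -0 from fsubd is what this rule gives under RD = 3. Its lower values, 0xe684 and 0x1979, also look like rounding toward -infinity, provided those results were inexact. That is an observation from the trace, not a confirmed diagnosis.

The helper names below are made up, and the operand value is just an example.

```python
import math

# Assumed SPARC V9 FSR.RD encoding (bits 31:30).
RD_MODES = {0: "nearest", 1: "toward_zero", 2: "toward_pos_inf", 3: "toward_neg_inf"}


def fsr_rounding_mode(fsr: int) -> str:
    """Rounding mode selected by the RD field of an FSR value."""
    return RD_MODES[(fsr >> 30) & 0b11]


def sub_exact_zero(a: float, b: float, mode: str) -> float:
    """Signed result of a - b for finite a == b (exact difference zero).

    IEEE 754 rule: if a and -b have the same sign, the zero has that sign;
    otherwise it is -0 only when rounding toward -infinity.
    """
    if not (math.isfinite(a) and a == b):
        raise ValueError("exact difference is not zero")
    a_negative = math.copysign(1.0, a) < 0
    minus_b_negative = math.copysign(1.0, b) > 0  # negating b flips its sign
    if a_negative == minus_b_negative:
        return -0.0 if a_negative else 0.0
    return -0.0 if mode == "toward_neg_inf" else 0.0


mode = fsr_rounding_mode(0xC0000000)
print(mode)                                          # toward_neg_inf
print(sub_exact_zero(0.171875, 0.171875, mode))      # -0.0
print(sub_exact_zero(0.171875, 0.171875, "nearest"))  # 0.0
```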
>>>>
>>>> I won't have time to look into this further anytime soon, but I hope this
>>>> will give someone else (Gabe?) enough to go on to get this figured out.
>>>>
>>>> Thanks,
>>>>
>>>> Steve
>>>>
>>>>
>>>> On Sun, Oct 23, 2011 at 7:50 PM, Ali Saidi <[email protected]> wrote:
>>>>
>>>>> I've installed it.
>>>>>
>>>>> Ali
>>>>>
>>>>> On Oct 23, 2011, at 7:18 PM, Steve Reinhardt wrote:
>>>>>
>>>>>> This makes sense, since the time the regression started failing is
>>>>>> consistent with when gcc was upgraded on zizzer.
>>>>>>
>>>>>> I see there is a gcc-4.4 package available for ubuntu 11.04 (which zizzer
>>>>>> is running)... is there more to it than installing that package and
>>>>>> recompiling to get a workable binary to run tracediff with?
>>>>>>
>>>>>> I'd try myself but I've forgotten my zizzer password (again!) so I can't
>>>>>> sudo.  It's tough when you've had the same password for ten years then
>>>>>> you change it but don't use the new one much...
>>>>>>
>>>>>> Steve
>>>>>>
>>>>>> On Sun, Sep 25, 2011 at 1:14 PM, Ali Saidi <[email protected]> wrote:
>>>>>>
>>>>>>> Yes. What Gabe said. With gcc 4.5 (the version zizzer now runs) I cannot
>>>>>>> find a version of the repository that passes SPARC boot.  I'm pretty sure
>>>>>>> it's an annoying compiler issue, but there are some annoyances in figuring
>>>>>>> out where to look, as Gabe points out. If your stats changes work on
>>>>>>> everything else, I'm happy to see them committed while this issue goes on
>>>>>>> in the background.
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Ali
>>>>>>>
>>>>>>> Sent from my ARM powered device
>>>>>>>
>>>>>>> On Sep 25, 2011, at 3:06 PM, Gabe Black <[email protected]> wrote:
>>>>>>>> We (Ali and I) have each looked at that before, and we think it depends
>>>>>>>> on the compiler version. Something changes when you have a new enough
>>>>>>>> gcc and then the behavior of SPARC changes. I think the new behavior is
>>>>>>>> broken and the old behavior is correct, but I'd have to look at it
>>>>>>>> again. I haven't looked into it farther than that yet because I'd want
>>>>>>>> to tracediff between versions built with different compilers. Since they
>>>>>>>> would need to find different versions of libraries and can't just run
>>>>>>>> from the same command line, it's logistically annoying.
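Because the two builds can't share a command line, one workaround is to run each build on its own, each with whatever library environment it needs. Save each trace to a file and compare the files afterwards. Below is a minimal Python sketch of that comparison step, which reports the first line where two traces differ. It is a stand-in for illustration, not gem5's tracediff, and the function names are hypothetical.

```python
import itertools
from typing import Iterable, Optional, Tuple

Mismatch = Tuple[int, Optional[str], Optional[str]]


def first_divergence(a: Iterable[str], b: Iterable[str]) -> Optional[Mismatch]:
    """Return (line_number, line_a, line_b) for the first mismatch, or None.

    If one trace ends early, the missing side is reported as None.
    """
    pairs = itertools.zip_longest(a, b, fillvalue=None)
    for lineno, (line_a, line_b) in enumerate(pairs, start=1):
        if line_a != line_b:
            return lineno, line_a, line_b
    return None


def first_divergence_in_files(path_a: str, path_b: str) -> Optional[Mismatch]:
    """Compare two saved trace files line by line."""
    with open(path_a) as fa, open(path_b) as fb:
        return first_divergence(fa, fb)
```

Calling first_divergence_in_files on the two saved traces returns the line number and both versions of the first mismatching line, or None if the traces match.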
>>>>>>>>
>>>>>>>> Gabe
>>>>>>>>
>>>>>>>> On 09/25/11 09:52, nathan binkert wrote:
>>>>>>>>> I'm trying to get my python stats changes into the tree, but it
>>>>>>>>> appears that one of the regression tests no longer works (zizzer
>>>>>>>>> agrees with me):
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> SPARC_FS/tests/opt/long/80.solaris-boot/sparc/solaris/t1000-simple-atomic
>>>>>>>>> Gabe, I think you're the only one that's been messing with SPARC.  Can
>>>>>>>>> you take a look?
>>>>>>>>>
>>>>>>>>> Nate
>>>>>>>>> _______________________________________________
>>>>>>>>> gem5-dev mailing list
>>>>>>>>> [email protected]
>>>>>>>>> http://m5sim.org/mailman/listinfo/gem5-dev