Re: [gem5-dev] Failed SPARC test

Gabe Black Tue, 25 Oct 2011 02:30:54 -0700

Ah, ok, I was just being dumb. All the stdf-s and lddf-s are just moving
memory around, I think. That way you can load/store 64 bits at a time
and get it done with fewer instructions. I think those instructions
themselves can be ignored. I'd also be surprised if there were much real
floating point going on.
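
Gabe's reading of the stdf/lddf pairs can be illustrated with a minimal C sketch. It is not the code Solaris actually runs. It assumes a copy loop that moves memory one 64-bit word at a time, the way an lddf/stdf pair moves 8 bytes through an FP register. A load/store pair only moves bits and does no arithmetic, so the FP rounding mode cannot change what it writes. `copy_words` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy nwords 64-bit words from src to dst.
   Each iteration is one 8-byte load and one 8-byte store, the C analogue
   of an lddf/stdf pair. No arithmetic is performed, so every bit pattern
   (including -0.0 and NaN payloads) arrives unchanged. */
static void copy_words(void *dst, const void *src, size_t nwords)
{
    const unsigned char *s = src;
    unsigned char *d = dst;

    for (size_t i = 0; i < nwords; i++) {
        uint64_t w;
        memcpy(&w, s + 8 * i, sizeof w);  /* load 8 bytes */
        memcpy(d + 8 * i, &w, sizeof w);  /* store 8 bytes */
    }
}
```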

I'm currently building binutils for SPARC, so hopefully I can
disassemble some things and get a better idea of what's going on. It's
probably going to be really annoying to figure it out.

Gabe

On 10/25/11 00:32, Steve Reinhardt wrote:
> Hard to tell... there are larger and larger differences after that point
> that seem to be cascading from this one, but it takes a while before they
> diverge completely.  I put the trace in /tmp/tracediff-8625.out on zizzer if
> you want to take a look for yourself.
>
> It seems odd that the Solaris boot would be doing that much FP in any case,
> but there does seem to be quite a bit of it.
>
> Steve
>
>
> On Tue, Oct 25, 2011 at 12:17 AM, Gabe Black <[email protected]> wrote:
>
>> An FP rounding error seems very plausible, but I'm not sure how +/- zero
>> would make any difference. I'm skeptical that our FP implementation in
>> SPARC is accurate enough to care much about such a small difference,
>> although it is, of course, entirely possible it cascades from there into
>> a larger difference which breaks things.
>>
>> I've gone back and improved the SPARC disassembly in the past, but it's
>> still not perfect. The problem is the hierarchy that works for getting
>> instructions to work doesn't necessarily mirror the one you need to get
>> accurate disassembly. I think I went with operand position too (src 0 is
>> for this, dest 0 is for that) and that doesn't always work very well.
>> That's probably what's going wrong here.
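
To make the operand-order point concrete, here is a minimal sketch (not gem5's disassembler) that formats the immediate form of a SPARC V9 STDF from its format-3 fields: op in bits 31:30, rd in 29:25, op3 in 24:19, rs1 in 18:14, the i bit at 13 and a signed 13-bit offset in 12:0. The stored FP register (rd) is printed first and the integer base register (rs1) inside the brackets, which yields the 'stdf %f29, [%fp + -0x20]' form Steve suggests for his trace. It assumes op3 = 0x27 for STDF, reuses gem5's `stdf` mnemonic, and prints rd as the raw 5-bit field the way the trace in this thread does, ignoring V9's remapping of double-precision register numbers. `int_reg_name` and `format_stdf_imm` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: name of integer register r (0..31): %g0-7, %o0-7,
   %l0-7, %i0-7, with the conventional aliases %sp (%o6) and %fp (%i6). */
static void int_reg_name(unsigned r, char *buf, size_t n)
{
    static const char groups[] = "goli";

    if (r == 14)
        snprintf(buf, n, "%%sp");
    else if (r == 30)
        snprintf(buf, n, "%%fp");
    else
        snprintf(buf, n, "%%%c%u", groups[(r / 8) & 3u], r % 8);
}

/* Hypothetical helper: format the immediate form of STDF. Returns -1 for
   anything else, including the register form (i = 0), which is skipped. */
static int format_stdf_imm(uint32_t insn, char *buf, size_t n)
{
    unsigned op  = insn >> 30;
    unsigned rd  = (insn >> 25) & 0x1fu;  /* FP register being stored */
    unsigned op3 = (insn >> 19) & 0x3fu;
    unsigned rs1 = (insn >> 14) & 0x1fu;  /* integer base register */
    unsigned i   = (insn >> 13) & 1u;
    int32_t simm13 = (int32_t)(insn & 0x1fffu);
    char base[8];

    if (op != 3u || op3 != 0x27u || i != 1u)
        return -1;
    if (simm13 & 0x1000)                  /* sign-extend 13 bits */
        simm13 -= 0x2000;

    int_reg_name(rs1, base, sizeof base);
    /* Data register first, then [base + offset]. */
    return snprintf(buf, n, "stdf %%f%u, [%s + %s0x%x]", rd, base,
                    simm13 < 0 ? "-" : "",
                    (unsigned)(simm13 < 0 ? -simm13 : simm13));
}
```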
>>
>> Is there a point after this where things diverge significantly? This
>> could be just a blip of noise and the real problem happens a lot later.
>> It's a *major* pain in the butt to write code that theoretically handles
>> all the little FP weird cases and gets all the bits right when the host
>> ISA has different rules for FP than the guest, and it's even harder to
>> actually get the compiler to generate that code without moving things
>> around and messing it all up. And glibc's FP support is wrong sometimes!
>> What fun. I largely think the real problem is further on, and I'm also
>> partly holding out hope that we don't have to wade into FP soup.
>>
>> Gabe
>>
>> On 10/24/11 09:19, Steve Reinhardt wrote:
>>> Great, thanks a lot.  I was able to build with
>>> 'CC=/usr/bin/gcc-4.4 CXX=/usr/bin/g++-4.4' and get a binary that passes this
>>> test on the head, so it's definitely the compiler.  I also ran tracediff and
>>> it looks like it's an off-by-one thing with %fp; here's the first error:
>>>
>>> -931697720: system.cpu T0 : 0xff1aa5b8    :     stdf   %fp, [%f29 + -0x20] : MemWrite :  D=0x423000000000197a A=0xfeffa280
>>> +931697720: system.cpu T0 : 0xff1aa5b8    :     stdf   %fp, [%f29 + -0x20] : MemWrite :  D=0x4230000000001979 A=0xfeffa280
>>>
>>> (The good gcc-4.4 version is second, so the '1979' is the correct value
>>> here.)
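
Steve's "off-by-one" is literally a difference of 1 in the last bit of the stored 64-bit word. If that word is an IEEE 754 double (it may not be, if Gabe is right that these stdf instructions only move memory), then a difference of 1 between the bit patterns of two finite doubles with the same sign means they are adjacent representable values, one unit in the last place (ulp) apart. That is the size of discrepancy a single differently rounded operation can produce. A minimal C sketch, assuming the host's double is IEEE binary64 with the same byte order as its 64-bit integers; the helper names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret a 64-bit pattern as a double
   (a bit copy, no numeric conversion). */
static double bits_to_double(uint64_t bits)
{
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* Hypothetical helper: for finite doubles with the same sign bit,
   consecutive bit patterns are consecutive representable values, so the
   integer difference of the patterns counts the ulps between them. */
static uint64_t ulp_distance_same_sign(uint64_t a, uint64_t b)
{
    return a > b ? a - b : b - a;
}
```

For the two trace values the distance is 1; both decode to about 6.87e10 (just above 2^36), where one ulp is 2^-16.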
>>>
>>> I ran one more tracediff with '--debug-flag=All --trace-start=931600000' to
>>> see if anything else turns up sooner, and got this:
>>>
>>> @@ -1380553 +1380553 @@
>>>  931697014: system.cpu.[tid:0]: Reading float reg 3 (3) bits as 0, 0.
>>>  931697014: system.cpu.[tid:0]: Reading float reg 2 (2) bits as 0x3e300000, 0.171875.
>>>  931697014: global: FSR read as: 0xc0000000
>>> -931697014: system.cpu.[tid:0]: Setting float reg 12 (12) bits to 0, 0.
>>> +931697014: system.cpu.[tid:0]: Setting float reg 12 (12) bits to 0x80000000, -0.
>>>  931697014: system.cpu.[tid:0]: Setting float reg 13 (13) bits to 0, 0.
>>>  931697014: global: FSR written with: 0xc0000000
>>>  931697014: system.cpu + A16 T0 : 0xff1aa434    :       fsubd %f31,%f30,%f12    : FloatAdd :  D=0x00000000c0000000
>>> @@ -1380951 +1380951 @@
>>>  931697038: system.cpu.[tid:0]: Reading float reg 5 (5) bits as 0, 0.
>>>  931697038: system.cpu.[tid:0]: Reading float reg 4 (4) bits as 0, 0.
>>>  931697038: system.cpu.[tid:0]: Reading float reg 13 (13) bits as 0, 0.
>>> -931697038: system.cpu.[tid:0]: Reading float reg 12 (12) bits as 0, 0.
>>> +931697038: system.cpu.[tid:0]: Reading float reg 12 (12) bits as 0x80000000, -0.
>>>  931697038: global: FSR read as: 0xc0000000
>>>  931697038: system.cpu.[tid:0]: Setting float reg 18 (18) bits to 0, 0.
>>>  931697038: system.cpu.[tid:0]: Setting float reg 19 (19) bits to 0, 0.
>>> @@ -1381022 +1381022 @@
>>>  931697042: system.cpu.[tid:0]: Reading float reg 10 (10) bits as 0x41300000, 11.
>>>  931697042: global: FSR read as: 0xc0000000
>>>  931697042: system.cpu.[tid:0]: Setting float reg 16 (16) bits to 0x41300000, 11.
>>> -931697042: system.cpu.[tid:0]: Setting float reg 17 (17) bits to 0xe685, 8.26948e-41.
>>> +931697042: system.cpu.[tid:0]: Setting float reg 17 (17) bits to 0xe684, 8.26934e-41.
>>>  931697042: global: FSR written with: 0xc0000000
>>>  931697042: system.cpu + A16 T0 : 0xff1aa4a4    :       faddd %f3,%f2,%f16      : FloatAdd :  D=0x00000000c0000000
>>>  931697042: Event_18: AtomicSimpleCPU tick event scheduled @ 931697043
>>>
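
On reading the "bits as X, Y" columns above: the decimal value appears to be the 32-bit register reinterpreted as a single-precision float (0x3e300000 decodes to 0.171875, 0x41300000 to 11, 0x80000000 to -0, and 0xe685 to a denormal near 8.27e-41). A double-precision result spans an even/odd register pair, and on SPARC the even-numbered register holds the most significant word, so the faddd hunk is another 1-ulp difference, this time in the low word of the double written to registers 16 and 17. The sketch assumes the trace's register indices pair up like the architectural %f registers; the helper names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret a 32-bit register value as an
   IEEE 754 single (a bit copy, no numeric conversion). */
static float bits_to_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Hypothetical helper: assemble the 64-bit pattern of a double held in an
   even/odd register pair; the even register supplies the high word. */
static uint64_t pair_to_double_bits(uint32_t even, uint32_t odd)
{
    return ((uint64_t)even << 32) | odd;
}
```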
>>> Could it be some kind of FP rounding error?  It's not clear how that would
>>> end up affecting %fp though.  (Actually, looking at this a little closer,
>>> are we even disassembling that correctly?  Seems to me it should be
>>> 'stdf %f29, [%fp + -0x20]'.)
>>>
>>> I won't have time to look into this further anytime soon, but I hope this
>>> will give someone else (Gabe?) enough to go on to get this figured out.
>>>
>>> Thanks,
>>>
>>> Steve
>>>
>>>
>>> On Sun, Oct 23, 2011 at 7:50 PM, Ali Saidi <[email protected]> wrote:
>>>
>>>> I've installed it.
>>>>
>>>> Ali
>>>>
>>>> On Oct 23, 2011, at 7:18 PM, Steve Reinhardt wrote:
>>>>
>>>>> This makes sense, since the time the regression started failing is
>>>>> consistent with when gcc was upgraded on zizzer.
>>>>>
>>>>> I see there is a gcc-4.4 package available for Ubuntu 11.04 (which zizzer is
>>>>> running)... is there more to it than installing that package and recompiling
>>>>> to get a workable binary to run tracediff with?
>>>>>
>>>>> I'd try myself but I've forgotten my zizzer password (again!) so I can't
>>>>> sudo.  It's tough when you've had the same password for ten years and then
>>>>> you change it but don't use the new one much...
>>>>>
>>>>> Steve
>>>>>
>>>>> On Sun, Sep 25, 2011 at 1:14 PM, Ali Saidi <[email protected]> wrote:
>>>>>
>>>>>> Yes, what Gabe said. With gcc 4.5 (the version zizzer now runs) I cannot
>>>>>> find a version of the repository that passes the SPARC boot test.  I'm pretty
>>>>>> sure it's an annoying compiler issue, but there are some annoyances in
>>>>>> figuring out where to look, as Gabe points out. If your stats changes work
>>>>>> on everything else, I'm happy to see them committed while this issue is
>>>>>> worked out in the background.
>>>>>> Thanks,
>>>>>>
>>>>>> Ali
>>>>>>
>>>>>> Sent from my ARM powered device
>>>>>>
>>>>>> On Sep 25, 2011, at 3:06 PM, Gabe Black <[email protected]> wrote:
>>>>>>> We (Ali and I) have each looked at that before, and we think it depends
>>>>>>> on the compiler version. Something changes when you have a new enough
>>>>>>> gcc and then the behavior of SPARC changes. I think the new behavior is
>>>>>>> broken and the old behavior is correct, but I'd have to look at it
>>>>>>> again. I haven't looked into it further than that yet because I'd want
>>>>>>> to tracediff between versions built with different compilers. Since they
>>>>>>> would need to find different versions of libraries and can't just run
>>>>>>> from the same command line, it's logistically annoying.
>>>>>>>
>>>>>>> Gabe
>>>>>>>
>>>>>>> On 09/25/11 09:52, nathan binkert wrote:
>>>>>>>> I'm trying to get my Python stats changes into the tree, but it
>>>>>>>> appears that one of the regression tests no longer works (zizzer
>>>>>>>> agrees with me):
>>>>>>>>
>>>>>>>>
>>>>>>>> SPARC_FS/tests/opt/long/80.solaris-boot/sparc/solaris/t1000-simple-atomic
>>>>>>>> Gabe, I think you're the only one that's been messing with SPARC.  Can
>>>>>>>> you take a look?
>>>>>>>>
>>>>>>>> Nate
>>>>>>>> _______________________________________________
>>>>>>>> gem5-dev mailing list
>>>>>>>> [email protected]
>>>>>>>> http://m5sim.org/mailman/listinfo/gem5-dev