An FP rounding error seems very plausible, but I'm not sure how +/- zero
would make any difference. I'm skeptical that our SPARC FP implementation
is accurate enough for such a small difference to matter on its own,
although it is, of course, entirely possible that it cascades from there
into a larger difference which breaks things.
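
To make the "how could +/- zero matter" question concrete, here is a
minimal sketch in plain Python (not gem5 code) that decodes the raw bit
patterns from Steve's trace. The helper name f64_from_bits is made up for
illustration, and I'm assuming float reg 12 holds the high word of the
double in %f12, with reg 13 as the low word.

```python
import math
import struct

def f64_from_bits(bits):
    """Reinterpret a 64-bit integer as an IEEE 754 double."""
    return struct.unpack(">d", struct.pack(">Q", bits))[0]

# Assumption: reg 12 is the high 32 bits of %f12 and reg 13 the low 32 bits.
pos_zero = f64_from_bits((0x00000000 << 32) | 0x00000000)
neg_zero = f64_from_bits((0x80000000 << 32) | 0x00000000)

# The two zeros compare equal, so a plain == check never notices them.
print(pos_zero == neg_zero)

# Anything that looks at the sign does notice, and the gap can get large.
print(math.copysign(1.0, pos_zero), math.copysign(1.0, neg_zero))
print(math.atan2(pos_zero, -1.0), math.atan2(neg_zero, -1.0))

# The two stdf data values from the first mismatch in the trace.
bad = f64_from_bits(0x423000000000197A)
good = f64_from_bits(0x4230000000001979)
print(bad - good)  # difference between the two stored doubles
```

The last line shows that the two stored doubles differ only in the lowest
mantissa bit, which is the sort of thing a rounding difference would
produce. It doesn't say anything about where the difference comes from.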

I've gone back and improved the SPARC disassembly in the past, but it's
still not perfect. The problem is that the hierarchy that makes sense for
implementing the instructions doesn't necessarily mirror the one you need
for accurate disassembly. I think I also went by operand position (src 0
is for this, dest 0 is for that), and that doesn't always work very well.
That's probably what's going wrong here.

Is there a point after this where things diverge significantly? This
could be just a blip of noise, with the real problem happening a lot later.
It's a *major* pain in the butt to write code that theoretically handles
all the weird little FP cases and gets all the bits right when the host
ISA has different FP rules than the guest, and it's even harder to
actually get the compiler to generate that code without moving things
around and messing it all up. And glibc's FP support is wrong sometimes!
What fun. I mostly think the real problem is farther on, and I'm partly
holding out hope that we don't have to wade into FP soup.
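
As one example of a guest rule the host code has to honor, here is a rough
sketch (plain Python, not gem5 code) that decodes the rounding-direction
field of the FSR value in the trace. The RD field sits in bits 31:30 of the
SPARC V9 FSR; the function name is made up for illustration. Python can't
change the binary rounding mode, so the second half uses the decimal module
purely as a stand-in to show the rule that an exact-zero difference is
negative when rounding toward -infinity. Whether that fsubd really computes
an exact zero is an assumption on my part.

```python
from decimal import Context, Decimal, ROUND_FLOOR, ROUND_HALF_EVEN

# SPARC V9 FSR rounding-direction (RD) encodings, bits 31:30.
RD_NAMES = {
    0: "nearest (ties to even)",
    1: "toward zero",
    2: "toward +infinity",
    3: "toward -infinity",
}

def fsr_rounding_direction(fsr):
    """Extract the RD field (bits 31:30) from an FSR value."""
    return (fsr >> 30) & 0x3

# FSR value reported in the trace around the fsubd.
rd = fsr_rounding_direction(0xC0000000)
print(rd, RD_NAMES[rd])

# Stand-in only: decimal arithmetic, not the binary FP the guest uses.
x = Decimal("0.171875")
diff_nearest = Context(rounding=ROUND_HALF_EVEN).subtract(x, x)
diff_floor = Context(rounding=ROUND_FLOOR).subtract(x, x)
print(diff_nearest.is_signed(), diff_floor.is_signed())
```

If the guest really is running with RD set to round toward -infinity, then
a -0 result from an exact cancellation would be the correct answer, and a
host build that does the subtraction in round-to-nearest would give +0
instead.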

Gabe

On 10/24/11 09:19, Steve Reinhardt wrote:
> Great, thanks a lot.  I was able to build with
> 'CC=/usr/bin/gcc-4.4 CXX=/usr/bin/g++-4.4' and get a binary that passes this
> test on the head, so it's definitely the compiler.  I also ran tracediff and
> it looks like it's an off-by-one thing with %fp; here's the first error:
>
> -931697720: system.cpu T0 : 0xff1aa5b8    :     stdf   %fp, [%f29 + -0x20] :
> MemWrite :  D=0x423000000000197a A=0xfeffa280
> +931697720: system.cpu T0 : 0xff1aa5b8    :     stdf   %fp, [%f29 + -0x20] :
> MemWrite :  D=0x4230000000001979 A=0xfeffa280
>
> (The good gcc-4.4 version is second, so the '1979' is the correct value
> here.)
>
> I ran one more tracediff with '--debug-flag=All --trace-start=931600000' to
> see if anything else turns up sooner, and got this:
>
> @@ -1380553 +1380553 @@
>  931697014: system.cpu.[tid:0]: Reading float reg 3 (3) bits as 0, 0.
>  931697014: system.cpu.[tid:0]: Reading float reg 2 (2) bits as 0x3e300000,
> 0.171875.
>  931697014: global: FSR read as: 0xc0000000
> -931697014: system.cpu.[tid:0]: Setting float reg 12 (12) bits to 0, 0.
> +931697014: system.cpu.[tid:0]: Setting float reg 12 (12) bits to
> 0x80000000, -0.
>  931697014: system.cpu.[tid:0]: Setting float reg 13 (13) bits to 0, 0.
>  931697014: global: FSR written with: 0xc0000000
>  931697014: system.cpu + A16 T0 : 0xff1aa434    :       fsubd
> %f31,%f30,%f12    : FloatAdd :  D=0x00000000c0000000
> @@ -1380951 +1380951 @@
>  931697038: system.cpu.[tid:0]: Reading float reg 5 (5) bits as 0, 0.
>  931697038: system.cpu.[tid:0]: Reading float reg 4 (4) bits as 0, 0.
>  931697038: system.cpu.[tid:0]: Reading float reg 13 (13) bits as 0, 0.
> -931697038: system.cpu.[tid:0]: Reading float reg 12 (12) bits as 0, 0.
> +931697038: system.cpu.[tid:0]: Reading float reg 12 (12) bits as
> 0x80000000, -0.
>  931697038: global: FSR read as: 0xc0000000
>  931697038: system.cpu.[tid:0]: Setting float reg 18 (18) bits to 0, 0.
>  931697038: system.cpu.[tid:0]: Setting float reg 19 (19) bits to 0, 0.
> @@ -1381022 +1381022 @@
>  931697042: system.cpu.[tid:0]: Reading float reg 10 (10) bits as
> 0x41300000, 11.
>  931697042: global: FSR read as: 0xc0000000
>  931697042: system.cpu.[tid:0]: Setting float reg 16 (16) bits to
> 0x41300000, 11.
> -931697042: system.cpu.[tid:0]: Setting float reg 17 (17) bits to 0xe685,
> 8.26948e-41.
> +931697042: system.cpu.[tid:0]: Setting float reg 17 (17) bits to 0xe684,
> 8.26934e-41.
>  931697042: global: FSR written with: 0xc0000000
>  931697042: system.cpu + A16 T0 : 0xff1aa4a4    :       faddd   %f3,%f2,%f16
>      : FloatAdd :  D=0x00000000c0000000
>  931697042: Event_18: AtomicSimpleCPU tick event scheduled @ 931697043
>
> Could it be some kind of FP rounding error?  It's not clear how that would
> end up affecting %fp though.  (Actually, looking at this a little closer,
> are we even disassembling that correctly?  Seems to me it should be 'stdf
> %f29, [%fp + -0x20]'.)
>
> I won't have time to look into this further anytime soon, but I hope this
> will give someone else (Gabe?) enough to go on to get this figured out.
>
> Thanks,
>
> Steve
>
>
> On Sun, Oct 23, 2011 at 7:50 PM, Ali Saidi <[email protected]> wrote:
>
>> I've installed it.
>>
>> Ali
>>
>> On Oct 23, 2011, at 7:18 PM, Steve Reinhardt wrote:
>>
>>> This makes sense, since the time the regression started failing is
>>> consistent with when gcc was upgraded on zizzer.
>>>
>>> I see there is a gcc-4.4 package available for ubuntu 11.04 (which zizzer
>>> is running)... is there more to it than installing that package and
>>> recompiling to get a workable binary to run tracediff with?
>>>
>>> I'd try myself but I've forgotten my zizzer password (again!) so I can't
>>> sudo.  It's tough when you've had the same password for ten years then
>>> you change it but don't use the new one much...
>>>
>>> Steve
>>>
>>> On Sun, Sep 25, 2011 at 1:14 PM, Ali Saidi <[email protected]> wrote:
>>>
>>>> Yes.. What Gabe said. With gcc 4.5 (the version zizzer now runs) I cannot
>>>> find a version of the repository that passes sparc boot.  I'm pretty sure
>>>> it's an annoying compiler issue, but there are some annoyances in
>>>> figuring out where to look, as Gabe points out. If your stats changes
>>>> work on everything else, I'm happy to see them committed while this issue
>>>> goes on in the background.
>>>> Thanks,
>>>>
>>>> Ali
>>>>
>>>> Sent from my ARM powered device
>>>>
>>>> On Sep 25, 2011, at 3:06 PM, Gabe Black <[email protected]> wrote:
>>>>
>>>>> We (Ali and I) have each looked at that before, and we think it depends
>>>>> on the compiler version. Something changes when you have a new enough
>>>>> gcc and then the behavior of SPARC changes. I think the new behavior is
>>>>> broken and the old behavior is correct, but I'd have to look at it
>>>>> again. I haven't looked into it farther than that yet because I'd want
>>>>> to tracediff between versions built with different compilers. Since
>>>>> they would need to find different versions of libraries and can't just
>>>>> run from the same command line, it's logistically annoying.
>>>>>
>>>>> Gabe
>>>>>
>>>>> On 09/25/11 09:52, nathan binkert wrote:
>>>>>> I'm trying to get my python stats changes into the tree, but it
>>>>>> appears that one of the regression tests no longer works (zizzer
>>>>>> agrees with me):
>>>>>>
>>>>>>
>>>>>> SPARC_FS/tests/opt/long/80.solaris-boot/sparc/solaris/t1000-simple-atomic
>>>>>>
>>>>>> Gabe, I think you're the only one that's been messing with SPARC.  Can
>>>>>> you take a look?
>>>>>>
>>>>>> Nate
>>>>>> _______________________________________________
>>>>>> gem5-dev mailing list
>>>>>> [email protected]
>>>>>> http://m5sim.org/mailman/listinfo/gem5-dev
