Here's a discussion in the gcc bug tracker of the thing I was talking
about before that's supposed to fix this, I think.

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34678

Our barriers aren't working since Frs1s, Frs2s, and Frds could all be
held in registers, and a "memory" clobber only constrains memory
accesses.

Gabe

On 10/29/11 13:31, Gabe Black wrote:
> Here is some suspect assembly from Fadds for the atomic simple CPU:
>
>    0x00000000008d538e <+382>:   callq  0x4cab70 <m5_fegetround>
>    0x00000000008d5393 <+387>:   mov    %eax,%r15d
>    0x00000000008d5396 <+390>:   mov    %r14d,%edi
>    0x00000000008d5399 <+393>:   callq  0x4cab30 <m5_fesetround>
>    0x00000000008d539e <+398>:   mov    %r15d,%edi
>    0x00000000008d53a1 <+401>:   callq  0x4cab30 <m5_fesetround>
>
>
> This is, more or less, from the following code.
>
>
>     __asm__ __volatile__ ("" ::: "memory");
>     int oldrnd = m5_fegetround();
>     __asm__ __volatile__ ("" ::: "memory");
>     m5_fesetround(newrnd);
>     __asm__ __volatile__ ("" ::: "memory");
>     Frds = Frs1s + Frs2s;
>     __asm__ __volatile__ ("" ::: "memory");
>     m5_fesetround(oldrnd);
>     __asm__ __volatile__ ("" ::: "memory");
>
>
> Note that the addition was moved out of the middle and fesetround was
> called twice back to back, once to set the new rounding mode, and once
> to set it right back again.
>
> Gabe
>
> On 10/28/11 08:31, Ali Saidi wrote:
>> I'm still not 100% convinced that this is it. I agree it's highly
>> likely, but it could be some other code movement or a bug in the
>> optimizer (we have seen them before). I wonder if you can selectively
>> optimize functions. Maybe a good start is compiling everything -O3
>> except the atomic execute function and make sure it still works.
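
On selectively optimizing functions: a minimal sketch, assuming GCC's
`optimize` function attribute or Clang's `optnone` attribute is
available. NO_OPTIMIZE and fadds_execute are made-up names. GCC's
documentation describes `optimize` as a debugging aid rather than
something for production code, so this is only meant for the kind of
experiment suggested above.

```c
/* Hypothetical macro: pick whichever per-function "do not optimize"
 * attribute the compiler offers, or nothing if it offers none. */
#if defined(__clang__)
#define NO_OPTIMIZE __attribute__((optnone))
#elif defined(__GNUC__)
#define NO_OPTIMIZE __attribute__((optimize("O0")))
#else
#define NO_OPTIMIZE
#endif

/* Hypothetical stand-in for the atomic execute function. The rest of the
 * file can still be built with -O3; only this function is left alone. */
NO_OPTIMIZE double fadds_execute(double rs1, double rs2)
{
    return rs1 + rs2;
}
```

If the failure disappears with only this function unoptimized, that
points at code movement inside it rather than somewhere else.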
>>
>> Ali
>>
>>
>>
>> On Fri, 28 Oct 2011 07:38:59 -0700, Steve Reinhardt <[email protected]> wrote:
>>> Yes, I think there exists at least one software IEEE FP implementation
>>> out there that we had talked about incorporating at some point (long
>>> ago). Unfortunately, as is discussed below, that's not even the issue,
>>> as we really want to model the not-quite-IEEE (or in the case of x87,
>>> not-even-close) semantics of the hardware alone, which would require
>>> more effort.
>>>
>>> If someone really cared about modeling the ISA FP support precisely
>>> that would be an interesting project, and if it was done cleanly
>>> (probably with the option to turn it on or off) we'd be glad to
>>> incorporate it.
>>>
>>> Ironically I think the issue here is not that the HW FP is not good
>>> enough for our purposes, it's that the software stack doesn't give us
>>> clean enough access to the HW facilities (gcc in particular, though C
>>> itself may share part of the blame).
>>>
>>> Steve
>>>
>>> On Thu, Oct 27, 2011 at 11:36 PM, Gabe Black <[email protected]> wrote:
>>>
>>>> I think there was talk of an FP emulation library a long time ago
>>>> (before I was involved with M5) but we decided not to do something like
>>>> that for some reason. Using regular built in FP support gets us most of
>>>> the way with minimal hassle, but then there are situations like this
>>>> where it really causes trouble. I presume the prior discussion might
>>>> have been about whether getting most of the way there was good enough,
>>>> and that it's simpler.
>>>>
>>>> Gabe
>>>>
>>>> On 10/27/11 07:43, Radivoje Vasiljevic wrote:
>>>>> ----- Original Message ----- From: "Gabe Black" <[email protected]>
>>>>> To: <[email protected]>
>>>>> Sent: 25 October 2011 20:53
>>>>> Subject: Re: [gem5-dev] Failed SPARC test
>>>>>
>>>>>
>>>>>> On 10/25/11 07:46, Steve Reinhardt wrote:
>>>>>>> On Tue, Oct 25, 2011 at 2:30 AM, Gabe Black <[email protected]> wrote:
>>>>> [snip]
>>>>>> Yeah, I think ISAs treat IEEE as a really good suggestion rather than
>>>>>> a standard. ARM isn't strictly conformant, and neither is x86. The
>>>>>> default rounding mode *is* standard, though, and I don't think it is
>>>>>> adjusted in SPARC as a result of execution. If it changed somehow
>>>>>> (unless I'm forgetting where SPARC does that) it's a fairly
>>>>>> significant problem. Whether instructions generate +/- 0 in various
>>>>>> situations may depend on, for instance, what order gcc decides to put
>>>>>> the operands in. I'm not sure that it does, but there are all kinds
>>>>>> of weird, subtle behaviors with FP, and you can't just fix how add
>>>>>> works if x86 picked the wrong thing. Then you have to replace add, or
>>>>>> semi-replace it by faking it out with other FP operations. If we're
>>>>>> running real x87 instructions (we shouldn't be in 64 bit mode, but we
>>>>>> still could) then those use 80 bit operands internally. Where and
>>>>>> when rounding takes place depends on when those are moved in/out of
>>>>>> the FPU, and will be different than with true 64 bit operands. SSE
>>>>>> based FP uses real 64 bit doubles, so that should behave better. It
>>>>>> should also be the default in 64 bit mode since the compiler can
>>>>>> assume some basic SSE support is present.
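
One way to check which of these cases a given build falls into is the
C99 FLT_EVAL_METHOD macro from <float.h>. The sketch below is only an
illustration; fp_eval_description is a made-up name. A build that does
its arithmetic in x87 registers would typically report method 2 (all
operations evaluated as long double), and an SSE based build method 0.

```c
#include <float.h>

/* Hypothetical helper: describe how this build evaluates floating point
 * expressions, based on the C99 FLT_EVAL_METHOD macro. */
const char *fp_eval_description(void)
{
#if !defined(FLT_EVAL_METHOD)
    return "FLT_EVAL_METHOD is not defined by this compiler";
#elif FLT_EVAL_METHOD == 0
    return "each operation is evaluated in the precision of its own type";
#elif FLT_EVAL_METHOD == 1
    return "float and double operations are evaluated as double";
#elif FLT_EVAL_METHOD == 2
    return "all operations are evaluated as long double";
#else
    return "evaluation precision is indeterminate or implementation-defined";
#endif
}
```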
>>>>>>
>>>>> What about FP emulation using integers and some kind of multiple
>>>>> precision arithmetic? Then every detail could be modeled, including
>>>>> x87 "floats" and "doubles" (in registers the exponent field is still
>>>>> 15 bits, not 8/11, which makes a mess of overflow/underflow; otherwise
>>>>> the value goes to memory and becomes a proper float/double). Gcc has
>>>>> some switches for that behavior, but they are very fragile (more a
>>>>> suggestion to the compiler than an enforced option). Double rounding
>>>>> in x87 is a special story: the double extended mantissa is not more
>>>>> than twice as long as the one for double, so double rounding can give
>>>>> different results compared to single rounding (this situation can't
>>>>> happen with float vs double). One solution, for example: split the
>>>>> mantissas into two halves and perform the operation, so all bits are
>>>>> available and any kind of rounding can then be enforced properly
>>>>> (real IEEE or "ISA style IEEE"). Performing those operations is not
>>>>> very slow, and the code is fairly ILP rich, so the slowdown is not as
>>>>> great as a pure instruction count comparison suggests (although to
>>>>> get robust code that is independent of CPU and compiler, especially
>>>>> regarding "optimizing code", some tests are needed to eradicate
>>>>> subnormals because of poor support/trap emulation). Plus, if
>>>>> instructions are mixed in the right way, both the int and FPU units
>>>>> can be kept busy. The exponent can be a single short, and that
>>>>> problem is solved. Only division can be somewhat tricky (and slow),
>>>>> but it can be done too.
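
The double rounding effect described above can be shown with plain
integers. The sketch below is a toy illustration with made-up names
(rne_drop, round_once, round_twice): it drops low bits with
round-to-nearest-even, either in one step or in two. For the value 23
(binary 10111) and 2 + 2 dropped bits, one step gives 1, while two
steps give 6 and then 2, because the first rounding creates a tie that
the second one resolves upward.

```c
#include <stdint.h>

/* Drop the n low bits of x (1 <= n <= 63), rounding to nearest and
 * breaking ties toward an even result. */
static uint64_t rne_drop(uint64_t x, unsigned n)
{
    uint64_t q = x >> n;                    /* truncated result */
    uint64_t rem = x & ((1ULL << n) - 1);   /* the bits being dropped */
    uint64_t half = 1ULL << (n - 1);        /* exactly half an ulp */

    if (rem > half || (rem == half && (q & 1)))
        q++;
    return q;
}

/* Round directly to the final precision. */
uint64_t round_once(uint64_t x, unsigned a, unsigned b)
{
    return rne_drop(x, a + b);
}

/* Round to an intermediate precision first, then to the final one, as
 * happens when an extended precision result is later stored as a double. */
uint64_t round_twice(uint64_t x, unsigned a, unsigned b)
{
    return rne_drop(rne_drop(x, a), b);
}
```

With all the bits of the exact result held in integers, as proposed
above, the emulation can choose to perform only the single rounding or
to reproduce the two step behavior of the hardware.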
>>>>>
>>>>>
>>>>>>> Even if the FP rounding error isn't the source of the problem, it
>>>>>>> might be easiest to fix that and get it out of the way so we can see
>>>>>>> what the actual problem is.
>>>>>>>
>>>>>>> If you really want to know *why* the kernel is doing all this FP,
>>>>>>> then yes, you probably need to look at the source code.
>>>>>>>
>>>>>>> Steve
>>>>>>> _______________________________________________
>>>>>>> gem5-dev mailing list
>>>>>>> [email protected]
>>>>>>> http://m5sim.org/mailman/listinfo/gem5-dev