I was just about to send a message about -frounding-math when I saw yours.
Interesting that the asm barriers appear to work on ARM. It feels like there
should be an explicit code motion barrier. Anyway, have we tried compiling with
the -frounding-math flag?
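
For reference, a minimal sketch of a barrier that names the FP values as
operands, so the add cannot be moved across it even when the values live in
registers. It assumes GCC-style inline asm. add_with_rounding is a made-up
helper, and std::fegetround/std::fesetround stand in for m5_fegetround and
m5_fesetround. This is not the gem5 code, just an illustration of the idea.

```cpp
#include <cfenv>

// Hypothetical helper, not gem5 code: add two floats under rounding mode
// `newrnd`, then restore the previous mode.
float add_with_rounding(float a, float b, int newrnd)
{
    const int oldrnd = std::fegetround();
    std::fesetround(newrnd);
    // Empty asm that claims to rewrite a and b in memory. The add depends
    // on its outputs, so it cannot be hoisted above this point or folded
    // at compile time.
    __asm__ __volatile__("" : "+m"(a), "+m"(b) : : "memory");
    float sum = a + b;
    // Empty asm that claims to read and rewrite sum. The add must be
    // complete before this point, so it cannot sink below the restore.
    __asm__ __volatile__("" : "+m"(sum) : : "memory");
    std::fesetround(oldrnd);
    return sum;
}
```

The "+m" constraint forces the values through memory, which costs a store and
a load but avoids needing an architecture-specific register constraint.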



Ali

Sent from my ARM powered device

On Oct 29, 2011, at 3:44 PM, Gabe Black <[email protected]> wrote:

> Here's a discussion on the gcc mailing list of the thing I was talking
> about before that's supposed to fix this, I think.
> 
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34678
> 
> Our barriers aren't working since Frs1s, Frs2s, and Frds could all be
> registers.
> 
> Gabe
> 
> On 10/29/11 13:31, Gabe Black wrote:
>> Here is some suspect assembly from Fadds for the atomic simple CPU:
>> 
>>   0x00000000008d538e <+382>:   callq  0x4cab70 <m5_fegetround>
>>   0x00000000008d5393 <+387>:   mov    %eax,%r15d
>>   0x00000000008d5396 <+390>:   mov    %r14d,%edi
>>   0x00000000008d5399 <+393>:   callq  0x4cab30 <m5_fesetround>
>>   0x00000000008d539e <+398>:   mov    %r15d,%edi
>>   0x00000000008d53a1 <+401>:   callq  0x4cab30 <m5_fesetround>
>> 
>> 
>> This is, more or less, from the following code.
>> 
>> 
>>     __asm__ __volatile__ ("" ::: "memory");
>>     int oldrnd = m5_fegetround();
>>     __asm__ __volatile__ ("" ::: "memory");
>>     m5_fesetround(newrnd);
>>     __asm__ __volatile__ ("" ::: "memory");
>>     Frds = Frs1s + Frs2s;
>>     __asm__ __volatile__ ("" ::: "memory");
>>     m5_fesetround(oldrnd);
>>     __asm__ __volatile__ ("" ::: "memory");
>> 
>> 
>> Note that the addition was moved out of the middle and fesetround was
>> called twice back to back, once to set the new rounding mode, and once
>> to set it right back again.
>> 
>> Gabe
>> 
>> On 10/28/11 08:31, Ali Saidi wrote:
>>> I'm still not 100% convinced that this is it. I agree it's highly
>>> likely, but it could be some other code movement or a bug in the
>>> optimizer (we have seen them before). I wonder if you can selectively
>>> optimize functions. Maybe a good start is compiling everything -O3
>>> except the atomic execute function and make sure it still works.
>>> 
>>> Ali
>>> 
>>> 
>>> 
>>> On Fri, 28 Oct 2011 07:38:59 -0700, Steve Reinhardt <[email protected]>
>>> wrote:
>>>> Yes, I think there exists at least one software IEEE FP implementation
>>>> out there that we had talked about incorporating at some point (long
>>>> ago). Unfortunately, as is discussed below, that's not even the issue,
>>>> as we really want to model the not-quite-IEEE (or in the case of x87,
>>>> not-even-close) semantics of the hardware alone, which would require
>>>> more effort.
>>>> 
>>>> If someone really cared about modeling the ISA FP support precisely
>>>> that would be an interesting project, and if it was done cleanly
>>>> (probably with the option to turn it on or off) we'd be glad to
>>>> incorporate it.
>>>> 
>>>> Ironically I think the issue here is not that the HW FP is not good
>>>> enough for our purposes, it's that the software stack doesn't give us
>>>> clean enough access to the HW facilities (gcc in particular, though C
>>>> itself may share part of the blame).
>>>> 
>>>> Steve
>>>> 
>>>> On Thu, Oct 27, 2011 at 11:36 PM, Gabe Black <[email protected]>
>>>> wrote:
>>>> 
>>>>> I think there was talk of an FP emulation library a long time ago
>>>>> (before I was involved with M5) but we decided not to do something like
>>>>> that for some reason. Using regular built in FP support gets us most of
>>>>> the way with minimal hassle, but then there are situations like this
>>>>> where it really causes trouble. I presume the prior discussion might
>>>>> have been about whether getting most of the way there was good enough,
>>>>> and that it's simpler.
>>>>> 
>>>>> Gabe
>>>>> 
>>>>> On 10/27/11 07:43, Radivoje Vasiljevic wrote:
>>>>>> ----- Original Message ----- From: "Gabe Black" <[email protected]>
>>>>>> To: <[email protected]>
>>>>>> Sent: October 25, 2011 20:53
>>>>>> Subject: Re: [gem5-dev] Failed SPARC test
>>>>>> 
>>>>>> 
>>>>>>> On 10/25/11 07:46, Steve Reinhardt wrote:
>>>>>>>> On Tue, Oct 25, 2011 at 2:30 AM, Gabe Black <[email protected]>
>>>>>>>> wrote:
>>>>>> [snip]
>>>>>>> Yeah, I think ISAs treat IEEE as a really good suggestion rather
>>>>>>> than a standard. ARM isn't strictly conformant, and neither is x86.
>>>>>>> The default rounding mode *is* standard, though, and I don't think
>>>>>>> it is adjusted in SPARC as a result of execution. If it changed
>>>>>>> somehow (unless I'm forgetting where SPARC does that) it's a fairly
>>>>>>> significant problem. Whether instructions generate +/- 0 in various
>>>>>>> situations may depend on, for instance, what order gcc decides to
>>>>>>> put the operands. I'm not sure that it does, but there are all kinds
>>>>>>> of weird, subtle behaviors with FP, and you can't just fix how add
>>>>>>> works if x86 picked the wrong thing. Then you have to replace add,
>>>>>>> or semi-replace it by faking it out with other FP operations. If
>>>>>>> we're running real x87 instructions (we shouldn't be in 64 bit mode,
>>>>>>> but we still could) then those use 80 bit operands internally. Where
>>>>>>> and when rounding takes place depends on when those are moved in/out
>>>>>>> of the FPU, and will be different than true 64 bit operands. SSE
>>>>>>> based FP uses real 64 bit doubles, so that should behave better. It
>>>>>>> should also be the default in 64 bit mode since the compiler can
>>>>>>> assume some basic SSE support is present.
>>>>>>> 
>>>>>> What about FP emulation using integers and some kind of multiple
>>>>>> precision arithmetic? Then every detail could be modeled, including
>>>>>> x87 "floats" and "doubles" (in registers the exponent field is still
>>>>>> 15 bits, not 8/11, which makes a mess of overflow/underflow, or the
>>>>>> value goes to memory and becomes a proper float/double). Gcc has
>>>>>> some switches regarding that behavior, but they are very fragile
>>>>>> (more like a suggestion to the compiler than an enforced option).
>>>>>> Double rounding in x87 is a special story: because the double
>>>>>> extended mantissa is not more than twice as long as the one for
>>>>>> double, double rounding can give different results compared to
>>>>>> single rounding (this situation can't happen with float vs. double).
>>>>>> One solution, for example: split the mantissas into two halves and
>>>>>> perform the operation. All bits would then be available, and any
>>>>>> kind of rounding could be enforced properly (real IEEE or "ISA style
>>>>>> IEEE"). Performing those operations is not very slow, and it is
>>>>>> fairly ILP rich, so the slowdown is not as great as a comparison of
>>>>>> pure instruction counts suggests (although to have robust code that
>>>>>> is independent of CPU and compiler, especially with regard to
>>>>>> "optimizing code", some tests are needed to eradicate subnormals due
>>>>>> to poor support/trap emulation). Plus, if instructions are mixed in
>>>>>> the right way, both the int and FPU units can be kept busy. The
>>>>>> exponent can be one short, and the problem is solved. Only division
>>>>>> can be somewhat tricky (and slow), but it can be done too.
>>>>>> 
>>>>>> 
>>>>>>>> Even if the FP rounding error isn't the source of the problem, it
>>>>>>>> might be easiest to fix that and get it out of the way so we can
>>>>>>>> see what the actual problem is.
>>>>>>>> 
>>>>>>>> If you really want to know *why* the kernel is doing all this FP,
>>>>>>>> then yes, you probably need to look at the source code.
>>>>>>>> 
>>>>>>>> Steve
>>>>>>>> _______________________________________________
>>>>>>>> gem5-dev mailing list
>>>>>>>> [email protected]
>>>>>>>> http://m5sim.org/mailman/listinfo/gem5-dev