http://permalink.gmane.org/gmane.comp.gcc.help/38146

On 10/29/11 14:21, Gabe Black wrote:
> Yes, it doesn't work either. What makes the ARM asm statements work is
> that they have input and output arguments. That ties them into the data
> flow graph having to do with those values, and they act as anchors,
> forcing values to be produced by the time you get to the asm and not to
> be consumed before it. Here we're just saying not to trust memory from
> before the asm, and since it's not *in* memory, the compiler merrily
> ignores us. I had this problem with ARM initially too until I added the
> arguments. I've tried making floating point variables volatile to ensure
> they're in memory, and that doesn't work either. I think the actual
> semantics of volatile are a little different than what most people
> assume, although I don't remember what the distinction is. One option
> might be to make the FP operation itself a virtual function. Then gcc
> won't know what it does and will be less able to break things by moving
> things around.
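
A minimal sketch of the "gcc won't know what it does" idea, using a volatile
function pointer rather than a virtual function (a swapped-in technique, not
something from gem5). The names add_impl, opaque_add and
add_between_mode_changes are hypothetical. The assumption is that the
compiler has to treat a call through a volatile pointer as an unknown call
with possible side effects, so it stays between the two fesetround calls;
that is how GCC and Clang behave in practice, not a documented guarantee.

```cpp
#include <cassert>
#include <cfenv>

// The actual FP operation, kept out of line.
static double add_impl(double a, double b)
{
    return a + b;
}

// Because the pointer is volatile, the compiler must load it at every call
// and cannot assume which function it points to.
static double (*volatile opaque_add)(double, double) = add_impl;

// Hypothetical helper: add a and b under rounding mode `mode`
// (FE_TONEAREST, FE_UPWARD, ...), then restore the previous mode.
double add_between_mode_changes(double a, double b, int mode)
{
    const int old_mode = std::fegetround();
    int rc = std::fesetround(mode);
    assert(rc == 0);

    // Opaque call: the addition happens inside it, after the mode change.
    double sum = opaque_add(a, b);

    rc = std::fesetround(old_mode);
    assert(rc == 0);
    (void)rc;
    return sum;
}
```

The price is one indirect call per FP operation, which may matter in an
instruction-execution hot path.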
>
> It seems like a pretty severe deficiency of gcc that there's no way to
> make fesetround work properly. It becomes nearly worthless because you
> can't make any assumptions about when it will actually be in effect.
> That's what we have to work with, though.
>
> Gabe
>
> On 10/29/11 13:53, Ali Saidi wrote:
>> I was just about to send a message about -frounding-math when I saw yours.
>> Interesting that the asm barriers appear to work with ARM. It feels like
>> there should be an explicit code motion barrier. Anyway, have we tried
>> compiling with the -frounding-math flag?
>>
>>
>>
>> Ali
>>
>> Sent from my ARM powered device
>>
>> On Oct 29, 2011, at 3:44 PM, Gabe Black <[email protected]> wrote:
>>
>>> Here's a discussion on the gcc mailing list of the thing I was talking
>>> about before that's supposed to fix this, I think.
>>>
>>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34678
>>>
>>> Our barriers aren't working since Frs1s, Frs2s, and Frds could all be
>>> registers.
>>>
>>> Gabe
>>>
>>> On 10/29/11 13:31, Gabe Black wrote:
>>>> Here is some suspect assembly from Fadds for the atomic simple CPU
>>>>
>>>>   0x00000000008d538e <+382>:   callq  0x4cab70 <m5_fegetround>
>>>>   0x00000000008d5393 <+387>:   mov    %eax,%r15d
>>>>   0x00000000008d5396 <+390>:   mov    %r14d,%edi
>>>>   0x00000000008d5399 <+393>:   callq  0x4cab30 <m5_fesetround>
>>>>   0x00000000008d539e <+398>:   mov    %r15d,%edi
>>>>   0x00000000008d53a1 <+401>:   callq  0x4cab30 <m5_fesetround>
>>>>
>>>>
>>>> This is, more or less, from the following code.
>>>>
>>>>
>>>>    __asm__ __volatile__ ("" ::: "memory");
>>>>    int oldrnd = m5_fegetround();
>>>>    __asm__ __volatile__ ("" ::: "memory");
>>>>    m5_fesetround(newrnd);
>>>>    __asm__ __volatile__ ("" ::: "memory");
>>>>    Frds = Frs1s + Frs2s;
>>>>    __asm__ __volatile__ ("" ::: "memory");
>>>>    m5_fesetround(oldrnd);
>>>>    __asm__ __volatile__ ("" ::: "memory");
>>>>
>>>>
>>>> Note that the addition was moved out of the middle and fesetround was
>>>> called twice back to back, once to set the new rounding mode, and once
>>>> to set it right back again.
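
For comparison, a sketch of the same sequence with the barriers tied to the
FP values themselves. The barriers in the snippet above name no operands, so
values held in registers are free to move across them. The helper names pin
and add_with_rounding are hypothetical, and std::fegetround/std::fesetround
stand in for the m5_ wrappers. The "+m" constraint tells the compiler that
the asm reads and modifies the variable in memory, which is assumed to be
enough to anchor the addition with GCC and Clang; it is not a guarantee from
the C or C++ standard.

```cpp
#include <cassert>
#include <cfenv>

// Empty asm that claims to read and modify `v` in memory. The compiler must
// have v's value ready before this point and must reload it afterwards.
static inline void pin(double &v)
{
    __asm__ __volatile__("" : "+m"(v) : : "memory");
}

// Hypothetical helper: add a and b under rounding mode `mode`
// (FE_TONEAREST, FE_UPWARD, ...), then restore the previous mode.
double add_with_rounding(double a, double b, int mode)
{
    const int old_mode = std::fegetround();
    int rc = std::fesetround(mode);
    assert(rc == 0);

    pin(a);                 // operands become opaque after the mode change
    pin(b);
    double sum = a + b;     // depends on the pinned operands
    pin(sum);               // sum must exist before the mode is restored

    rc = std::fesetround(old_mode);
    assert(rc == 0);
    (void)rc;
    return sum;
}
```

Forcing the operands through memory costs a store and a load each; a
register constraint would avoid that but is architecture specific.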
>>>>
>>>> Gabe
>>>>
>>>> On 10/28/11 08:31, Ali Saidi wrote:
>>>>> I'm still not 100% convinced that this is it. I agree it's highly
>>>>> likely, but it could be some other code movement or a bug in the
>>>>> optimizer (we have seen them before). I wonder if you can selectively
>>>>> optimize functions. Maybe a good start is compiling everything -O3
>>>>> except the atomic execute function and make sure it still works.
>>>>>
>>>>> Ali
>>>>>
>>>>>
>>>>>
>>>>> On Fri, 28 Oct 2011 07:38:59 -0700, Steve Reinhardt <[email protected]>
>>>>> wrote:
>>>>>> Yes, I think there exists at least one software IEEE FP implementation
>>>>>> out there that we had talked about incorporating at some point (long
>>>>>> ago). Unfortunately, as is discussed below, that's not even the issue,
>>>>>> as we really want to model the not-quite-IEEE (or in the case of x87,
>>>>>> not-even-close) semantics of the hardware alone, which would require
>>>>>> more effort.
>>>>>>
>>>>>> If someone really cared about modeling the ISA FP support precisely that
>>>>>> would be an interesting project, and if it was done cleanly (probably
>>>>>> with the option to turn it on or off) we'd be glad to incorporate it.
>>>>>>
>>>>>> Ironically I think the issue here is not that the HW FP is not good
>>>>>> enough for our purposes, it's that the software stack doesn't give us
>>>>>> clean enough access to the HW facilities (gcc in particular, though C
>>>>>> itself may share part of the blame).
>>>>>>
>>>>>> Steve
>>>>>>
>>>>>> On Thu, Oct 27, 2011 at 11:36 PM, Gabe Black <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> I think there was talk of an FP emulation library a long time ago
>>>>>>> (before I was involved with M5) but we decided not to do something like
>>>>>>> that for some reason. Using regular built in FP support gets us most of
>>>>>>> the way with minimal hassle, but then there are situations like this
>>>>>>> where it really causes trouble. I presume the prior discussion might
>>>>>>> have been about whether getting most of the way there was good enough,
>>>>>>> and that it's simpler.
>>>>>>>
>>>>>>> Gabe
>>>>>>>
>>>>>>> On 10/27/11 07:43, Radivoje Vasiljevic wrote:
>>>>>>>> ----- Original Message ----- From: "Gabe Black" <[email protected]>
>>>>>>>> To: <[email protected]>
>>>>>>>> Sent: 25 October 2011 20:53
>>>>>>>> Subject: Re: [gem5-dev] Failed SPARC test
>>>>>>>>
>>>>>>>>
>>>>>>>>> On 10/25/11 07:46, Steve Reinhardt wrote:
>>>>>>>>>> On Tue, Oct 25, 2011 at 2:30 AM, Gabe Black <[email protected]>
>>>>>>>>>> wrote:
>>>>>>>> [snip]
>>>>>>>>> Yeah, I think ISAs treat IEEE as a really good suggestion rather than
>>>>>>>>> a standard. ARM isn't strictly conformant, and neither is x86. The
>>>>>>>>> default rounding mode *is* standard, though, and I don't think is
>>>>>>>>> adjusted in SPARC as a result of execution. If it changed somehow
>>>>>>>>> (unless I'm forgetting where SPARC does that) it's a fairly
>>>>>>>>> significant problem. Whether instructions generate +/- 0 in various
>>>>>>>>> situations may depend on, for instance, what order gcc decides to put
>>>>>>>>> the operands. I'm not sure that it does, but there are all kinds of
>>>>>>>>> weird, subtle behaviors with FP, and you can't just fix how add works
>>>>>>>>> if x86 picked the wrong thing. Then you have to replace add, or
>>>>>>>>> semi-replace it by faking it out with other FP operations. If we're
>>>>>>>>> running real x87 instructions (we shouldn't be in 64 bit mode, but we
>>>>>>>>> still could) then those use 80 bit operands internally. Where and
>>>>>>>>> when rounding takes place depends on when those are moved in/out of
>>>>>>>>> the FPU, and will be different than true 64 bit operands. SSE based
>>>>>>>>> FP uses real 64 bit doubles, so that should behave better. It should
>>>>>>>>> also be the default in 64 bit mode since the compiler can assume some
>>>>>>>>> basic SSE support is present.
>>>>>>>>>
>>>>>>>> What about FP emulation using integers and some kind of
>>>>>>>> multiple-precision arithmetic? Then every detail could be modeled,
>>>>>>>> including x87 "floats" and "doubles" (in registers the exponent field
>>>>>>>> is still 15 bits, not 8/11, which makes a mess of overflow/underflow,
>>>>>>>> unless the value goes to memory and becomes a proper float/double).
>>>>>>>> Gcc has some switches regarding that behavior, but they are very
>>>>>>>> fragile (more like a suggestion to the compiler than an enforcing
>>>>>>>> option). Double rounding in x87 is a special story: the double
>>>>>>>> extended mantissa is less than twice as long as the double mantissa,
>>>>>>>> so double rounding can give different results compared to single
>>>>>>>> rounding (this situation can't happen with float vs double). One
>>>>>>>> solution, for example: split the mantissas into two halves and
>>>>>>>> perform the operation on the halves; all bits would then be available
>>>>>>>> and any kind of rounding could be enforced properly (real IEEE or
>>>>>>>> "ISA style IEEE"). Performing those operations is not very slow, and
>>>>>>>> the code is fairly ILP-rich, so the slowdown is not as great as a
>>>>>>>> comparison of pure instruction counts suggests (although for robust,
>>>>>>>> CPU- and compiler-independent code, especially with respect to
>>>>>>>> "optimizing" compilers, some tests are needed to eradicate subnormals
>>>>>>>> because of poor support/trap emulation). Plus, if instructions are
>>>>>>>> mixed in the right way, both the int and FPU units can be kept busy.
>>>>>>>> The exponent can be one short, and that problem is solved. Only
>>>>>>>> division can be somewhat tricky (and slow), but it can be done too.
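
One possible reading of the "split the mantissas into two halves" idea is
the classic Veltkamp/Dekker construction, sketched below for a product. This
is an interpretation, not necessarily what was meant above. It recovers the
bits that the hardware rounding discarded, which a software rounding step
could then use. The names Halves, split and product_error are hypothetical.
Assumptions: round-to-nearest mode, true 64-bit doubles (SSE, not x87 excess
precision), no overflow or underflow, and no FMA contraction by the compiler.

```cpp
#include <cassert>
#include <cmath>

struct Halves { double hi, lo; };

// Veltkamp splitting: a == hi + lo exactly, and each half is short enough
// (at most 26 significant bits) that products of halves are exact.
Halves split(double a)
{
    const double factor = 134217729.0;  // 2^27 + 1
    double c = factor * a;
    double hi = c - (c - a);
    return {hi, a - hi};
}

// Dekker product: returns the rounding error of a * b, so that the exact
// product equals p + error, where p is the rounded double product.
double product_error(double a, double b)
{
    double p = a * b;
    Halves x = split(a), y = split(b);
    double err = x.hi * y.hi - p;
    err += x.hi * y.lo;
    err += x.lo * y.hi;
    err += x.lo * y.lo;
    return err;
}
```

For a = 1 + 2^-30 the exact square is 1 + 2^-29 + 2^-60; the hardware product
keeps 1 + 2^-29 and product_error(a, a) returns the dropped 2^-60.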
>>>>>>>>
>>>>>>>>
>>>>>>>>>> Even if the FP rounding error isn't the source of the problem, it
>>>>>>>>>> might be easiest to fix that and get it out of the way so we can see
>>>>>>>>>> what the actual problem is.
>>>>>>>>>>
>>>>>>>>>> If you really want to know *why* the kernel is doing all this FP,
>>>>>>>>>> then yes, you probably need to look at the source code.
>>>>>>>>>>
>>>>>>>>>> Steve
>>>>>>>>>> _______________________________________________
>>>>>>>>>> gem5-dev mailing list
>>>>>>>>>> [email protected]
>>>>>>>>>> http://m5sim.org/mailman/listinfo/gem5-dev