Re: [gem5-dev] Failed SPARC test

Ali Saidi Sat, 29 Oct 2011 16:30:40 -0700

What about making m5_fesetround() and m5_fegetround() modify memory and thus 
prevent reordering?


Something like:

volatile int dummy_compiler;

void m5_fesetround(int rm)
{
    assert(rm >= 0 && rm < 4);
    dummy_compiler++;
    fesetround(m5_round_ops[rm]);
    dummy_compiler++;
}

int m5_fegetround()
{
    int x;
    dummy_compiler++;
    int rm = fegetround();
    dummy_compiler++;
    for(x = 0; x < 4; x++)
        if (m5_round_ops[x] == rm)
            return x;
    abort();
    return 0;
}

Would that just fix it? Maybe m5_round_ops and rm could be made volatile 
instead?
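
For reference, here is a minimal self-contained sketch of that idea, written as C++ so it builds on its own. The contents and order of m5_round_ops below are an assumption (the real table may differ), and whether the volatile counter is enough to stop gcc from moving the FP operation in the caller is exactly the open question. This only shows that the wrappers round-trip the mode:

```cpp
#include <cassert>
#include <cfenv>
#include <cstdlib>

// Assumed mapping from the simulator's rounding-mode index to <cfenv>
// constants; the order in the real m5_round_ops table may differ.
static const int m5_round_ops[4] = {
    FE_TONEAREST, FE_DOWNWARD, FE_UPWARD, FE_TOWARDZERO
};

// Touched before and after each fenv call, as in the proposal above.
static volatile int dummy_compiler;

void m5_fesetround(int rm)
{
    assert(rm >= 0 && rm < 4);
    dummy_compiler = dummy_compiler + 1;
    std::fesetround(m5_round_ops[rm]);
    dummy_compiler = dummy_compiler + 1;
}

int m5_fegetround()
{
    dummy_compiler = dummy_compiler + 1;
    const int rm = std::fegetround();
    dummy_compiler = dummy_compiler + 1;
    // Translate the host constant back to the simulator's index.
    for (int x = 0; x < 4; x++)
        if (m5_round_ops[x] == rm)
            return x;
    std::abort();
}
```

Calling m5_fesetround(i) followed by m5_fegetround() should give back i for each of the four indices.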

Another possible solution and hack, but I think we're into hack territory no 
matter what since gcc seems brain damaged in this regard:

#if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 4) // 4.4 or newer
#pragma GCC push_options
#pragma GCC optimize ("O0")

// m5_fe* goes here

#pragma GCC pop_options
#endif 


A third option would be something like

void __attribute__((optimize("O0"))) m5_fesetround(int rm)...
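
A hedged sketch of what the attribute form could look like when it is applied to the function that does the FP operation, rather than only to the wrappers. M5_NO_OPT and div_with_rounding are names made up for illustration, not gem5 code. The attribute is only applied under real gcc. The sketch also routes the operands and the result through volatile locals as a belt-and-braces measure, which is my own addition and not something this thread confirms is sufficient on its own:

```cpp
#include <cassert>
#include <cfenv>

// Only GCC proper understands optimize("O0"); elsewhere the macro is empty.
#if defined(__GNUC__) && !defined(__clang__)
#define M5_NO_OPT __attribute__((optimize("O0")))
#else
#define M5_NO_OPT
#endif

// Hypothetical helper: divide a by b under the given <cfenv> rounding mode,
// then restore whatever mode was active before the call.
M5_NO_OPT double div_with_rounding(double a, double b, int fe_mode)
{
    const int old_mode = std::fegetround();
    const int rc = std::fesetround(fe_mode);
    assert(rc == 0);
    (void)rc;

    // Volatile loads and the volatile store may not be reordered across the
    // surrounding calls, so the divide has to sit between them.
    volatile double va = a;
    volatile double vb = b;
    volatile double result = va / vb;

    std::fesetround(old_mode);
    return result;
}
```

With this, div_with_rounding(1.0, 3.0, FE_UPWARD) should come out strictly larger than the same call with FE_DOWNWARD, since 1/3 is not exactly representable.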

Ali


On Oct 29, 2011, at 4:59 PM, Gabe Black wrote:

> http://permalink.gmane.org/gmane.comp.gcc.help/38146
> 
> On 10/29/11 14:21, Gabe Black wrote:
>> Yes, it doesn't work either. What makes the ARM asm statements work is
>> that they have input and output arguments. That ties them into the data
>> flow graph having to do with those values, and they act as anchors,
>> forcing values to be produced by the time you get to the asm and not to
>> be consumed before it. Here we're just saying not to trust memory from
>> before the asm, and since it's not *in* memory, the compiler merrily
>> ignores us. I had this problem with ARM initially too until I added the
>> arguments. I've tried making floating point variables volatile to ensure
>> they're in memory, and that doesn't work either. I think the actual
>> semantics of volatile are a little different than what most people
>> assume, although I don't remember what the distinction is. One option
>> might be to make the FP operation itself a virtual function. Then gcc
>> won't know what it does and will be less able to break things by moving
>> things around.
>> 
>> It seems like a pretty severe deficiency of gcc that there's no way to
>> make fesetround work properly. It becomes nearly worthless because you
>> can't make any assumptions about when it will actually be in effect.
>> That's what we have to work with, though.
>> 
>> Gabe
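
A rough sketch of the operand anchoring described above, applied to a plain double. fp_anchor and add_with_rounding are hypothetical names, not gem5 code, and the "+m" constraint is one assumed choice: it forces the value through memory, which costs a store and a load. The empty asm claims to read and write the value, so the add cannot be computed before the anchors on its inputs or after the anchor on its result. Whether this holds up on every gcc version is not something the sketch proves:

```cpp
#include <cassert>
#include <cfenv>

// Hypothetical helper: an empty asm that claims to read and modify `v` in
// memory. The "memory" clobber keeps it ordered against the fenv calls.
static inline void fp_anchor(double &v)
{
#if defined(__GNUC__)
    __asm__ __volatile__("" : "+m"(v) : : "memory");
#else
    (void)v; // no equivalent barrier assumed for other compilers
#endif
}

// Add a and b under the given <cfenv> rounding mode, then restore the
// previous mode.
double add_with_rounding(double a, double b, int fe_mode)
{
    const int old_mode = std::fegetround();
    std::fesetround(fe_mode);

    fp_anchor(a);           // operands are "produced" here, after the mode change
    fp_anchor(b);
    double sum = a + b;
    fp_anchor(sum);         // result is "consumed" here, before the restore

    std::fesetround(old_mode);
    return sum;
}
```

For example, add_with_rounding(1.0, 1e-30, FE_UPWARD) should land just above 1.0, while the FE_DOWNWARD version should return exactly 1.0.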
>> 
>> On 10/29/11 13:53, Ali Saidi wrote:
>>> I was just about to send a message about -frounding-math when I saw yours. 
>>> Interesting that the asm barriers appear to work with ARM. It feels like 
>>> there should be an explicit code motion barrier. Anyway, have we tried 
>>> compiling with the -frounding-math flag? 
>>> 
>>> 
>>> 
>>> Ali
>>> 
>>> Sent from my ARM powered device
>>> 
>>> On Oct 29, 2011, at 3:44 PM, Gabe Black <[email protected]> wrote:
>>> 
>>>> Here's a discussion on the gcc mailing list of the thing I was talking
>>>> about before that's supposed to fix this, I think.
>>>> 
>>>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34678
>>>> 
>>>> Our barriers aren't working since Frs1s, Frs2s, and Frds could all be
>>>> registers.
>>>> 
>>>> Gabe
>>>> 
>>>> On 10/29/11 13:31, Gabe Black wrote:
>>>>> Here is some suspect assembly from Fadds for the atomic simple CPU
>>>>> 
>>>>>  0x00000000008d538e <+382>:   callq  0x4cab70 <m5_fegetround>
>>>>>  0x00000000008d5393 <+387>:   mov    %eax,%r15d
>>>>>  0x00000000008d5396 <+390>:   mov    %r14d,%edi
>>>>>  0x00000000008d5399 <+393>:   callq  0x4cab30 <m5_fesetround>
>>>>>  0x00000000008d539e <+398>:   mov    %r15d,%edi
>>>>>  0x00000000008d53a1 <+401>:   callq  0x4cab30 <m5_fesetround>
>>>>> 
>>>>> 
>>>>> This is, more or less, from the following code.
>>>>> 
>>>>> 
>>>>>   __asm__ __volatile__ ("" ::: "memory");
>>>>>   int oldrnd = m5_fegetround();
>>>>>   __asm__ __volatile__ ("" ::: "memory");
>>>>>   m5_fesetround(newrnd);
>>>>>   __asm__ __volatile__ ("" ::: "memory");
>>>>>   Frds = Frs1s + Frs2s;
>>>>>   __asm__ __volatile__ ("" ::: "memory");
>>>>>   m5_fesetround(oldrnd);
>>>>>   __asm__ __volatile__ ("" ::: "memory");
>>>>> 
>>>>> 
>>>>> Note that the addition was moved out of the middle and fesetround was
>>>>> called twice back to back, once to set the new rounding mode, and once
>>>>> to set it right back again.
>>>>> 
>>>>> Gabe
>>>>> 
>>>>> On 10/28/11 08:31, Ali Saidi wrote:
>>>>>> I'm still not 100% convinced that this is it. I agree it's highly
>>>>>> likely, but it could be some other code movement or a bug in the
>>>>>> optimizer (we have seen them before). I wonder if you can selectively
>>>>>> optimize functions. Maybe a good start is compiling everything -O3
>>>>>> except the atomic execute function and make sure it still works.
>>>>>> 
>>>>>> Ali
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Fri, 28 Oct 2011 07:38:59 -0700, Steve Reinhardt <[email protected]>
>>>>>> wrote:
>>>>>>> Yes, I think there exists at least one software IEEE FP implementation
>>>>>>> out there that we had talked about incorporating at some point (long
>>>>>>> ago). Unfortunately, as is discussed below, that's not even the issue,
>>>>>>> as we really want to model the not-quite-IEEE (or in the case of x87,
>>>>>>> not-even-close) semantics of the hardware alone, which would require
>>>>>>> more effort.
>>>>>>> 
>>>>>>> If someone really cared about modeling the ISA FP support precisely that
>>>>>>> would be an interesting project, and if it was done cleanly (probably
>>>>>>> with the option to turn it on or off) we'd be glad to incorporate it.
>>>>>>> 
>>>>>>> Ironically I think the issue here is not that the HW FP is not good
>>>>>>> enough for our purposes, it's that the software stack doesn't give us
>>>>>>> clean enough access to the HW facilities (gcc in particular, though C
>>>>>>> itself may share part of the blame).
>>>>>>> 
>>>>>>> Steve
>>>>>>> 
>>>>>>> On Thu, Oct 27, 2011 at 11:36 PM, Gabe Black <[email protected]>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> I think there was talk of an FP emulation library a long time ago
>>>>>>>> (before I was involved with M5) but we decided not to do something like
>>>>>>>> that for some reason. Using regular built in FP support gets us most of
>>>>>>>> the way with minimal hassle, but then there are situations like this
>>>>>>>> where it really causes trouble. I presume the prior discussion might
>>>>>>>> have been about whether getting most of the way there was good enough,
>>>>>>>> and that it's simpler.
>>>>>>>> 
>>>>>>>> Gabe
>>>>>>>> 
>>>>>>>> On 10/27/11 07:43, Radivoje Vasiljevic wrote:
>>>>>>>>> ----- Original Message ----- From: "Gabe Black" <[email protected]>
>>>>>>>>> To: <[email protected]>
>>>>>>>>> Sent: 25 October 2011 20:53
>>>>>>>>> Subject: Re: [gem5-dev] Failed SPARC test
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> On 10/25/11 07:46, Steve Reinhardt wrote:
>>>>>>>>>>> On Tue, Oct 25, 2011 at 2:30 AM, Gabe Black <[email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>> [snip]
>>>>>>>>>> Yeah, I think ISAs treat IEEE as a really good suggestion rather
>>>>>>>>>> than a standard. ARM isn't strictly conformant, and neither is x86.
>>>>>>>>>> The default rounding mode *is* standard, though, and I don't think
>>>>>>>>>> is adjusted in SPARC as a result of execution. If it changed
>>>>>>>>>> somehow (unless I'm forgetting where SPARC does that) it's a fairly
>>>>>>>>>> significant problem. Whether instructions generate +/- 0 in various
>>>>>>>>>> situations may depend on, for instance, what order gcc decides to
>>>>>>>>>> put the operands. I'm not sure that it does, but there are all
>>>>>>>>>> kinds of weird, subtle behaviors with FP, and you can't just fix
>>>>>>>>>> how add works if x86 picked the wrong thing. Then you have to
>>>>>>>>>> replace add, or semi-replace it by faking it out with other FP
>>>>>>>>>> operations. If we're running real x87 instructions (we shouldn't be
>>>>>>>>>> in 64 bit mode, but we still could) then those use 80 bit operands
>>>>>>>>>> internally. Where and when rounding takes place depends on when
>>>>>>>>>> those are moved in/out of the FPU, and will be different than true
>>>>>>>>>> 64 bit operands. SSE based FP uses real 64 bit doubles, so that
>>>>>>>>>> should behave better. It should also be the default in 64 bit mode
>>>>>>>>>> since the compiler can assume some basic SSE support is present.
>>>>>>>>>> 
>>>>>>>>> What about FP emulation using integers and some kind of multiple
>>>>>>>>> precision arithmetic? Then every detail could be modeled, including
>>>>>>>>> x87 "floats" and "doubles" (in registers the exponent field is still
>>>>>>>>> 15 bits, not 8/11, which makes a mess of overflow/underflow, or it
>>>>>>>>> will go to memory and will be a proper float/double). Gcc has some
>>>>>>>>> switches regarding that behavior, but that is very fragile (more
>>>>>>>>> like a suggestion to the compiler than an enforcing option).
>>>>>>>>> Double rounding in x87 is a special story: because the double
>>>>>>>>> extended mantissa is not more than twice as long as the one for
>>>>>>>>> double, double rounding can give different results compared to
>>>>>>>>> single rounding (this situation can't happen with float vs double).
>>>>>>>>> One solution, for example: split the mantissas into two halves and
>>>>>>>>> perform the operation; all bits would be available, and then any
>>>>>>>>> kind of rounding could be enforced properly (real IEEE or "ISA
>>>>>>>>> style IEEE"). Performing those operations is not very slow, and it
>>>>>>>>> is fairly ILP rich, so the slowdown is not as great as a comparison
>>>>>>>>> of the pure number of instructions suggests (although to have
>>>>>>>>> robust code, independent of CPU and compiler, especially regarding
>>>>>>>>> "optimizing code", some tests are needed to eradicate subnormals
>>>>>>>>> due to poor support/trap emulation). Plus, if instructions are
>>>>>>>>> mixed in the right way, both int and FPU units can be kept busy.
>>>>>>>>> Exponent can be one short and problem solved. Only division can be
>>>>>>>>> somewhat tricky (and slow), but it can be done too.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>>> Even if the FP rounding error isn't the source of the problem, it
>>>>>>>>>>> might be easiest to fix that and get it out of the way so we can
>>>>>>>>>>> see what the actual problem is.
>>>>>>>>>>> 
>>>>>>>>>>> If you really want to know *why* the kernel is doing all this FP,
>>>>>>>>>>> then yes, you probably need to look at the source code.
>>>>>>>>>>> 
>>>>>>>>>>> Steve
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> gem5-dev mailing list
>>>>>>>>>>> [email protected]
>>>>>>>>>>> http://m5sim.org/mailman/listinfo/gem5-dev
