Re: [gem5-dev] Failed SPARC test

Gabe Black Sat, 29 Oct 2011 16:51:29 -0700

I don't think either will work, because the problem isn't the
optimization of those functions, or the order of the functions relative
to each other or to the asms; it's the position of the add relative to
the asms. Since the add can move around freely, it doesn't matter
whether the calls to fesetround are bounded by the asms. We could
potentially mark the execute function with a different optimization
level, though. That might work. Also, I have that filterDoubles function
in there that finds fp operands that are doubles and builds them from,
or breaks them down into, single floats. We could possibly piggyback on
that to add in asms with the right properties, like in ARM. It's a bit
gross, but like you said, I don't know if we can avoid that.
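
To illustrate what "asms with the right properties" could look like, here is a minimal sketch. It assumes GCC-style extended asm; pin_value and add_with_rounding are hypothetical names, not gem5 code, and the "+m" constraint is one plausible choice rather than what the ARM code necessarily uses. The empty asm takes the FP value as a read-write operand, so the add depends on an asm that comes after fesetround(), and the asm before the restoring fesetround() depends on the add.

```cpp
#include <cassert>
#include <cfenv>

// Hypothetical helper (not from gem5): an empty asm that takes `v` as a
// read-write memory operand. The compiler must have `v` computed and stored
// before this point, and must treat it as possibly changed afterwards.
static inline void pin_value(float &v)
{
    __asm__ __volatile__("" : "+m"(v) : : "memory");
}

// Adds a and b under rounding mode `mode` (one of the FE_* constants),
// then restores the previous rounding mode.
float add_with_rounding(float a, float b, int mode)
{
    assert(mode == FE_TONEAREST || mode == FE_UPWARD ||
           mode == FE_DOWNWARD || mode == FE_TOWARDZERO);

    const int old_mode = std::fegetround();
    std::fesetround(mode);

    pin_value(a);    // the inputs only become usable after fesetround()
    pin_value(b);
    float sum = a + b;
    pin_value(sum);  // the sum must exist before the mode is restored

    std::fesetround(old_mode);
    return sum;
}
```

With this in place, add_with_rounding(1.0f, 1e-10f, FE_UPWARD) should come out strictly greater than 1.0f, while the same call with FE_DOWNWARD should give exactly 1.0f. This is a sketch of the idea only; whether it holds up inside the generated execute functions would still need to be checked in the disassembly.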


Gabe

On 10/29/11 16:31, Ali Saidi wrote:
> If we go down the path below, slightly less hacky might be just making gcc 
> compile the entire fenv file without optimization, although perhaps that is 
> insufficient....
>
> Ali
>
> On Oct 29, 2011, at 6:30 PM, Ali Saidi wrote:
>
>> What about making m5_fesetround and m5_fegetround() modify memory and thus 
>> prevent reordering?
>>
>> Something like:
>>
>> volatile int dummy_compiler;
>>
>> void m5_fesetround(int rm)
>> {
>>    assert(rm >= 0 && rm < 4);
>>    dummy_compiler++;
>>    fesetround(m5_round_ops[rm]);
>>    dummy_compiler++;
>> }
>>
>> int m5_fegetround()
>> {
>>    int x;
>>    dummy_compiler++;
>>    int rm = fegetround();
>>    dummy_compiler++;
>>    for(x = 0; x < 4; x++)
>>        if (m5_round_ops[x] == rm)
>>            return x;
>>    abort();
>>    return 0;
>> }
>>
>> Would that just fix it? Maybe m5_round_ops and rm could be made volatile 
>> instead?
>>
>> Another possible solution and hack, but I think we're into hack territory no 
>> matter what since gcc seems brain damaged in this regard:
>>
>> #if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 4) // 4.4 or newer
>> #pragma GCC push_options
>> #pragma GCC optimize ("O0")
>>
>> // m5_fe* goes here
>>
>> #pragma GCC pop_options
>> #endif 
>>
>>
>> A third option would be something like
>>
>> void __attribute__((optimize("O0"))) m5_fesetround(int rm)...
>>
>> Ali
>>
>>
>> On Oct 29, 2011, at 4:59 PM, Gabe Black wrote:
>>
>>> http://permalink.gmane.org/gmane.comp.gcc.help/38146
>>>
>>> On 10/29/11 14:21, Gabe Black wrote:
>>>> Yes, it doesn't work either. What makes the ARM asm statements work is
>>>> that they have input and output arguments. That ties them into the data
>>>> flow graph having to do with those values, and they act as anchors,
>>>> forcing values to be produced by the time you get to the asm and not to
>>>> be consumed before it. Here we're just saying not to trust memory from
>>>> before the asm, and since it's not *in* memory, the compiler merrily
>>>> ignores us. I had this problem with ARM initially too until I added the
>>>> arguments. I've tried making floating point variables volatile to ensure
>>>> they're in memory, and that doesn't work either. I think the actual
>>>> semantics of volatile are a little different than what most people
>>>> assume, although I don't remember what the distinction is. One option
>>>> might be to make the FP operation itself a virtual function. Then gcc
>>>> won't know what it does and will be less able to break things by moving
>>>> things around.
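
A minimal sketch of the "gcc won't know what it does" idea follows. Note the swap: it uses a volatile function pointer instead of a virtual function, because a compiler that can see the concrete type may devirtualize a virtual call. The names add_floats, fp_add and add_via_opaque_call are hypothetical and not from gem5.

```cpp
#include <cfenv>

// The real operation. Its address is taken below, so it is emitted as an
// ordinary function and the add executes at run time.
static float add_floats(float a, float b)
{
    return a + b;
}

// Hypothetical indirection (not from gem5): the pointer is volatile, so it
// must be re-read at every call and the optimizer cannot assume which
// function it points to.
static float (*volatile fp_add)(float, float) = add_floats;

// Adds a and b under rounding mode `mode` (one of the FE_* constants),
// then restores the previous rounding mode.
float add_via_opaque_call(float a, float b, int mode)
{
    const int old_mode = std::fegetround();
    std::fesetround(mode);
    float sum = fp_add(a, b);  // call to an unknown callee
    std::fesetround(old_mode);
    return sum;
}
```

The cost is an indirect call per FP operation, so this is probably only attractive if the asm operand approach turns out to be unworkable.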
>>>>
>>>> It seems like a pretty severe deficiency of gcc that there's no way to
>>>> make fesetround work properly. It becomes nearly worthless because you
>>>> can't make any assumptions about when it will actually be in effect.
>>>> That's what we have to work with, though.
>>>>
>>>> Gabe
>>>>
>>>> On 10/29/11 13:53, Ali Saidi wrote:
>>>>> I was just about to send a message about -frounding-math when I saw 
>>>>> yours. Interesting that the asm barriers appear to work with ARM. It 
>>>>> feels like there should be an explicit code motion barrier. Anyway, have 
>>>>> we tried compiling with the -frounding-math flag? 
>>>>>
>>>>>
>>>>>
>>>>> Ali
>>>>>
>>>>> Sent from my ARM powered device
>>>>>
>>>>> On Oct 29, 2011, at 3:44 PM, Gabe Black <[email protected]> wrote:
>>>>>
>>>>>> Here's a discussion on the gcc mailing list of the thing I was talking
>>>>>> about before that's supposed to fix this, I think.
>>>>>>
>>>>>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34678
>>>>>>
>>>>>> Our barriers aren't working since Frs1s, Frs2s, and Frds could all be
>>>>>> registers.
>>>>>>
>>>>>> Gabe
>>>>>>
>>>>>> On 10/29/11 13:31, Gabe Black wrote:
>>>>>>> Here is some suspect assembly from Fadds for the atomic simple CPU
>>>>>>>
>>>>>>> 0x00000000008d538e <+382>:   callq  0x4cab70 <m5_fegetround>
>>>>>>> 0x00000000008d5393 <+387>:   mov    %eax,%r15d
>>>>>>> 0x00000000008d5396 <+390>:   mov    %r14d,%edi
>>>>>>> 0x00000000008d5399 <+393>:   callq  0x4cab30 <m5_fesetround>
>>>>>>> 0x00000000008d539e <+398>:   mov    %r15d,%edi
>>>>>>> 0x00000000008d53a1 <+401>:   callq  0x4cab30 <m5_fesetround>
>>>>>>>
>>>>>>>
>>>>>>> This is, more or less, from the following code.
>>>>>>>
>>>>>>>
>>>>>>>  __asm__ __volatile__ ("" ::: "memory");
>>>>>>>  int oldrnd = m5_fegetround();
>>>>>>>  __asm__ __volatile__ ("" ::: "memory");
>>>>>>>  m5_fesetround(newrnd);
>>>>>>>  __asm__ __volatile__ ("" ::: "memory");
>>>>>>>  Frds = Frs1s + Frs2s;
>>>>>>>  __asm__ __volatile__ ("" ::: "memory");
>>>>>>>  m5_fesetround(oldrnd);
>>>>>>>  __asm__ __volatile__ ("" ::: "memory");
>>>>>>>
>>>>>>>
>>>>>>> Note that the addition was moved out of the middle and fesetround was
>>>>>>> called twice back to back, once to set the new rounding mode, and once
>>>>>>> to set it right back again.
>>>>>>>
>>>>>>> Gabe
>>>>>>>
>>>>>>> On 10/28/11 08:31, Ali Saidi wrote:
>>>>>>>> I'm still not 100% convinced that this is it. I agree it's highly
>>>>>>>> likely, but it could be some other code movement or a bug in the
>>>>>>>> optimizer (we have seen them before). I wonder if you can selectively
>>>>>>>> optimize functions. Maybe a good start is compiling everything -O3
>>>>>>>> except the atomic execute function and make sure it still works.
>>>>>>>>
>>>>>>>> Ali
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, 28 Oct 2011 07:38:59 -0700, Steve Reinhardt <[email protected]>
>>>>>>>> wrote:
>>>>>>>>> Yes, I think there exists at least one software IEEE FP
>>>>>>>>> implementation out there that we had talked about incorporating at
>>>>>>>>> some point (long ago). Unfortunately, as is discussed below, that's
>>>>>>>>> not even the issue, as we really want to model the not-quite-IEEE
>>>>>>>>> (or in the case of x87, not-even-close) semantics of the hardware
>>>>>>>>> alone, which would require more effort.
>>>>>>>>>
>>>>>>>>> If someone really cared about modeling the ISA FP support precisely
>>>>>>>>> that would be an interesting project, and if it was done cleanly
>>>>>>>>> (probably with the option to turn it on or off) we'd be glad to
>>>>>>>>> incorporate it.
>>>>>>>>>
>>>>>>>>> Ironically I think the issue here is not that the HW FP is not good
>>>>>>>>> enough for our purposes, it's that the software stack doesn't give
>>>>>>>>> us clean enough access to the HW facilities (gcc in particular,
>>>>>>>>> though C itself may share part of the blame).
>>>>>>>>>
>>>>>>>>> Steve
>>>>>>>>>
>>>>>>>>> On Thu, Oct 27, 2011 at 11:36 PM, Gabe Black <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> I think there was talk of an FP emulation library a long time ago
>>>>>>>>>> (before I was involved with M5) but we decided not to do something
>>>>>>>>>> like that for some reason. Using regular built in FP support gets us
>>>>>>>>>> most of the way with minimal hassle, but then there are situations
>>>>>>>>>> like this where it really causes trouble. I presume the prior
>>>>>>>>>> discussion might have been about whether getting most of the way
>>>>>>>>>> there was good enough, and that it's simpler.
>>>>>>>>>>
>>>>>>>>>> Gabe
>>>>>>>>>>
>>>>>>>>>> On 10/27/11 07:43, Radivoje Vasiljevic wrote:
>>>>>>>>>>> ----- Original Message ----- From: "Gabe Black"
>>>>>>>>>> <[email protected]>
>>>>>>>>>>> To: <[email protected]>
>>>>>>>>>>> Sent: 25 October 2011 20:53
>>>>>>>>>>> Subject: Re: [gem5-dev] Failed SPARC test
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> On 10/25/11 07:46, Steve Reinhardt wrote:
>>>>>>>>>>>>> On Tue, Oct 25, 2011 at 2:30 AM, Gabe Black 
>>>>>>>>>>>>> <[email protected]>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>> [snip]
>>>>>>>>>>>> Yeah, I think ISAs treat IEEE as a really good suggestion rather
>>>>>>>>>>>> than a standard. ARM isn't strictly conformant, and neither is x86.
>>>>>>>>>>>> The default rounding mode *is* standard, though, and I don't think
>>>>>>>>>>>> it is adjusted in SPARC as a result of execution. If it changed
>>>>>>>>>>>> somehow (unless I'm forgetting where SPARC does that) it's a fairly
>>>>>>>>>>>> significant problem. Whether instructions generate +/- 0 in various
>>>>>>>>>>>> situations may depend on, for instance, what order gcc decides to
>>>>>>>>>>>> put the operands. I'm not sure that it does, but there are all
>>>>>>>>>>>> kinds of weird, subtle behaviors with FP, and you can't just fix
>>>>>>>>>>>> how add works if x86 picked the wrong thing. Then you have to
>>>>>>>>>>>> replace add, or semi-replace it by faking it out with other FP
>>>>>>>>>>>> operations. If we're running real x87 instructions (we shouldn't be
>>>>>>>>>>>> in 64 bit mode, but we still could) then those use 80 bit operands
>>>>>>>>>>>> internally. Where and when rounding takes place depends on when
>>>>>>>>>>>> those are moved in/out of the FPU, and will be different than true
>>>>>>>>>>>> 64 bit operands. SSE based FP uses real 64 bit doubles, so that
>>>>>>>>>>>> should behave better. It should also be the default in 64 bit mode
>>>>>>>>>>>> since the compiler can assume some basic SSE support is present.
>>>>>>>>>>>>
>>>>>>>>>>> What about FP emulation using integers and some kind of multiple
>>>>>>>>>>> precision arithmetic? Then every detail could be modeled, including
>>>>>>>>>>> x87 "floats" and "doubles" (in registers the exponent field is still
>>>>>>>>>>> 15 bits, not 8/11, which makes a mess of overflow/underflow, or the
>>>>>>>>>>> value goes to memory and becomes a proper float/double). Gcc has
>>>>>>>>>>> some switches regarding that behavior, but that is very fragile
>>>>>>>>>>> (more like a suggestion to the compiler than an enforcing option).
>>>>>>>>>>> Double rounding in x87 is a special story, because the double
>>>>>>>>>>> extended mantissa is not more than twice as long as the one for
>>>>>>>>>>> double, so double rounding can give different results compared to
>>>>>>>>>>> single rounding (this situation can't happen with float vs double).
>>>>>>>>>>> One solution, for example: split the mantissas into two halves and
>>>>>>>>>>> perform the operation; all bits would then be available, and any
>>>>>>>>>>> kind of rounding could be properly enforced (real IEEE or "ISA
>>>>>>>>>>> style IEEE"). Performing those operations is not very slow, and the
>>>>>>>>>>> code is fairly ILP rich, so the slowdown is not as great as a
>>>>>>>>>>> comparison of pure instruction counts suggests (although to get
>>>>>>>>>>> robust code that is independent of CPU and compiler, especially
>>>>>>>>>>> with regard to "optimizing code", some tests are needed to
>>>>>>>>>>> eradicate subnormals due to poor support/trap emulation). Plus, if
>>>>>>>>>>> instructions are mixed in the right way, both the int and FPU units
>>>>>>>>>>> can be kept busy. The exponent can be one short, and that problem
>>>>>>>>>>> is solved. Only division can be somewhat tricky (and slow), but it
>>>>>>>>>>> can be done too.
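
The double rounding point above can be seen in a toy integer example. This is a sketch only: round_nearest_even is a hypothetical helper, and the widths are tiny stand-ins for the 64 bit extended and 53 bit double mantissas. Rounding once to the final width and rounding in two steps through an intermediate width can disagree when the first step lands exactly on a tie.

```cpp
#include <cstdint>

// Rounds v to nearest, ties to even, while dropping the low `drop` bits.
// Assumes 1 <= drop <= 63.
std::uint64_t round_nearest_even(std::uint64_t v, unsigned drop)
{
    const std::uint64_t half = std::uint64_t{1} << (drop - 1);
    const std::uint64_t rem = v & ((std::uint64_t{1} << drop) - 1);
    std::uint64_t q = v >> drop;
    // Round up when above the halfway point, or on an exact tie if q is odd.
    if (rem > half || (rem == half && (q & 1)))
        ++q;
    return q;
}

// 87 is 0b1010111.
// Single rounding, dropping 4 bits: 87 / 16 = 5.4375, which rounds to 5.
// Double rounding, dropping 2 bits twice: 87 / 4 = 21.75 rounds to 22,
// then 22 / 4 = 5.5 is a tie and rounds to even, giving 6.
```

So round_nearest_even(87, 4) yields 5, while round_nearest_even(round_nearest_even(87, 2), 2) yields 6. An integer-based emulation that keeps all the bits until one final rounding step avoids this.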
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>> Even if the FP rounding error isn't the source of the problem, it
>>>>>>>>>>>>> might be easiest to fix that and get it out of the way so we can
>>>>>>>>>>>>> see what the actual problem is.
>>>>>>>>>>>>>
>>>>>>>>>>>>> If you really want to know *why* the kernel is doing all this FP,
>>>>>>>>>>>>> then yes, you probably need to look at the source code.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Steve
>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>> gem5-dev mailing list
>>>>>>>>>>>>> [email protected]
>>>>>>>>>>>>> http://m5sim.org/mailman/listinfo/gem5-dev
