Even this isn't foolproof, though. Theoretically there's nothing to prevent gcc from doing the same calculation twice, once inside the asm blocks and once outside. It might then assume the values are the same and use the wrong one as the result. I think -frounding-math would prevent that, since it stops gcc from assuming the rounding mode is always the same. It would be weird for gcc to purposefully do the same calculation twice, but it would be within its rights to do so.

Gabe

Quoting Gabriel Michael Black <[email protected]>:

That sounds a bit like what I did with the asm blocks in ARM, except that your version would modify that function and involve an actual call. We'd also need single and double versions even though the parameter isn't used. This is the sort of thing I'm talking about from ARM:

m5_fesetround(newrnd);
__asm__ __volatile__ ("" : "=m" (Frs1s) : "m" (Frs1s));
__asm__ __volatile__ ("" : "=m" (Frs2s) : "m" (Frs2s));
Frds = Frs1s + Frs2s;
__asm__ __volatile__ ("" : "=m" (Frds) : "m" (Frds));
m5_fesetround(oldrnd);

Gcc is obligated to use the values of Frs1s and Frs2s "returned" by the first two asm blocks, and it's obligated to pass the result as an input to the third asm block. Those constraints pinch it in the middle and force the operation to fall between the m5_fesetround calls.

This works in ARM, but here it's a little more cumbersome since SPARC has been doing the rounding stuff in a generic way, without knowledge (more or less) of what the operands are. The filterDoubles thing is already set up to look for operands, though, so it's not a totally new idea.

One thing I just thought of is that I'm not completely sure gcc will keep those variables in the same place when they're used as both inputs and outputs. Maybe it expects the asm to move the input to the output even if it doesn't do anything else? Note that while they *look* like the same variable, there's nothing (that I know of) that requires gcc to make that name refer to the same storage all the time, just the same value. There's syntax to tell it that a particular output is also an input (or the other way around?), and that may make this sort of thing less of an issue. I have no good reason to think there's actually a problem here, but hypothetically it could be yet another problem with playing these sorts of games.
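For reference, the output-is-also-input form I mean is roughly this (just a sketch; pin_to_memory is a made-up helper name, and whether "+m" actually sidesteps the separate-storage concern is something I haven't verified):

// "+m" marks the operand as both read and written in place by the (empty)
// asm, which should tie the input and the output to the same memory slot.
static inline void pin_to_memory(double &x)
{
    __asm__ __volatile__ ("" : "+m" (x));
}

// Used the same way as the blocks above:
//   m5_fesetround(newrnd);
//   pin_to_memory(Frs1s);
//   pin_to_memory(Frs2s);
//   Frds = Frs1s + Frs2s;
//   pin_to_memory(Frds);
//   m5_fesetround(oldrnd);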

Gabe

Quoting Steve Reinhardt <[email protected]>:

Bleah, this is ugly!  Reading that one bug report Gabe linked to, it sounds
like -frounding-math is supposed to make this work, but it's not correctly
implemented, and as a result there's really no straightforward way to make
this work.  I think that should be documented somewhere so that one day, if
-frounding-math does get implemented properly, we can start relying on it
and not on whatever hack we come up with.

Another idea, assuming m5_fesetround() isn't inlined, would be to have it
accept a double argument that it just passes back unmodified.  Then you
could do something like:

Frs1s = m5_fesetround(newrnd, Frs1s);
Frds = Frs1s + Frs2s;
Frds = m5_fesetround(oldrnd, Frds);
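where the non-inlined pass-through might look roughly like this (a sketch only; the two-argument overloads are hypothetical, and the real work is just forwarded to the existing function):

void m5_fesetround(int rm); // the existing function

// Hypothetical pass-through overloads, defined in the separately compiled
// fenv file so gcc can't see that the value comes back unchanged; the add
// then has to be ordered after the first call and before the second.
double m5_fesetround(int rm, double val)
{
    m5_fesetround(rm);
    return val;
}

float m5_fesetround(int rm, float val)
{
    m5_fesetround(rm);
    return val;
}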

Would that work?

Steve

On Sat, Oct 29, 2011 at 4:51 PM, Gabe Black <[email protected]> wrote:

I don't think either will work, because the problem isn't the optimization of those functions or their order relative to each other or to the asms; it's the position of the add relative to the asms. Since the add can move around wherever, it doesn't matter whether the calls to fesetround are bounded by the asms. We could potentially mark the execute function with a different optimization level, though. That might work. Also, I have that filterDoubles function in there that finds fp operands that are doubles and builds them up from, or breaks them down into, single floats. We could possibly piggyback on that to add in asms with the right properties, like in ARM. It's a bit gross, but like you said, I don't know if we can avoid that.

Gabe

On 10/29/11 16:31, Ali Saidi wrote:
If we go down the path below, slightly less hacky might be just making gcc compile the entire fenv file without optimization, although perhaps even that is insufficient...

Ali

On Oct 29, 2011, at 6:30 PM, Ali Saidi wrote:

What about making m5_fesetround and m5_fegetround() modify memory and
thus prevent reordering?

Something like:

volatile int dummy_compiler;

void m5_fesetround(int rm)
{
   assert(rm >= 0 && rm < 4);
   dummy_compiler++;
   fesetround(m5_round_ops[rm]);
   dummy_compiler++;
}

int m5_fegetround()
{
   int x;
   dummy_compiler++;
   int rm = fegetround();
   dummy_compiler++;
   for(x = 0; x < 4; x++)
       if (m5_round_ops[x] == rm)
           return x;
   abort();
   return 0;
}

Would that just fix it? Maybe m5_round_ops and rm could be made volatile instead?
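The volatile variant might be something like this (just a sketch; m5_round_ops is declared here only so the snippet stands alone):

#include <assert.h>
#include <fenv.h>

extern int m5_round_ops[4]; // the existing table

void m5_fesetround(int rm)
{
    // Copy rm through a volatile so gcc has to materialize the value and
    // can't fold or reorder the accesses to it. The m5_round_ops table
    // could be made volatile in the same spirit.
    volatile int vrm = rm;
    assert(vrm >= 0 && vrm < 4);
    fesetround(m5_round_ops[vrm]);
}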

Another possible solution and hack, but I think we're into hack
territory no matter what since gcc seems brain damaged in this regard:

#if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 4) // 4.4 or newer
#pragma GCC push_options
#pragma GCC optimize ("O0")

// m5_fe* goes here

#pragma GCC pop_options
#endif


A third option would be something like

void __attribute__((optimize("O0"))) m5_fesetround(int rm)...

Ali


On Oct 29, 2011, at 4:59 PM, Gabe Black wrote:

http://permalink.gmane.org/gmane.comp.gcc.help/38146

On 10/29/11 14:21, Gabe Black wrote:
Yes, it doesn't work either. What makes the ARM asm statements work is that they have input and output arguments. That ties them into the data flow graph having to do with those values, and they act as anchors, forcing values to be produced by the time you get to the asm and not to be consumed before it. Here we're just saying not to trust memory from before the asm, and since it's not *in* memory, the compiler merrily ignores us. I had this problem with ARM initially too until I added the arguments. I've tried making floating point variables volatile to ensure they're in memory, and that doesn't work either. I think the actual semantics of volatile are a little different than what most people assume, although I don't remember what the distinction is. One option might be to make the FP operation itself a virtual function. Then gcc won't know what it does and will be less able to break things by moving things around.
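To make the distinction concrete, here are the two forms side by side (mirroring the snippets elsewhere in this thread):

// A bare memory clobber only tells gcc that memory may have changed. If
// Frds, Frs1s, and Frs2s live in registers, nothing stops the add from
// drifting across this statement.
__asm__ __volatile__ ("" ::: "memory");

// An operand-carrying barrier makes Frs1s an input and an output of the
// asm, so whatever produces Frs1s must happen before it, and anything that
// uses Frs1s afterward must use the value the asm "returned".
__asm__ __volatile__ ("" : "=m" (Frs1s) : "m" (Frs1s));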

It seems like a pretty severe deficiency of gcc that there's no way to
make fesetround work properly. It becomes nearly worthless because you
can't make any assumptions about when it will actually be in effect.
That's what we have to work with, though.

Gabe

On 10/29/11 13:53, Ali Saidi wrote:
I was just about to send a message about -frounding-math when I saw yours. Interesting that the asm barriers appear to work with ARM. It feels like there should be an explicit code motion barrier. Anyway, have we tried compiling with the -frounding-math flag?



Ali

Sent from my ARM powered device

On Oct 29, 2011, at 3:44 PM, Gabe Black <[email protected]> wrote:

Here's a discussion on the gcc mailing list of the thing I was talking about before that's supposed to fix this, I think.

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34678

Our barriers aren't working since Frs1s, Frs2s, and Frds could all be registers.

Gabe

On 10/29/11 13:31, Gabe Black wrote:
Here is some suspect assembly from Fadds for the atomic simple CPU

0x00000000008d538e <+382>:   callq  0x4cab70 <m5_fegetround>
0x00000000008d5393 <+387>:   mov    %eax,%r15d
0x00000000008d5396 <+390>:   mov    %r14d,%edi
0x00000000008d5399 <+393>:   callq  0x4cab30 <m5_fesetround>
0x00000000008d539e <+398>:   mov    %r15d,%edi
0x00000000008d53a1 <+401>:   callq  0x4cab30 <m5_fesetround>


This is, more or less, from the following code.


__asm__ __volatile__ ("" ::: "memory");
int oldrnd = m5_fegetround();
__asm__ __volatile__ ("" ::: "memory");
m5_fesetround(newrnd);
__asm__ __volatile__ ("" ::: "memory");
Frds = Frs1s + Frs2s;
__asm__ __volatile__ ("" ::: "memory");
m5_fesetround(oldrnd);
__asm__ __volatile__ ("" ::: "memory");


Note that the addition was moved out of the middle and fesetround was called twice back to back, once to set the new rounding mode, and once to set it right back again.

Gabe

On 10/28/11 08:31, Ali Saidi wrote:
I'm still not 100% convinced that this is it. I agree it's highly likely, but it could be some other code movement or a bug in the optimizer (we have seen them before). I wonder if you can selectively optimize functions. Maybe a good start is compiling everything -O3 except the atomic execute function and making sure it still works.

Ali



On Fri, 28 Oct 2011 07:38:59 -0700, Steve Reinhardt <[email protected]> wrote:

Yes, I think there exists at least one software IEEE FP implementation out there that we had talked about incorporating at some point (long ago). Unfortunately, as is discussed below, that's not even the issue, as we really want to model the not-quite-IEEE (or in the case of x87, not-even-close) semantics of the hardware alone, which would require more effort.

If someone really cared about modeling the ISA FP support precisely that would be an interesting project, and if it was done cleanly (probably with the option to turn it on or off) we'd be glad to incorporate it.

Ironically I think the issue here is not that the HW FP is not good enough for our purposes, it's that the software stack doesn't give us clean enough access to the HW facilities (gcc in particular, though C itself may share part of the blame).

Steve

On Thu, Oct 27, 2011 at 11:36 PM, Gabe Black <[email protected]> wrote:

I think there was talk of an FP emulation library a long time ago (before I was involved with M5), but we decided not to do something like that for some reason. Using the regular built-in FP support gets us most of the way with minimal hassle, but then there are situations like this where it really causes trouble. I presume the prior discussion might have been about whether getting most of the way there was good enough, and that it's simpler.

Gabe

On 10/27/11 07:43, Radivoje Vasiljevic wrote:
----- Original Message ----- From: "Gabe Black" <[email protected]>
To: <[email protected]>
Sent: 25 October 2011 20:53
Subject: Re: [gem5-dev] Failed SPARC test


On 10/25/11 07:46, Steve Reinhardt wrote:
On Tue, Oct 25, 2011 at 2:30 AM, Gabe Black <[email protected]> wrote:
[snip]
Yeah, I think ISAs treat IEEE as a really good suggestion rather than a standard. ARM isn't strictly conformant, and neither is x86. The default rounding mode *is* standard, though, and I don't think it is adjusted in SPARC as a result of execution. If it changed somehow (unless I'm forgetting where SPARC does that) it's a fairly significant problem. Whether instructions generate +/- 0 in various situations may depend on, for instance, what order gcc decides to put the operands in. I'm not sure that it does, but there are all kinds of weird, subtle behaviors with FP, and you can't just fix how add works if x86 picked the wrong thing. Then you have to replace add, or semi-replace it by faking it out with other FP operations. If we're running real x87 instructions (we shouldn't be in 64 bit mode, but we still could be) then those use 80 bit operands internally. Where and when rounding takes place depends on when those are moved in/out of the FPU, and will be different than with true 64 bit operands. SSE based FP uses real 64 bit doubles, so that should behave better. It should also be the default in 64 bit mode, since the compiler can assume some basic SSE support is present.
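If we wanted a sanity check that we're really getting the SSE path, something like this might do it (a sketch; it leans on gcc's __SSE2_MATH__ predefine, which I believe is set when -mfpmath=sse is in effect with SSE2 available):

// Compile-time check that FP math is SSE-based rather than x87, so 64 bit
// doubles are really rounded as 64 bit doubles.
#if !defined(__SSE2_MATH__)
#warning "Not using SSE2 math; x87 80-bit intermediates may round differently."
#endif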

What about FP emulation using integers and some kind of multiple precision arithmetic? Then every detail could be modeled, including x87 "floats" and "doubles" (in registers the exponent field is still 15 bits, not 8/11, which makes a mess of overflow/underflow, unless the value goes to memory and becomes a proper float/double). Gcc has some switches regarding that behavior, but they are very fragile (more like a suggestion to the compiler than an enforcing option). Double rounding in x87 is a special story because the double extended mantissa is not more than twice as long as the one for double, so double rounding can give different results compared to a single rounding (this situation can't happen with float vs double). One solution, for example: split the mantissas into two halves and perform the operation (a sketch of that splitting step is below); all the bits would be available, and then any kind of rounding could be enforced properly (real IEEE or "ISA style IEEE"). Performing those operations is not very slow, and it is fairly ILP rich, so the slowdown is not as great as a pure instruction count would suggest (although to have robust code, and CPU and compiler independence, especially with "optimizing" compilers, some tests are needed to eradicate subnormals due to poor support/trap emulation). Plus, if instructions are mixed in the right way, both the int and FPU units can be kept busy. The exponent can be a single short and the problem is solved. Only division can be somewhat tricky (and slow), but it can be done too.
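For the splitting step, the usual trick is Veltkamp splitting; a small sketch of the idea (assuming round-to-nearest and no overflow in c * a):

#include <cstdio>

// Veltkamp splitting: break a double into hi + lo, each holding roughly
// half of the 53 significand bits, so that products of halves are exact in
// double. An exact multiply (and from there, any rounding policy) can be
// built on top of this.
static void split(double a, double *hi, double *lo)
{
    const double c = 134217729.0; // 2^27 + 1
    double t = c * a;
    *hi = t - (t - a);
    *lo = a - *hi;
}

int main()
{
    double hi, lo;
    split(0.1, &hi, &lo);
    std::printf("%.20g = %.20g + %.20g\n", 0.1, hi, lo);
    return 0;
}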


Even if the FP rounding error isn't the source of the problem, it might be easiest to fix that and get it out of the way so we can see what the actual problem is.

If you really want to know *why* the kernel is doing all this FP, then yes, you probably need to look at the source code.

Steve
_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev