http://permalink.gmane.org/gmane.comp.gcc.help/38146
On 10/29/11 14:21, Gabe Black wrote:
> Yes, it doesn't work either. What makes the ARM asm statements work is
> that they have input and output arguments. That ties them into the data
> flow graph having to do with those values, and they act as anchors,
> forcing values to be produced by the time you get to the asm and not to
> be consumed before it. Here we're just saying not to trust memory from
> before the asm, and since it's not *in* memory, the compiler merrily
> ignores us. I had this problem with ARM initially too until I added the
> arguments. I've tried making floating point variables volatile to ensure
> they're in memory, and that doesn't work either. I think the actual
> semantics of volatile are a little different than what most people
> assume, although I don't remember what the distinction is. One option
> might be to make the FP operation itself a virtual function. Then gcc
> won't know what it does and will be less able to break things by moving
> things around.
>
> It seems like a pretty severe deficiency of gcc that there's no way to
> make fesetround work properly. It becomes nearly worthless because you
> can't make any assumptions about when it will actually be in effect.
> That's what we have to work with, though.
>
> Gabe
>
> On 10/29/11 13:53, Ali Saidi wrote:
>> I was just about to send a message about -frounding-math when I saw yours.
>> Interesting that the asm barriers appear to work with ARM. It feels like
>> there should be an explicit code motion barrier. Anyway, have we tried
>> compiling with the -frounding-math flag?
>>
>> Ali
>>
>> Sent from my ARM powered device
>>
>> On Oct 29, 2011, at 3:44 PM, Gabe Black <[email protected]> wrote:
>>
>>> Here's a discussion on the gcc mailing list of the thing I was talking
>>> about before that's supposed to fix this, I think.
>>>
>>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34678
>>>
>>> Our barriers aren't working since Frs1s, Frs2s, and Frds could all be
>>> registers.
>>>
>>> Gabe
>>>
>>> On 10/29/11 13:31, Gabe Black wrote:
>>>> Here is some suspect assembly from Fadds for the atomic simple CPU
>>>>
>>>> 0x00000000008d538e <+382>: callq 0x4cab70 <m5_fegetround>
>>>> 0x00000000008d5393 <+387>: mov %eax,%r15d
>>>> 0x00000000008d5396 <+390>: mov %r14d,%edi
>>>> 0x00000000008d5399 <+393>: callq 0x4cab30 <m5_fesetround>
>>>> 0x00000000008d539e <+398>: mov %r15d,%edi
>>>> 0x00000000008d53a1 <+401>: callq 0x4cab30 <m5_fesetround>
>>>>
>>>>
>>>> This is, more or less, from the following code.
>>>>
>>>>
>>>> __asm__ __volatile__ ("" ::: "memory");
>>>> int oldrnd = m5_fegetround();
>>>> __asm__ __volatile__ ("" ::: "memory");
>>>> m5_fesetround(newrnd);
>>>> __asm__ __volatile__ ("" ::: "memory");
>>>> Frds = Frs1s + Frs2s;
>>>> __asm__ __volatile__ ("" ::: "memory");
>>>> m5_fesetround(oldrnd);
>>>> __asm__ __volatile__ ("" ::: "memory");
>>>>
>>>>
>>>> Note that the addition was moved out of the middle and fesetround was
>>>> called twice back to back, once to set the new rounding mode, and once
>>>> to set it right back again.
>>>>
>>>> Gabe
>>>>
>>>> On 10/28/11 08:31, Ali Saidi wrote:
>>>>> I'm still not 100% convinced that this is it. I agree it's highly
>>>>> likely, but it could be some other code movement or a bug in the
>>>>> optimizer (we have seen them before). I wonder if you can selectively
>>>>> optimize functions. Maybe a good start is compiling everything -O3
>>>>> except the atomic execute function and make sure it still works.
>>>>>
>>>>> Ali
>>>>>
>>>>> On Fri, 28 Oct 2011 07:38:59 -0700, Steve Reinhardt <[email protected]>
>>>>> wrote:
>>>>>> Yes, I think there exists at least one software IEEE FP
>>>>>> implementation out there that we had talked about incorporating at
>>>>>> some point (long ago).
>>>>>> Unfortunately, as is discussed below, that's not even the issue, as we
>>>>>> really want to model the not-quite-IEEE (or in the case of x87,
>>>>>> not-even-close) semantics of the hardware alone, which would require
>>>>>> more effort.
>>>>>>
>>>>>> If someone really cared about modeling the ISA FP support precisely that
>>>>>> would be an interesting project, and if it was done cleanly (probably
>>>>>> with the option to turn it on or off) we'd be glad to incorporate it.
>>>>>>
>>>>>> Ironically I think the issue here is not that the HW FP is not good
>>>>>> enough for our purposes, it's that the software stack doesn't give us
>>>>>> clean enough access to the HW facilities (gcc in particular, though C
>>>>>> itself may share part of the blame).
>>>>>>
>>>>>> Steve
>>>>>>
>>>>>> On Thu, Oct 27, 2011 at 11:36 PM, Gabe Black <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> I think there was talk of an FP emulation library a long time ago
>>>>>>> (before I was involved with M5) but we decided not to do something like
>>>>>>> that for some reason. Using regular built in FP support gets us most of
>>>>>>> the way with minimal hassle, but then there are situations like this
>>>>>>> where it really causes trouble. I presume the prior discussion might
>>>>>>> have been about whether getting most of the way there was good enough,
>>>>>>> and that it's simpler.
>>>>>>>
>>>>>>> Gabe
>>>>>>>
>>>>>>> On 10/27/11 07:43, Radivoje Vasiljevic wrote:
>>>>>>>> ----- Original Message -----
>>>>>>>> From: "Gabe Black" <[email protected]>
>>>>>>>> To: <[email protected]>
>>>>>>>> Sent: 25 October 2011 20:53
>>>>>>>> Subject: Re: [gem5-dev] Failed SPARC test
>>>>>>>>
>>>>>>>>
>>>>>>>>> On 10/25/11 07:46, Steve Reinhardt wrote:
>>>>>>>>>> On Tue, Oct 25, 2011 at 2:30 AM, Gabe Black <[email protected]>
>>>>>>>>>> wrote:
>>>>>>>> [snip]
>>>>>>>>> Yeah, I think ISAs treat IEEE as a really good suggestion rather than a
>>>>>>>>> standard.
>>>>>>>>> ARM isn't strictly conformant, and neither is x86. The default
>>>>>>>>> rounding mode *is* standard, though, and I don't think it is adjusted
>>>>>>>>> in SPARC as a result of execution. If it changed somehow (unless I'm
>>>>>>>>> forgetting where SPARC does that) it's a fairly significant problem.
>>>>>>>>> Whether instructions generate +/- 0 in various situations may depend on,
>>>>>>>>> for instance, what order gcc decides to put the operands. I'm not sure
>>>>>>>>> that it does, but there are all kinds of weird, subtle behaviors with
>>>>>>>>> FP, and you can't just fix how add works if x86 picked the wrong thing.
>>>>>>>>> Then you have to replace add, or semi-replace it by faking it out with
>>>>>>>>> other FP operations. If we're running real x87 instructions (we
>>>>>>>>> shouldn't be in 64 bit mode, but we still could) then those use 80 bit
>>>>>>>>> operands internally. Where and when rounding takes place depends on when
>>>>>>>>> those are moved in/out of the FPU, and will be different than true 64
>>>>>>>>> bit operands. SSE based FP uses real 64 bit doubles, so that should
>>>>>>>>> behave better. It should also be the default in 64 bit mode since the
>>>>>>>>> compiler can assume some basic SSE support is present.
>>>>>>>>>
>>>>>>>> What about FP emulation using integers and some kind of multiple
>>>>>>>> precision arithmetic? Then every detail could be modeled, including x87
>>>>>>>> "floats" and "doubles" (in registers the exponent field is still 15
>>>>>>>> bits, not 8/11, which makes a mess of overflow/underflow, or the value
>>>>>>>> will go to memory and will be a proper float/double). Gcc has some
>>>>>>>> switches regarding that behavior but that is very fragile (more like a
>>>>>>>> suggestion to the compiler than an enforcing option).
>>>>>>>> Double rounding in x87 is a special story: because the double extended
>>>>>>>> mantissa is not more than twice as long as the one for double, double
>>>>>>>> rounding can give different results compared to single rounding (this
>>>>>>>> situation can't happen with float vs double). One solution, for
>>>>>>>> example: split the mantissas into two halves and perform the operation;
>>>>>>>> all bits would then be available and any kind of rounding could be
>>>>>>>> properly enforced (real IEEE or "ISA style IEEE"). Performing those
>>>>>>>> operations is not very slow, and it is fairly ILP rich, so the slowdown
>>>>>>>> is not as great as a comparison of pure instruction counts would
>>>>>>>> suggest (although to have robust code that is independent of CPU and
>>>>>>>> compiler, especially with regard to "optimizing code", some tests are
>>>>>>>> needed to eradicate subnormals due to poor support/trap emulation).
>>>>>>>> Plus, if instructions are mixed in the right way, both int and FPU
>>>>>>>> units can be kept busy. The exponent can be one short and that problem
>>>>>>>> is solved. Only division can be somewhat tricky (and slow), but it can
>>>>>>>> be done too.
>>>>>>>>
>>>>>>>>
>>>>>>>>>> Even if the FP rounding error isn't the source of the problem, it might
>>>>>>>>>> be easiest to fix that and get it out of the way so we can see what the
>>>>>>>>>> actual problem is.
>>>>>>>>>>
>>>>>>>>>> If you really want to know *why* the kernel is doing all this FP, then
>>>>>>>>>> yes, you probably need to look at the source code.
>>>>>>>>>>
>>>>>>>>>> Steve
>>>>>>>>>> _______________________________________________
>>>>>>>>>> gem5-dev mailing list
>>>>>>>>>> [email protected]
>>>>>>>>>> http://m5sim.org/mailman/listinfo/gem5-dev

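
Radivoje's double rounding point can also be shown with integers only, in the spirit of his emulation suggestion. The sketch below is a toy, not an FP emulator. It keeps just the fraction bits of a result (hidden leading 1 omitted) in a uint64_t with 64 fraction bits, and assumes the usual widths: 63 fraction bits for x87 double extended and 52 for double. round_rne, round_direct and round_via_extended are names invented for this example, and the rounding is round to nearest, ties to even.

```c
#include <stdint.h>

/* Round sig to nearest, ties to even, while dropping its low `drop` bits.
 * Requires 0 < drop < 64. Returns the shortened value. */
static uint64_t round_rne(uint64_t sig, unsigned drop)
{
    uint64_t half = 1ULL << (drop - 1);          /* half of one result ulp */
    uint64_t rem  = sig & ((1ULL << drop) - 1);  /* bits being discarded */
    uint64_t q    = sig >> drop;                 /* truncated result */

    if (rem > half || (rem == half && (q & 1)))
        q++;                                     /* round up, ties to even */
    return q;
}

/* 64 fraction bits -> 52 fraction bits in one step (a true double result). */
static uint64_t round_direct(uint64_t frac64)
{
    return round_rne(frac64, 12);
}

/* 64 -> 63 fraction bits (x87 register), then 63 -> 52 (store as double). */
static uint64_t round_via_extended(uint64_t frac64)
{
    return round_rne(round_rne(frac64, 1), 11);
}
```

With frac64 = 0x801, i.e. the exact value 1 + 2^-53 + 2^-64, round_direct gives 1 (one double ulp above 1.0) while round_via_extended gives 0 (exactly 1.0). The first rounding lands exactly on a tie for the second one, which is the double rounding effect.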