I was just about to send a message about -frounding-math when I saw yours. Interesting that the asm barriers appear to work with ARM. It feels like there should be an explicit code-motion barrier. Anyway, have we tried compiling with the -frounding-math flag?
Ali

Sent from my ARM powered device

On Oct 29, 2011, at 3:44 PM, Gabe Black <[email protected]> wrote:

> Here's a discussion on the gcc mailing list of the thing I was talking
> about before that's supposed to fix this, I think.
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34678
>
> Our barriers aren't working since Frs1s, Frs2s, and Frds could all be
> registers.
>
> Gabe
>
> On 10/29/11 13:31, Gabe Black wrote:
>> Here is some suspect assembly from Fadds for the atomic simple CPU
>>
>> 0x00000000008d538e <+382>: callq 0x4cab70 <m5_fegetround>
>> 0x00000000008d5393 <+387>: mov %eax,%r15d
>> 0x00000000008d5396 <+390>: mov %r14d,%edi
>> 0x00000000008d5399 <+393>: callq 0x4cab30 <m5_fesetround>
>> 0x00000000008d539e <+398>: mov %r15d,%edi
>> 0x00000000008d53a1 <+401>: callq 0x4cab30 <m5_fesetround>
>>
>>
>> This is, more or less, from the following code.
>>
>>
>> __asm__ __volatile__ ("" ::: "memory");
>> int oldrnd = m5_fegetround();
>> __asm__ __volatile__ ("" ::: "memory");
>> m5_fesetround(newrnd);
>> __asm__ __volatile__ ("" ::: "memory");
>> Frds = Frs1s + Frs2s;
>> __asm__ __volatile__ ("" ::: "memory");
>> m5_fesetround(oldrnd);
>> __asm__ __volatile__ ("" ::: "memory");
>>
>>
>> Note that the addition was moved out of the middle and fesetround was
>> called twice back to back, once to set the new rounding mode, and once
>> to set it right back again.
>>
>> Gabe
>>
>> On 10/28/11 08:31, Ali Saidi wrote:
>>> I'm still not 100% convinced that this is it. I agree it's highly
>>> likely, but it could be some other code movement or a bug in the
>>> optimizer (we have seen them before). I wonder if you can selectively
>>> optimize functions. Maybe a good start is compiling everything -O3
>>> except the atomic execute function and make sure it still works.
>>>
>>> Ali
>>>
>>> On Fri, 28 Oct 2011 07:38:59 -0700, Steve Reinhardt <[email protected]>
>>> wrote:
>>>> Yes, I think there exists at least one software IEEE FP implementation
>>>> out there that we had talked about incorporating at some point (long
>>>> ago). Unfortunately, as is discussed below, that's not even the issue,
>>>> as we really want to model the not-quite-IEEE (or in the case of x87,
>>>> not-even-close) semantics of the hardware alone, which would require
>>>> more effort.
>>>>
>>>> If someone really cared about modeling the ISA FP support precisely,
>>>> that would be an interesting project, and if it was done cleanly
>>>> (probably with the option to turn it on or off) we'd be glad to
>>>> incorporate it.
>>>>
>>>> Ironically, I think the issue here is not that the HW FP is not good
>>>> enough for our purposes; it's that the software stack doesn't give us
>>>> clean enough access to the HW facilities (gcc in particular, though C
>>>> itself may share part of the blame).
>>>>
>>>> Steve
>>>>
>>>> On Thu, Oct 27, 2011 at 11:36 PM, Gabe Black <[email protected]>
>>>> wrote:
>>>>
>>>>> I think there was talk of an FP emulation library a long time ago
>>>>> (before I was involved with M5), but we decided not to do something
>>>>> like that for some reason. Using regular built-in FP support gets us
>>>>> most of the way with minimal hassle, but then there are situations
>>>>> like this where it really causes trouble. I presume the prior
>>>>> discussion might have been about whether getting most of the way there
>>>>> was good enough, and that it's simpler.
>>>>>
>>>>> Gabe
>>>>>
>>>>> On 10/27/11 07:43, Radivoje Vasiljevic wrote:
>>>>>> ----- Original Message -----
>>>>>> From: "Gabe Black" <[email protected]>
>>>>>> To: <[email protected]>
>>>>>> Sent: 25 October 2011 20:53
>>>>>> Subject: Re: [gem5-dev] Failed SPARC test
>>>>>>
>>>>>>
>>>>>>> On 10/25/11 07:46, Steve Reinhardt wrote:
>>>>>>>> On Tue, Oct 25, 2011 at 2:30 AM, Gabe Black <[email protected]>
>>>>>>>> wrote:
>>>>>> [snip]
>>>>>>> Yeah, I think ISAs treat IEEE as a really good suggestion rather than
>>>>>>> a standard. ARM isn't strictly conformant, and neither is x86. The
>>>>>>> default rounding mode *is* standard, though, and I don't think it is
>>>>>>> adjusted in SPARC as a result of execution. If it changed somehow
>>>>>>> (unless I'm forgetting where SPARC does that), it's a fairly
>>>>>>> significant problem. Whether instructions generate +/- 0 in various
>>>>>>> situations may depend on, for instance, what order gcc decides to put
>>>>>>> the operands in. I'm not sure that it does, but there are all kinds of
>>>>>>> weird, subtle behaviors with FP, and you can't just fix how add works
>>>>>>> if x86 picked the wrong thing. Then you have to replace add, or
>>>>>>> semi-replace it by faking it out with other FP operations. If we're
>>>>>>> running real x87 instructions (we shouldn't be in 64-bit mode, but we
>>>>>>> still could), then those use 80-bit operands internally. Where and
>>>>>>> when rounding takes place depends on when those are moved in/out of
>>>>>>> the FPU, and will be different than with true 64-bit operands.
>>>>>>> SSE-based FP uses real 64-bit doubles, so that should behave better.
>>>>>>> It should also be the default in 64-bit mode, since the compiler can
>>>>>>> assume some basic SSE support is present.
>>>>>>>
>>>>>> What about FP emulation using integers and some kind of
>>>>>> multiple-precision arithmetic?
>>>>>> Then every detail could be modeled, including x87 "floats" and
>>>>>> "doubles" (in registers the exponent field is still 15 bits, not
>>>>>> 8/11, which makes a mess of overflow/underflow; or the value goes to
>>>>>> memory and becomes a proper float/double). Gcc has some switches
>>>>>> regarding that behavior, but they are very fragile (more a suggestion
>>>>>> to the compiler than an enforced option). Double rounding in x87 is a
>>>>>> special story: the double-extended mantissa is not more than twice as
>>>>>> long as the double's, so double rounding can give different results
>>>>>> than single rounding (this can't happen with float vs. double). One
>>>>>> solution, for example: split the mantissas into two halves and perform
>>>>>> the operation on them; then all bits are available and any kind of
>>>>>> rounding can be enforced properly (real IEEE or "ISA-style IEEE").
>>>>>> Performing those operations is not very slow, and the code is fairly
>>>>>> ILP-rich, so the slowdown is not as great as the raw instruction count
>>>>>> suggests (although to get robust code that is independent of the CPU
>>>>>> and compiler, especially regarding "optimizing code", some tests are
>>>>>> needed to eradicate subnormals due to poor support/trap emulation).
>>>>>> Plus, if the instructions are mixed the right way, both the integer
>>>>>> and FPU units can be kept busy. The exponent can fit in one short, and
>>>>>> that problem is solved. Only division can be somewhat tricky (and
>>>>>> slow), but it can be done too.
>>>>>>
>>>>>>
>>>>>>>> Even if the FP rounding error isn't the source of the problem, it
>>>>>>>> might be easiest to fix that and get it out of the way so we can see
>>>>>>>> what the actual problem is.
>>>>>>>>
>>>>>>>> If you really want to know *why* the kernel is doing all this FP,
>>>>>>>> then yes, you probably need to look at the source code.
>>>>>>>>
>>>>>>>> Steve
>>>>>>>> _______________________________________________
>>>>>>>> gem5-dev mailing list
>>>>>>>> [email protected]
>>>>>>>> http://m5sim.org/mailman/listinfo/gem5-dev
