Here's a discussion on the gcc mailing list of the thing I mentioned
before that's supposed to fix this, I think.

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34678

Our barriers aren't working because Frs1s, Frs2s, and Frds could all be
in registers, which a "memory" clobber doesn't constrain.

Gabe

On 10/29/11 13:31, Gabe Black wrote:
> Here is some suspect assembly from Fadds for the atomic simple CPU
>
>    0x00000000008d538e <+382>:   callq  0x4cab70 <m5_fegetround>
>    0x00000000008d5393 <+387>:   mov    %eax,%r15d
>    0x00000000008d5396 <+390>:   mov    %r14d,%edi
>    0x00000000008d5399 <+393>:   callq  0x4cab30 <m5_fesetround>
>    0x00000000008d539e <+398>:   mov    %r15d,%edi
>    0x00000000008d53a1 <+401>:   callq  0x4cab30 <m5_fesetround>
>
>
> This is, more or less, from the following code.
>
>
>     __asm__ __volatile__ ("" ::: "memory");
>     int oldrnd = m5_fegetround();
>     __asm__ __volatile__ ("" ::: "memory");
>     m5_fesetround(newrnd);
>     __asm__ __volatile__ ("" ::: "memory");
>     Frds = Frs1s + Frs2s;
>     __asm__ __volatile__ ("" ::: "memory");
>     m5_fesetround(oldrnd);
>     __asm__ __volatile__ ("" ::: "memory");
>
>
> Note that the addition was moved out of the middle and fesetround was
> called twice back to back, once to set the new rounding mode, and once
> to set it right back again.
>
> Gabe
>
> On 10/28/11 08:31, Ali Saidi wrote:
>> I'm still not 100% convinced that this is it. I agree it's highly
>> likely, but it could be some other code movement or a bug in the
>> optimizer (we have seen them before). I wonder if you can selectively
>> optimize functions. Maybe a good start is compiling everything -O3
>> except the atomic execute function and making sure it still works.
>>
>> Ali
>>
>> On Fri, 28 Oct 2011 07:38:59 -0700, Steve Reinhardt <[email protected]>
>> wrote:
>>> Yes, I think there exists at least one software IEEE FP
>>> implementation out there that we had talked about incorporating at
>>> some point (long ago).
>>> Unfortunately, as is discussed below, that's not even the issue, as we
>>> really want to model the not-quite-IEEE (or in the case of x87,
>>> not-even-close) semantics of the hardware alone, which would require
>>> more effort.
>>>
>>> If someone really cared about modeling the ISA FP support precisely,
>>> that would be an interesting project, and if it was done cleanly
>>> (probably with the option to turn it on or off) we'd be glad to
>>> incorporate it.
>>>
>>> Ironically I think the issue here is not that the HW FP is not good
>>> enough for our purposes, it's that the software stack doesn't give us
>>> clean enough access to the HW facilities (gcc in particular, though C
>>> itself may share part of the blame).
>>>
>>> Steve
>>>
>>> On Thu, Oct 27, 2011 at 11:36 PM, Gabe Black <[email protected]>
>>> wrote:
>>>
>>>> I think there was talk of an FP emulation library a long time ago
>>>> (before I was involved with M5) but we decided not to do something like
>>>> that for some reason. Using regular built in FP support gets us most of
>>>> the way with minimal hassle, but then there are situations like this
>>>> where it really causes trouble. I presume the prior discussion might
>>>> have been about whether getting most of the way there was good enough,
>>>> and that it's simpler.
>>>>
>>>> Gabe
>>>>
>>>> On 10/27/11 07:43, Radivoje Vasiljevic wrote:
>>>>> ----- Original Message ----- From: "Gabe Black" <[email protected]>
>>>>> To: <[email protected]>
>>>>> Sent: 25 October 2011 20:53
>>>>> Subject: Re: [gem5-dev] Failed SPARC test
>>>>>
>>>>>
>>>>>> On 10/25/11 07:46, Steve Reinhardt wrote:
>>>>>>> On Tue, Oct 25, 2011 at 2:30 AM, Gabe Black <[email protected]>
>>>>>>> wrote:
>>>>> [snip]
>>>>>> Yeah, I think ISAs treat IEEE as a really good suggestion rather than
>>>>>> a standard. ARM isn't strictly conformant, and neither is x86.
>>>>>> The default rounding mode *is* standard, though, and I don't think it
>>>>>> is adjusted in SPARC as a result of execution. If it changed somehow
>>>>>> (unless I'm forgetting where SPARC does that) it's a fairly
>>>>>> significant problem. Whether instructions generate +/- 0 in various
>>>>>> situations may depend on, for instance, what order gcc decides to put
>>>>>> the operands. I'm not sure that it does, but there are all kinds of
>>>>>> weird, subtle behaviors with FP, and you can't just fix how add works
>>>>>> if x86 picked the wrong thing. Then you have to replace add, or
>>>>>> semi-replace it by faking it out with other FP operations. If we're
>>>>>> running real x87 instructions (we shouldn't be in 64 bit mode, but we
>>>>>> still could) then those use 80 bit operands internally. Where and
>>>>>> when rounding takes place depends on when those are moved in/out of
>>>>>> the FPU, and will be different than true 64 bit operands. SSE based
>>>>>> FP uses real 64 bit doubles, so that should behave better. It should
>>>>>> also be the default in 64 bit mode since the compiler can assume some
>>>>>> basic SSE support is present.
>>>>>>
>>>>> What about FP emulation using integers and some kind of multiple
>>>>> precision arithmetic? Then every detail could be modeled, including
>>>>> x87 "floats" and "doubles" (in registers the exponent field is still
>>>>> 15 bits, not 8/11, which makes a mess of overflow/underflow, or the
>>>>> value goes to memory and becomes a proper float/double). Gcc has some
>>>>> switches regarding that behavior, but that is very fragile (more like
>>>>> a suggestion to the compiler than an enforcing option).
>>>>> Double rounding in x87 is a special story because the double extended
>>>>> mantissa is not more than twice as long as the one for double, so
>>>>> double rounding can give different results compared to single
>>>>> rounding (this situation can't happen with float vs double). One
>>>>> solution, for example: split the mantissas into two halves and
>>>>> perform the operation; all bits would be available and then any kind
>>>>> of proper rounding could be enforced (real IEEE or "ISA style IEEE").
>>>>> Performing those operations is not very slow, and it is fairly ILP
>>>>> rich, so the slowdown is not as great as when the pure number of
>>>>> instructions is compared (although to have robust code that is CPU
>>>>> and compiler independent, especially regarding "optimizing code",
>>>>> some tests are needed to eradicate subnormals due to poor
>>>>> support/trap emulation). Plus, if instructions are mixed in the right
>>>>> way, both int and FPU units can be kept busy. The exponent can be one
>>>>> short and the problem is solved. Only division can be somewhat tricky
>>>>> (and slow), but it can be done too.
>>>>>
>>>>>
>>>>>>> Even if the FP rounding error isn't the source of the problem, it
>>>>>>> might be easiest to fix that and get it out of the way so we can see
>>>>>>> what the actual problem is.
>>>>>>>
>>>>>>> If you really want to know *why* the kernel is doing all this FP,
>>>>>>> then yes, you probably need to look at the source code.
>>>>>>>
>>>>>>> Steve
>>>>>>> _______________________________________________
>>>>>>> gem5-dev mailing list
>>>>>>> [email protected]
>>>>>>> http://m5sim.org/mailman/listinfo/gem5-dev
