I believe what you say, it's what you hadn't said that I was wondering about ;-).
Steve

On Thu, Oct 27, 2011 at 11:23 PM, Gabe Black <[email protected]> wrote:

> I forgot to mention that while working on ARM, I did actually look at
> the assembly that was generated and gcc was moving things around in less
> than helpful ways. You're welcome to look at the assembly if you don't
> believe me :-). SPARC is pretty straightforward ISA description wise, so
> it shouldn't be too difficult to find the responsible code.
>
> Gabe
>
> On 10/27/11 11:30, Gabe Black wrote:
> > Exactly this was happening on ARM, and that's why there are all the
> > weird __asm__ statements in its instructions. There I had to also
> > specify variables as inputs and outputs from the __asm__ statements so
> > that other instructions would have to produce their value before or
> > consume their value after that point. Here I can't do that since (quite
> > reasonably) the code that sets the rounding mode is factored out into a
> > common blob, and I don't know what the necessary variables are. There's
> > supposed to be some way to prevent this sort of problem, but for gcc
> > it's not implemented. I forget exactly how that's supposed to work.
> >
> > Gabe
> >
> > On 10/27/11 08:32, Steve Reinhardt wrote:
> >> Are you positive this is it? It does sound very likely that this is the
> >> issue, but is there indisputable evidence, like you looked at the
> >> disassembly and you can see that things are scheduled in the wrong order?
> >> I'm asking because even though I agree that this seems likely to be the
> >> issue, it seems equally unlikely that gcc would reorder operations around
> >> function calls like m5_fesetround() (unless they're inlined), and the fact
> >> that the asm statements didn't help seems like further evidence that maybe
> >> we're not focusing on exactly the right place.
> >>
> >> Steve
> >>
> >> On Thu, Oct 27, 2011 at 12:35 AM, Gabe Black <[email protected]> wrote:
> >>
> >>> I'm convinced we've successfully identified the problem, but
> >>> unfortunately I added barriers liberally and it still failed.
> >>>
> >>> Gabe
> >>>
> >>>     int newrnd = M5_FE_TONEAREST;
> >>>     switch (Fsr<31:30>) {
> >>>       case 0: newrnd = M5_FE_TONEAREST; break;
> >>>       case 1: newrnd = M5_FE_TOWARDZERO; break;
> >>>       case 2: newrnd = M5_FE_UPWARD; break;
> >>>       case 3: newrnd = M5_FE_DOWNWARD; break;
> >>>     }
> >>>     __asm__ __volatile__ ("" ::: "memory");
> >>>     int oldrnd = m5_fegetround();
> >>>     __asm__ __volatile__ ("" ::: "memory");
> >>>     m5_fesetround(newrnd);
> >>>     __asm__ __volatile__ ("" ::: "memory");
> >>> """
> >>>
> >>>     fp_code += code
> >>>
> >>>
> >>>     fp_code += """
> >>>     __asm__ __volatile__ ("" ::: "memory");
> >>>     m5_fesetround(oldrnd);
> >>>     __asm__ __volatile__ ("" ::: "memory");
> >>> """
> >>>     fp_code = filterDoubles(fp_code)
> >>>     iop = InstObjParams(name, Name, 'SparcStaticInst', fp_code, flags)
> >>>     header_output = BasicDeclare.subst(iop)
> >>>     decoder_output = BasicConstructor.subst(iop)
> >>>     decode_block = BasicDecode.subst(iop)
> >>>     exec_output = BasicExecute.subst(iop)
> >>> }};
> >>>
> >>>
> >>> On 10/26/11 07:10, Steve Reinhardt wrote:
> >>>> I forgot to mention that I fired off a gem5.debug run before I went to
> >>>> bed last night, and it completed successfully. So it does appear to be
> >>>> the optimizer.
> >>>>
> >>>> Steve
> >>>>
> >>>> On Wed, Oct 26, 2011 at 12:55 AM, Gabe Black <[email protected]> wrote:
> >>>>> On 10/25/11 22:28, Ali Saidi wrote:
> >>>>>> On Tue, 25 Oct 2011 11:53:29 -0700, Gabe Black <[email protected]>
> >>>>>> wrote:
> >>>>>>> On 10/25/11 07:46, Steve Reinhardt wrote:
> >>>>>>>> On Tue, Oct 25, 2011 at 2:30 AM, Gabe Black <[email protected]>
> >>>>>>>> wrote:
> >>>>>>>>> I'm currently building binutils for SPARC, so hopefully I can
> >>>>>>>>> disassemble some things and get a better idea of what's going
> >>>>>>>>> on. It's probably going to be really annoying to figure it out.
> >>>>>>>> If it's really just an FP rounding error, it might not be that
> >>>>>>>> hard... just look at the examples from the trace of where it's
> >>>>>>>> going wrong, figure out what the right answer is, and focus on
> >>>>>>>> those few instructions. FP is pretty thoroughly specified by
> >>>>>>>> IEEE, so if it's not an outright compiler bug, maybe it's just
> >>>>>>>> some change in the default rounding settings or something.
> >>>>>>> Yeah, I think ISAs treat IEEE as a really good suggestion rather
> >>>>>>> than a standard. ARM isn't strictly conformant, and neither is
> >>>>>>> x86. The default rounding mode *is* standard, though, and I don't
> >>>>>>> think is adjusted in SPARC as a result of execution. If it changed
> >>>>>>> somehow (unless I'm forgetting where SPARC does that) it's a
> >>>>>>> fairly significant problem. Whether instructions generate +/- 0 in
> >>>>>>> various situations may depend on, for instance, what order gcc
> >>>>>>> decides to put the operands. I'm not sure that it does, but there
> >>>>>>> are all kinds of weird, subtle behaviors with FP, and you can't
> >>>>>>> just fix how add works if x86 picked the wrong thing. Then you
> >>>>>>> have to replace add, or semi-replace it by faking it out with
> >>>>>>> other FP operations. If we're running real x87 instructions (we
> >>>>>>> shouldn't be in 64 bit mode, but we still could) then those use 80
> >>>>>>> bit operands internally. Where and when rounding takes place
> >>>>>>> depends on when those are moved in/out of the FPU, and will be
> >>>>>>> different than true 64 bit operands. SSE based FP uses real 64 bit
> >>>>>>> doubles, so that should behave better. It should also be the
> >>>>>>> default in 64 bit mode since the compiler can assume some basic
> >>>>>>> SSE support is present.
> >>>>>> The rounding mode in SPARC is controlled by bits 31:30 of the FSR.
> >>>>>> My guess is that this is actually the problem and gcc 4.5+ is doing
> >>>>>> some code motion that is moving the actual fp code around our
> >>>>>> setting of the rounding mode. Using one of the asm tricks to
> >>>>>> prevent code movement (supposedly an empty asm() is supposed to be
> >>>>>> code barrier in gcc), might fix the problem. I don't have time to
> >>>>>> try it, but src/arch/sparc/isa/formats/basic.isa:145 looks like the
> >>>>>> right place. Also, trying to run the regression with m5.debug might
> >>>>>> see if the optimizer is at fault.
> >>>>>>
> >>>>>> Ali
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> _______________________________________________
> >>>>>> gem5-dev mailing list
> >>>>>> [email protected]
> >>>>>> http://m5sim.org/mailman/listinfo/gem5-dev
> >>>>> Ah, ok, so we do set the mode apparently. I'll try gem5.debug and
> >>>>> also look at that template and see what I can see. Thanks Ali!
> >>>>>
> >>>>> Gabe
