Are you positive this is it?  It does sound very likely that this is the
issue, but is there indisputable evidence, e.g., have you looked at the
disassembly and seen that things are scheduled in the wrong order?
I'm asking because even though I agree that this seems likely to be the
issue, it seems equally unlikely that gcc would reorder operations around
function calls like m5_fesetround() (unless they're inlined), and the fact
that the asm statements didn't help seems like further evidence that maybe
we're not focusing on exactly the right place.
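
If code motion does turn out to be the cause, one possible explanation for the barriers not helping is that an empty asm with only a "memory" clobber constrains memory accesses, not arithmetic on values that already live in registers. Below is a minimal sketch, in GCC-style C, of a barrier that instead ties a specific value to the asm statement through a "+m" operand. The names fp_value_barrier and guarded_add are hypothetical and are not part of gem5, and whether this is enough to fix the SPARC regression is untested.

```c
#include <assert.h>

/*
 * Hypothetical helper (not from gem5). The "+m" operand tells the
 * compiler that the empty asm both reads and writes x, so x must be
 * computed before this point and anything that uses the returned value
 * must come after it. The "memory" clobber is kept so the statement
 * also acts like the barriers already tried in the template.
 * Requires GCC or Clang extended asm.
 */
static inline double fp_value_barrier(double x)
{
    __asm__ __volatile__ ("" : "+m" (x) : : "memory");
    return x;
}

/*
 * Hypothetical stand-in for an FP instruction body. The add depends on
 * the outputs of the first two barriers, so it cannot be hoisted above
 * them, and the last barrier consumes the sum, so the add cannot sink
 * below it.
 */
double guarded_add(double a, double b)
{
    a = fp_value_barrier(a);
    b = fp_value_barrier(b);
    double sum = a + b;
    return fp_value_barrier(sum);
}
```

In the template, the operands would pass through the barrier right after m5_fesetround(newrnd) and the result right before m5_fesetround(oldrnd). Looking at the disassembly is still the only way to confirm where the FP operation actually ends up.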

Steve

On Thu, Oct 27, 2011 at 12:35 AM, Gabe Black <[email protected]> wrote:

> I'm convinced we've successfully identified the problem, but
> unfortunately I added barriers liberally and it still failed.
>
> Gabe
>
>    int newrnd = M5_FE_TONEAREST;
>    switch (Fsr<31:30>) {
>      case 0: newrnd = M5_FE_TONEAREST; break;
>      case 1: newrnd = M5_FE_TOWARDZERO; break;
>      case 2: newrnd = M5_FE_UPWARD; break;
>      case 3: newrnd = M5_FE_DOWNWARD; break;
>    }
>    __asm__ __volatile__ ("" ::: "memory");
>    int oldrnd = m5_fegetround();
>    __asm__ __volatile__ ("" ::: "memory");
>    m5_fesetround(newrnd);
>    __asm__ __volatile__ ("" ::: "memory");
> """
>
>        fp_code += code
>
>
>        fp_code += """
>    __asm__ __volatile__ ("" ::: "memory");
>    m5_fesetround(oldrnd);
>    __asm__ __volatile__ ("" ::: "memory");
> """
>        fp_code = filterDoubles(fp_code)
>        iop = InstObjParams(name, Name, 'SparcStaticInst', fp_code, flags)
>        header_output = BasicDeclare.subst(iop)
>        decoder_output = BasicConstructor.subst(iop)
>        decode_block = BasicDecode.subst(iop)
>        exec_output = BasicExecute.subst(iop)
> }};
>
>
> On 10/26/11 07:10, Steve Reinhardt wrote:
> > I forgot to mention that I fired off a gem5.debug run before I went to bed
> > last night, and it completed successfully.  So it does appear to be the
> > optimizer.
> >
> > Steve
> >
> > On Wed, Oct 26, 2011 at 12:55 AM, Gabe Black <[email protected]> wrote:
> >
> >> On 10/25/11 22:28, Ali Saidi wrote:
> >>> On Tue, 25 Oct 2011 11:53:29 -0700, Gabe Black <[email protected]>
> >>> wrote:
> >>>> On 10/25/11 07:46, Steve Reinhardt wrote:
> >>>>> On Tue, Oct 25, 2011 at 2:30 AM, Gabe Black <[email protected]>
> >>>>> wrote:
> >>>>>> I'm currently building binutils for SPARC, so hopefully I can
> >>>>>> disassemble some things and get a better idea of what's going on.
> >>>>>> It's probably going to be really annoying to figure it out.
> >>>>> If it's really just an FP rounding error, it might not be that
> >>>>> hard... just
> >>>>> look at the examples from the trace of where it's going wrong,
> >>>>> figure out
> >>>>> what the right answer is, and focus on those few instructions.  FP
> >>>>> is pretty
> >>>>> thoroughly specified by IEEE, so if it's not an outright compiler
> >>>>> bug, maybe
> >>>>> it's just some change in the default rounding settings or something.
> >>>> Yeah, I think ISAs treat IEEE as a really good suggestion rather than a
> >>>> standard. ARM isn't strictly conformant, and neither is x86. The default
> >>>> rounding mode *is* standard, though, and I don't think it is adjusted in
> >>>> SPARC as a result of execution. If it changed somehow (unless I'm
> >>>> forgetting where SPARC does that) it's a fairly significant problem.
> >>>> Whether instructions generate +/- 0 in various situations may depend on,
> >>>> for instance, what order gcc decides to put the operands in. I'm not sure
> >>>> that it does, but there are all kinds of weird, subtle behaviors with
> >>>> FP, and you can't just fix how add works if x86 picked the wrong thing.
> >>>> Then you have to replace add, or semi-replace it by faking it out with
> >>>> other FP operations. If we're running real x87 instructions (we
> >>>> shouldn't be in 64 bit mode, but we still could) then those use 80 bit
> >>>> operands internally. Where and when rounding takes place depends on when
> >>>> those are moved in/out of the FPU, and will be different than true 64
> >>>> bit operands. SSE based FP uses real 64 bit doubles, so that should
> >>>> behave better. It should also be the default in 64 bit mode since the
> >>>> compiler can assume some basic SSE support is present.
> >>> The rounding mode in SPARC is controlled by bits 31:30 of the FSR. My
> >>> guess is that this is actually the problem and gcc 4.5+ is doing some
> >>> code motion that is moving the actual fp code around our setting of
> >>> the rounding mode. Using one of the asm tricks to prevent code
> >>> movement (an empty asm() is supposedly a code barrier in
> >>> gcc) might fix the problem. I don't have time to try it, but
> >>> src/arch/sparc/isa/formats/basic.isa:145 looks like the right place.
> >>> Also, running the regression with m5.debug might show whether the
> >>> optimizer is at fault.
> >>>
> >>> Ali
> >>>
> >>>
> >>>
> >>> _______________________________________________
> >>> gem5-dev mailing list
> >>> [email protected]
> >>> http://m5sim.org/mailman/listinfo/gem5-dev
> >> Ah, ok, so we do set the mode apparently. I'll try gem5.debug and also
> >> look at that template and see what I can see. Thanks Ali!
> >>
> >> Gabe
