I forgot to mention that while working on ARM, I did actually look at
the assembly that was generated and gcc was moving things around in less
than helpful ways. You're welcome to look at the assembly if you don't
believe me :-). SPARC is pretty straightforward ISA description wise, so
it shouldn't be too difficult to find the responsible code.

Gabe
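
For reference, the ARM workaround described in the quoted message below
(naming variables as inputs and outputs of the __asm__ statements) can be
sketched roughly as follows. This is a minimal sketch under assumptions,
not the gem5 code: VALUE_BARRIER and addRounded are made-up names, and it
calls the standard <cfenv> functions rather than the m5_fe* wrappers. The
idea is that an empty asm which claims to read and modify a value forces
that value to be produced before the asm and re-read after it, so the FP
operation is pinned between the two rounding-mode calls.

```cpp
#include <cfenv>

// Hypothetical helper, not from gem5: an empty asm statement that claims to
// read and modify v. The compiler must have v stored before this point and
// must reload it afterwards.
#define VALUE_BARRIER(v) __asm__ __volatile__("" : "+m"(v) : : "memory")

// Adds a and b under the given <cfenv> rounding mode, then restores the
// previously active mode.
double addRounded(double a, double b, int mode)
{
    const int old = std::fegetround();
    std::fesetround(mode);
    VALUE_BARRIER(a);       // the add cannot consume a above this line
    VALUE_BARRIER(b);       // likewise for b
    double sum = a + b;
    VALUE_BARRIER(sum);     // sum must be computed before the mode is restored
    std::fesetround(old);
    return sum;
}
```

A call such as addRounded(1.0, 1e-30, FE_UPWARD) should then return a value
just above 1.0, while FE_DOWNWARD should return exactly 1.0. The standard
mechanism for this kind of problem is presumably C99's
"#pragma STDC FENV_ACCESS ON", which gcc does not implement.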

On 10/27/11 11:30, Gabe Black wrote:
> Exactly this was happening on ARM, and that's why there are all the
> weird __asm__ statements in its instructions. There I had to also
> specify variables as inputs and outputs from the __asm__ statements so
> that other instructions would have to produce their value before or
> consume their value after that point. Here I can't do that since (quite
> reasonably) the code that sets the rounding mode is factored out into a
> common blob, and I don't know what the necessary variables are. There's
> supposed to be some way to prevent this sort of problem, but for gcc
> it's not implemented. I forget exactly how that's supposed to work.
>
> Gabe
>
> On 10/27/11 08:32, Steve Reinhardt wrote:
>> Are you positive this is it?  It does sound very likely that this is the
>> issue, but is there indisputable evidence, like you looked at the
>> disassembly and you can see that things are scheduled in the wrong order?
>>  I'm asking because even though I agree that this seems likely to be the
>> issue, it seems equally unlikely that gcc would reorder operations around
>> function calls like m5_fesetround() (unless they're inlined), and the fact
>> that the asm statements didn't help seems like further evidence that maybe
>> we're not focusing on exactly the right place.
>>
>> Steve
>>
>> On Thu, Oct 27, 2011 at 12:35 AM, Gabe Black <[email protected]> wrote:
>>
>>> I'm convinced we've successfully identified the problem, but
>>> unfortunately I added barriers liberally and it still failed.
>>>
>>> Gabe
>>>
>>>    int newrnd = M5_FE_TONEAREST;
>>>    switch (Fsr<31:30>) {
>>>      case 0: newrnd = M5_FE_TONEAREST; break;
>>>      case 1: newrnd = M5_FE_TOWARDZERO; break;
>>>      case 2: newrnd = M5_FE_UPWARD; break;
>>>      case 3: newrnd = M5_FE_DOWNWARD; break;
>>>    }
>>>    __asm__ __volatile__ ("" ::: "memory");
>>>    int oldrnd = m5_fegetround();
>>>    __asm__ __volatile__ ("" ::: "memory");
>>>    m5_fesetround(newrnd);
>>>    __asm__ __volatile__ ("" ::: "memory");
>>> """
>>>
>>>        fp_code += code
>>>
>>>
>>>        fp_code += """
>>>    __asm__ __volatile__ ("" ::: "memory");
>>>    m5_fesetround(oldrnd);
>>>    __asm__ __volatile__ ("" ::: "memory");
>>> """
>>>        fp_code = filterDoubles(fp_code)
>>>        iop = InstObjParams(name, Name, 'SparcStaticInst', fp_code, flags)
>>>        header_output = BasicDeclare.subst(iop)
>>>        decoder_output = BasicConstructor.subst(iop)
>>>        decode_block = BasicDecode.subst(iop)
>>>        exec_output = BasicExecute.subst(iop)
>>> }};
>>>
>>>
>>> On 10/26/11 07:10, Steve Reinhardt wrote:
>>>> I forgot to mention that I fired off a gem5.debug run before I went to bed
>>>> last night, and it completed successfully.  So it does appear to be the
>>>> optimizer.
>>>>
>>>> Steve
>>>>
>>>> On Wed, Oct 26, 2011 at 12:55 AM, Gabe Black <[email protected]> wrote:
>>>>> On 10/25/11 22:28, Ali Saidi wrote:
>>>>>> On Tue, 25 Oct 2011 11:53:29 -0700, Gabe Black <[email protected]>
>>>>>> wrote:
>>>>>>> On 10/25/11 07:46, Steve Reinhardt wrote:
>>>>>>>> On Tue, Oct 25, 2011 at 2:30 AM, Gabe Black <[email protected]>
>>>>>>>> wrote:
>>>>>>>>> I'm currently building binutils for SPARC, so hopefully I can
>>>>>>>>> disassemble some things and get a better idea of what's going on. It's
>>>>>>>>> probably going to be really annoying to figure it out.
>>>>>>>> If it's really just an FP rounding error, it might not be that hard...
>>>>>>>> just look at the examples from the trace of where it's going wrong,
>>>>>>>> figure out what the right answer is, and focus on those few
>>>>>>>> instructions.  FP is pretty thoroughly specified by IEEE, so if it's
>>>>>>>> not an outright compiler bug, maybe it's just some change in the
>>>>>>>> default rounding settings or something.
>>>>>>> Yeah, I think ISAs treat IEEE as a really good suggestion rather than a
>>>>>>> standard. ARM isn't strictly conformant, and neither is x86. The default
>>>>>>> rounding mode *is* standard, though, and I don't think it is adjusted in
>>>>>>> SPARC as a result of execution. If it changed somehow (unless I'm
>>>>>>> forgetting where SPARC does that) it's a fairly significant problem.
>>>>>>> Whether instructions generate +/- 0 in various situations may depend on,
>>>>>>> for instance, what order gcc decides to put the operands. I'm not sure
>>>>>>> that it does, but there are all kinds of weird, subtle behaviors with
>>>>>>> FP, and you can't just fix how add works if x86 picked the wrong thing.
>>>>>>> Then you have to replace add, or semi-replace it by faking it out with
>>>>>>> other FP operations. If we're running real x87 instructions (we
>>>>>>> shouldn't be in 64 bit mode, but we still could) then those use 80 bit
>>>>>>> operands internally. Where and when rounding takes place depends on when
>>>>>>> those are moved in/out of the FPU, and will be different than true 64
>>>>>>> bit operands. SSE based FP uses real 64 bit doubles, so that should
>>>>>>> behave better. It should also be the default in 64 bit mode since the
>>>>>>> compiler can assume some basic SSE support is present.
>>>>>> The rounding mode in SPARC is controlled by bits 31:30 of the FSR. My
>>>>>> guess is that this is actually the problem and gcc 4.5+ is doing some
>>>>>> code motion that is moving the actual fp code around our setting of
>>>>>> the rounding mode. Using one of the asm tricks to prevent code
>>>>>> movement (an empty asm() is supposed to be a code barrier in
>>>>>> gcc) might fix the problem. I don't have time to try it, but
>>>>>> src/arch/sparc/isa/formats/basic.isa:145 looks like the right place.
>>>>>> Also, running the regression with m5.debug might show whether the
>>>>>> optimizer is at fault.
>>>>>>
>>>>>> Ali
>>>>>>
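
The FSR decoding described above (bits 31:30 select the rounding mode)
might look roughly like this outside the ISA description language. It is
a sketch under assumptions: sparcRoundingMode is a hypothetical name, the
FSR is passed in as a plain 64-bit integer, and the standard FE_*
constants stand in for the M5_FE_* ones used in the template quoted
earlier in the thread.

```cpp
#include <cfenv>
#include <cstdint>

// Hypothetical helper, not from gem5: maps the rounding-direction field in
// bits 31:30 of a SPARC FSR value to a <cfenv> rounding mode.
int sparcRoundingMode(std::uint64_t fsr)
{
    switch ((fsr >> 30) & 0x3) {
      case 0: return FE_TONEAREST;   // round to nearest
      case 1: return FE_TOWARDZERO;  // round toward zero
      case 2: return FE_UPWARD;      // round toward +infinity
      default: return FE_DOWNWARD;   // field value 3: round toward -infinity
    }
}
```

The value returned here is what would be handed to fesetround() before
the instruction's FP code runs.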
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> gem5-dev mailing list
>>>>>> [email protected]
>>>>>> http://m5sim.org/mailman/listinfo/gem5-dev
>>>>> Ah, ok, so we do set the mode apparently. I'll try gem5.debug and also
>>>>> look at that template and see what I can see. Thanks Ali!
>>>>>
>>>>> Gabe
