On Oct 31, 2012, at 4:34 PM, Michael Liao <[email protected]> wrote:

> If the backend is fixed, shall we revert this patch to FE? It may
> generate inefficient code.

sqrtss and sqrtsd are not included in the BE fix yet.
After these two are fixed, we should be able to revert the modifications to the 
FE.

Thanks,
Manman

> 
> Yours
> - Michael
> 
> On Wed, 2012-10-31 at 12:45 -0700, manman ren wrote:
>> On Oct 30, 2012, at 4:58 PM, "Liao, Michael" <[email protected]> wrote:
>> 
>>> Thanks. That would be great to model SSE variants of SQRTSS almost the same 
>>> as AVX variants with tied input/output operand.
>> 
>> Committed r167064.
>> 
>> Thanks,
>> Manman
>> 
>>> 
>>> Yours
>>> - Michael
>>> 
>>> -----Original Message-----
>>> From: manman ren [mailto:[email protected]] 
>>> Sent: Tuesday, October 30, 2012 4:52 PM
>>> To: Liao, Michael
>>> Cc: Eli Friedman; [email protected] Commits
>>> Subject: Re: [cfe-commits] r166743 - in /cfe/trunk: lib/Headers/xmmintrin.h 
>>> test/CodeGen/sse-builtins.c
>>> 
>>> 
>>> On Oct 30, 2012, at 4:21 PM, Michael Liao <[email protected]> wrote:
>>> 
>>>> On Tue, 2012-10-30 at 16:16 -0700, manman ren wrote:
>>>>> On Oct 30, 2012, at 4:13 PM, Michael Liao <[email protected]> wrote:
>>>>> 
>>>>>> These unary insns are not modelled accurately to capture the 
>>>>>> hardware behavior (high parts of result is preserved in SSE 
>>>>>> variant). Even though the fix works here, it'd better to fix them in 
>>>>>> X86 backend. Bug is filed in PR14221.
>>>>> I actually have a fix in backend, but thought it is cleaner to fix it 
>>>>> in the front-end :)
>>>> 
>>>> Hmm, I think we'd better fix BE and model them correctly to not only 
>>>> fix this issue (correctness) but also potential performance issue. 
>>>> Without modelling the destination register is an input as well, RA may 
>>>> choose a bad candidate which cause partial register stall issue.
>>> 
>>> The correctness issue only happens with vector inputs, and currently we 
>>> select the vector variant for intrinsics only.
>>> My fix in BE actually will force register allocator to choose the same 
>>> register for both operands.
>>> So it should avoid the partial register stall issue for the vector variant.
>>> Thanks for pointing it out, I will check in my BE fix for rsqrt and rcp.
>>> The definition of builtins for rsqrtss, rcpss will be updated to be the 
>>> same as the intrinsics.
>>> sqrtss and sqrtsd are not included yet.
>>> 
>>> Manman 
>>> 
>>>> 
>>>> Yours
>>>> - Michael
>>>> 
>>>>> 
>>>>> Thanks,
>>>>> Manman
>>>>> 
>>>>>> 
>>>>>> Yours
>>>>>> - Michael
>>>>>> 
>>>>>> On Tue, 2012-10-30 at 10:57 -0700, manman ren wrote:
>>>>>>> On Oct 29, 2012, at 6:01 PM, Eli Friedman <[email protected]> 
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> On Thu, Oct 25, 2012 at 8:23 PM, Manman Ren <[email protected]> wrote:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Sent from my iPhone
>>>>>>>>> 
>>>>>>>>> On Oct 25, 2012, at 7:54 PM, Eli Friedman <[email protected]> 
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> On Thu, Oct 25, 2012 at 7:35 PM, Manman Ren <[email protected]> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Sent from my iPhone
>>>>>>>>>>> 
>>>>>>>>>>> On Oct 25, 2012, at 6:13 PM, Eli Friedman <[email protected]> 
>>>>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> On Thu, Oct 25, 2012 at 5:25 PM, Manman Ren <[email protected]> wrote:
>>>>>>>>>>>>> Author: mren
>>>>>>>>>>>>> Date: Thu Oct 25 19:25:10 2012 New Revision: 166743
>>>>>>>>>>>>> 
>>>>>>>>>>>>> URL: http://llvm.org/viewvc/llvm-project?rev=166743&view=rev
>>>>>>>>>>>>> Log:
>>>>>>>>>>>>> X86 SSE Intrinsics: update header for sqrt_ss, rsqrt_ss and 
>>>>>>>>>>>>> rcp_ss.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> There intrinsics pass through the upper FP values from the input.
>>>>>>>>>>>>> rdar://12558838
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Modified:
>>>>>>>>>>>>> cfe/trunk/lib/Headers/xmmintrin.h 
>>>>>>>>>>>>> cfe/trunk/test/CodeGen/sse-builtins.c
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Modified: cfe/trunk/lib/Headers/xmmintrin.h
>>>>>>>>>>>>> URL: 
>>>>>>>>>>>>> http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Headers/xmm
>>>>>>>>>>>>> intrin.h?rev=166743&r1=166742&r2=166743&view=diff
>>>>>>>>>>>>> =============================================================
>>>>>>>>>>>>> =================
>>>>>>>>>>>>> --- cfe/trunk/lib/Headers/xmmintrin.h (original)
>>>>>>>>>>>>> +++ cfe/trunk/lib/Headers/xmmintrin.h Thu Oct 25 19:25:10 
>>>>>>>>>>>>> +++ 2012
>>>>>>>>>>>>> @@ -95,7 +95,8 @@
>>>>>>>>>>>>> static __inline__ __m128 __attribute__((__always_inline__, 
>>>>>>>>>>>>> __nodebug__))
>>>>>>>>>>>>> _mm_sqrt_ss(__m128 a)
>>>>>>>>>>>>> {
>>>>>>>>>>>>> -  return __builtin_ia32_sqrtss(a);
>>>>>>>>>>>>> +  __m128 c = __builtin_ia32_sqrtss(a);  return (__m128) { 
>>>>>>>>>>>>> + c[0], a[1], a[2], a[3] };
>>>>>>>>>>>>> }
>>>>>>>>>>>> 
>>>>>>>>>>>> What does exactly does __builtin_ia32_sqrtss return, if not 
>>>>>>>>>>>> the result of a sqrtss instruction?
>>>>>>>>>>> builtin returns the result of sqrtss instruction, which only 
>>>>>>>>>>> updates the lowest FP. mm_sqrt_ss has the upper FPs pass through to 
>>>>>>>>>>> the output.
>>>>>>>>>> 
>>>>>>>>>> What exactly is in the top 96 bits of the return value of 
>>>>>>>>>> __builtin_ia32_sqrtss, then?
>>>>>>>>> Those bits are undefined with the current implementation, same goes 
>>>>>>>>> for sqrt_sd.
>>>>>>>> 
>>>>>>>> Why are the intrinsics defined to take vectors in the first place, 
>>>>>>>> then?
>>>>>>> Not sure why the builtins for sqrtss, rsqrtss, rcpss and sqrtsd take 
>>>>>>> vectors.
>>>>>>> Their top bits are undefined since they directly map to the 
>>>>>>> corresponding instructions which have the top bits from the 
>>>>>>> un-initialized destination registers.
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Manman
>>>>>>>> 
>>>>>>>> -Eli
>>>>>>> 
>>>>>>> _______________________________________________
>>>>>>> cfe-commits mailing list
>>>>>>> [email protected]
>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>>> 
>>> 
>> 
> 
> 

_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits

Reply via email to