On Oct 30, 2012, at 4:58 PM, "Liao, Michael" <[email protected]> wrote:
> Thanks. That would be great to model SSE variants of SQRTSS almost the same > as AVX variants with tied input/output operand. Committed r167064. Thanks, Manman > > Yours > - Michael > > -----Original Message----- > From: manman ren [mailto:[email protected]] > Sent: Tuesday, October 30, 2012 4:52 PM > To: Liao, Michael > Cc: Eli Friedman; [email protected] Commits > Subject: Re: [cfe-commits] r166743 - in /cfe/trunk: lib/Headers/xmmintrin.h > test/CodeGen/sse-builtins.c > > > On Oct 30, 2012, at 4:21 PM, Michael Liao <[email protected]> wrote: > >> On Tue, 2012-10-30 at 16:16 -0700, manman ren wrote: >>> On Oct 30, 2012, at 4:13 PM, Michael Liao <[email protected]> wrote: >>> >>>> These unary insns are not modelled accurately to capture the >>>> hardware behavior (high parts of result is preserved in SSE >>>> variant). Even though the fix works here, it'd better to fix them in >>>> X86 backend. Bug is filed in PR14221. >>> I actually have a fix in backend, but thought it is cleaner to fix it >>> in the front-end :) >> >> Hmm, I think we'd better fix BE and model them correctly to not only >> fix this issue (correctness) but also potential performance issue. >> Without modelling the destination register is an input as well, RA may >> choose a bad candidate which cause partial register stall issue. > > The correctness issue only happens with vector inputs, and currently we > select the vector variant for intrinsics only. > My fix in BE actually will force register allocator to choose the same > register for both operands. > So it should avoid the partial register stall issue for the vector variant. > Thanks for pointing it out, I will check in my BE fix for rsqrt and rcp. > The definition of builtins for rsqrtss, rcpss will be updated to be the same > as the intrinsics. > sqrtss and sqrtsd are not included yet. > > Manman > >> >> Yours >> - Michael >> >>> >>> Thanks, >>> Manman >>> >>>> >>>> Yours >>>> - Michael >>>> >>>> On Tue, 2012-10-30 at 10:57 -0700, manman ren wrote: >>>>> On Oct 29, 2012, at 6:01 PM, Eli Friedman <[email protected]> wrote: >>>>> >>>>>> On Thu, Oct 25, 2012 at 8:23 PM, Manman Ren <[email protected]> wrote: >>>>>>> >>>>>>> >>>>>>> Sent from my iPhone >>>>>>> >>>>>>> On Oct 25, 2012, at 7:54 PM, Eli Friedman <[email protected]> >>>>>>> wrote: >>>>>>> >>>>>>>> On Thu, Oct 25, 2012 at 7:35 PM, Manman Ren <[email protected]> wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> Sent from my iPhone >>>>>>>>> >>>>>>>>> On Oct 25, 2012, at 6:13 PM, Eli Friedman <[email protected]> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> On Thu, Oct 25, 2012 at 5:25 PM, Manman Ren <[email protected]> wrote: >>>>>>>>>>> Author: mren >>>>>>>>>>> Date: Thu Oct 25 19:25:10 2012 New Revision: 166743 >>>>>>>>>>> >>>>>>>>>>> URL: http://llvm.org/viewvc/llvm-project?rev=166743&view=rev >>>>>>>>>>> Log: >>>>>>>>>>> X86 SSE Intrinsics: update header for sqrt_ss, rsqrt_ss and rcp_ss. >>>>>>>>>>> >>>>>>>>>>> There intrinsics pass through the upper FP values from the input. >>>>>>>>>>> rdar://12558838 >>>>>>>>>>> >>>>>>>>>>> Modified: >>>>>>>>>>> cfe/trunk/lib/Headers/xmmintrin.h >>>>>>>>>>> cfe/trunk/test/CodeGen/sse-builtins.c >>>>>>>>>>> >>>>>>>>>>> Modified: cfe/trunk/lib/Headers/xmmintrin.h >>>>>>>>>>> URL: >>>>>>>>>>> http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Headers/xmm >>>>>>>>>>> intrin.h?rev=166743&r1=166742&r2=166743&view=diff >>>>>>>>>>> ============================================================= >>>>>>>>>>> ================= >>>>>>>>>>> --- cfe/trunk/lib/Headers/xmmintrin.h (original) >>>>>>>>>>> +++ cfe/trunk/lib/Headers/xmmintrin.h Thu Oct 25 19:25:10 >>>>>>>>>>> +++ 2012 >>>>>>>>>>> @@ -95,7 +95,8 @@ >>>>>>>>>>> static __inline__ __m128 __attribute__((__always_inline__, >>>>>>>>>>> __nodebug__)) >>>>>>>>>>> _mm_sqrt_ss(__m128 a) >>>>>>>>>>> { >>>>>>>>>>> - return __builtin_ia32_sqrtss(a); >>>>>>>>>>> + __m128 c = __builtin_ia32_sqrtss(a); return (__m128) { >>>>>>>>>>> + c[0], a[1], a[2], a[3] }; >>>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> What does exactly does __builtin_ia32_sqrtss return, if not >>>>>>>>>> the result of a sqrtss instruction? >>>>>>>>> builtin returns the result of sqrtss instruction, which only updates >>>>>>>>> the lowest FP. mm_sqrt_ss has the upper FPs pass through to the >>>>>>>>> output. >>>>>>>> >>>>>>>> What exactly is in the top 96 bits of the return value of >>>>>>>> __builtin_ia32_sqrtss, then? >>>>>>> Those bits are undefined with the current implementation, same goes for >>>>>>> sqrt_sd. >>>>>> >>>>>> Why are the intrinsics defined to take vectors in the first place, then? >>>>> Not sure why the builtins for sqrtss, rsqrtss, rcpss and sqrtsd take >>>>> vectors. >>>>> Their top bits are undefined since they directly map to the corresponding >>>>> instructions which have the top bits from the un-initialized destination >>>>> registers. >>>>> >>>>> Thanks, >>>>> Manman >>>>>> >>>>>> -Eli >>>>> >>>>> _______________________________________________ >>>>> cfe-commits mailing list >>>>> [email protected] >>>>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits >>>> >>>> >>> >> >> > _______________________________________________ cfe-commits mailing list [email protected] http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
