If the backend is fixed, shall we revert this patch to FE? It may generate inefficient code.
Yours - Michael On Wed, 2012-10-31 at 12:45 -0700, manman ren wrote: > On Oct 30, 2012, at 4:58 PM, "Liao, Michael" <[email protected]> wrote: > > > Thanks. That would be great to model SSE variants of SQRTSS almost the same > > as AVX variants with tied input/output operand. > > Committed r167064. > > Thanks, > Manman > > > > > Yours > > - Michael > > > > -----Original Message----- > > From: manman ren [mailto:[email protected]] > > Sent: Tuesday, October 30, 2012 4:52 PM > > To: Liao, Michael > > Cc: Eli Friedman; [email protected] Commits > > Subject: Re: [cfe-commits] r166743 - in /cfe/trunk: lib/Headers/xmmintrin.h > > test/CodeGen/sse-builtins.c > > > > > > On Oct 30, 2012, at 4:21 PM, Michael Liao <[email protected]> wrote: > > > >> On Tue, 2012-10-30 at 16:16 -0700, manman ren wrote: > >>> On Oct 30, 2012, at 4:13 PM, Michael Liao <[email protected]> wrote: > >>> > >>>> These unary insns are not modelled accurately to capture the > >>>> hardware behavior (high parts of result is preserved in SSE > >>>> variant). Even though the fix works here, it'd better to fix them in > >>>> X86 backend. Bug is filed in PR14221. > >>> I actually have a fix in backend, but thought it is cleaner to fix it > >>> in the front-end :) > >> > >> Hmm, I think we'd better fix BE and model them correctly to not only > >> fix this issue (correctness) but also potential performance issue. > >> Without modelling the destination register is an input as well, RA may > >> choose a bad candidate which cause partial register stall issue. > > > > The correctness issue only happens with vector inputs, and currently we > > select the vector variant for intrinsics only. > > My fix in BE actually will force register allocator to choose the same > > register for both operands. > > So it should avoid the partial register stall issue for the vector variant. > > Thanks for pointing it out, I will check in my BE fix for rsqrt and rcp. > > The definition of builtins for rsqrtss, rcpss will be updated to be the > > same as the intrinsics. > > sqrtss and sqrtsd are not included yet. > > > > Manman > > > >> > >> Yours > >> - Michael > >> > >>> > >>> Thanks, > >>> Manman > >>> > >>>> > >>>> Yours > >>>> - Michael > >>>> > >>>> On Tue, 2012-10-30 at 10:57 -0700, manman ren wrote: > >>>>> On Oct 29, 2012, at 6:01 PM, Eli Friedman <[email protected]> > >>>>> wrote: > >>>>> > >>>>>> On Thu, Oct 25, 2012 at 8:23 PM, Manman Ren <[email protected]> wrote: > >>>>>>> > >>>>>>> > >>>>>>> Sent from my iPhone > >>>>>>> > >>>>>>> On Oct 25, 2012, at 7:54 PM, Eli Friedman <[email protected]> > >>>>>>> wrote: > >>>>>>> > >>>>>>>> On Thu, Oct 25, 2012 at 7:35 PM, Manman Ren <[email protected]> wrote: > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> Sent from my iPhone > >>>>>>>>> > >>>>>>>>> On Oct 25, 2012, at 6:13 PM, Eli Friedman <[email protected]> > >>>>>>>>> wrote: > >>>>>>>>> > >>>>>>>>>> On Thu, Oct 25, 2012 at 5:25 PM, Manman Ren <[email protected]> wrote: > >>>>>>>>>>> Author: mren > >>>>>>>>>>> Date: Thu Oct 25 19:25:10 2012 New Revision: 166743 > >>>>>>>>>>> > >>>>>>>>>>> URL: http://llvm.org/viewvc/llvm-project?rev=166743&view=rev > >>>>>>>>>>> Log: > >>>>>>>>>>> X86 SSE Intrinsics: update header for sqrt_ss, rsqrt_ss and > >>>>>>>>>>> rcp_ss. > >>>>>>>>>>> > >>>>>>>>>>> There intrinsics pass through the upper FP values from the input. > >>>>>>>>>>> rdar://12558838 > >>>>>>>>>>> > >>>>>>>>>>> Modified: > >>>>>>>>>>> cfe/trunk/lib/Headers/xmmintrin.h > >>>>>>>>>>> cfe/trunk/test/CodeGen/sse-builtins.c > >>>>>>>>>>> > >>>>>>>>>>> Modified: cfe/trunk/lib/Headers/xmmintrin.h > >>>>>>>>>>> URL: > >>>>>>>>>>> http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Headers/xmm > >>>>>>>>>>> intrin.h?rev=166743&r1=166742&r2=166743&view=diff > >>>>>>>>>>> ============================================================= > >>>>>>>>>>> ================= > >>>>>>>>>>> --- cfe/trunk/lib/Headers/xmmintrin.h (original) > >>>>>>>>>>> +++ cfe/trunk/lib/Headers/xmmintrin.h Thu Oct 25 19:25:10 > >>>>>>>>>>> +++ 2012 > >>>>>>>>>>> @@ -95,7 +95,8 @@ > >>>>>>>>>>> static __inline__ __m128 __attribute__((__always_inline__, > >>>>>>>>>>> __nodebug__)) > >>>>>>>>>>> _mm_sqrt_ss(__m128 a) > >>>>>>>>>>> { > >>>>>>>>>>> - return __builtin_ia32_sqrtss(a); > >>>>>>>>>>> + __m128 c = __builtin_ia32_sqrtss(a); return (__m128) { > >>>>>>>>>>> + c[0], a[1], a[2], a[3] }; > >>>>>>>>>>> } > >>>>>>>>>> > >>>>>>>>>> What does exactly does __builtin_ia32_sqrtss return, if not > >>>>>>>>>> the result of a sqrtss instruction? > >>>>>>>>> builtin returns the result of sqrtss instruction, which only > >>>>>>>>> updates the lowest FP. mm_sqrt_ss has the upper FPs pass through to > >>>>>>>>> the output. > >>>>>>>> > >>>>>>>> What exactly is in the top 96 bits of the return value of > >>>>>>>> __builtin_ia32_sqrtss, then? > >>>>>>> Those bits are undefined with the current implementation, same goes > >>>>>>> for sqrt_sd. > >>>>>> > >>>>>> Why are the intrinsics defined to take vectors in the first place, > >>>>>> then? > >>>>> Not sure why the builtins for sqrtss, rsqrtss, rcpss and sqrtsd take > >>>>> vectors. > >>>>> Their top bits are undefined since they directly map to the > >>>>> corresponding instructions which have the top bits from the > >>>>> un-initialized destination registers. > >>>>> > >>>>> Thanks, > >>>>> Manman > >>>>>> > >>>>>> -Eli > >>>>> > >>>>> _______________________________________________ > >>>>> cfe-commits mailing list > >>>>> [email protected] > >>>>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits > >>>> > >>>> > >>> > >> > >> > > > _______________________________________________ cfe-commits mailing list [email protected] http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
