Re: [PATCH] Inefficient code generation for 128-bit->256-bit typecast intrinsics (BZ #15712)

Craig Topper Sat, 03 Aug 2013 01:10:57 -0700

Option (2) directly matches the capabilities of the shufflevector
instruction in the LLVM IR. I have attached a patch that allows a -1 index
to become undef in the IR.


So

__builtin_shufflevector(x, y, 0, 4, -1, 5);

becomes

shufflevector <4 x float> %x, <4 x float> %y, <4 x i32> <i32 0, i32 4, i32 undef, i32 5>
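To make the mask semantics concrete, here is a small plain-C reference model of what the shufflevector above computes. It is only a sketch: `shuffle_model`, `lane_t`, and `LANES` are hypothetical names for illustration, not clang or LLVM APIs, and the undef lane is modeled with an explicit flag because C has no undef value.

```c
#include <assert.h>
#include <stdbool.h>

/* Lanes per input vector, matching the <4 x float> operands above. */
#define LANES 4

/* One result lane: a concrete value, or "undef" when the index was -1. */
typedef struct {
    bool is_undef;
    float value;
} lane_t;

/*
 * Reference model (hypothetical, not the compiler's code) of a two-operand
 * shuffle. Mask index i in [0, LANES) selects x[i], i in [LANES, 2*LANES)
 * selects y[i - LANES], and -1 marks the result lane as undef.
 * Returns false if any index is outside those ranges.
 */
static bool shuffle_model(const float x[LANES], const float y[LANES],
                          const int *mask, int mask_len, lane_t *out) {
    assert(mask_len >= 0);
    for (int k = 0; k < mask_len; k++) {
        int i = mask[k];
        if (i == -1) {
            /* The real value is unspecified; 0.0f is only a placeholder. */
            out[k] = (lane_t){ .is_undef = true, .value = 0.0f };
        } else if (i >= 0 && i < LANES) {
            out[k] = (lane_t){ .is_undef = false, .value = x[i] };
        } else if (i >= LANES && i < 2 * LANES) {
            out[k] = (lane_t){ .is_undef = false, .value = y[i - LANES] };
        } else {
            return false; /* out-of-range index: rejected */
        }
    }
    return true;
}
```

With x = {1, 2, 3, 4}, y = {5, 6, 7, 8}, and the mask {0, 4, -1, 5}, the model yields {1, 5, undef, 6}: lane 2 is left unspecified rather than being forced to a particular element.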


On Fri, Aug 2, 2013 at 6:15 PM, Katya Romanova <[email protected]> wrote:

>
> Craig Topper <craig.topper@...> writes:
>
> > Ok, so -1 isn't valid for indices, and I have even more questions about
> > __builtin_shufflevector the more I look at it. See my message in cfe-dev.
> >
> > On Thu, Jul 18, 2013 at 6:12 PM, Chandler Carruth <[email protected]> wrote:
> >
> > On Thu, Jul 18, 2013 at 6:11 PM, Craig Topper <[email protected]> wrote:
> >
> > Would __builtin_shufflevector(__a, __a, 0, 1, -1, -1) work?
> >
> > Personally, I would prefer a defined way to produce an undef input in
> > general... but if folks are worried about exposing such an interface, then
> > sure, we could just allow the shuffle builtin itself to designate an
> > "undef" input with goofy indices.
> >
> > On Thu, Jul 18, 2013 at 5:42 PM, Chandler Carruth <[email protected]> wrote:
> >
> > On Thu, Jul 18, 2013 at 5:32 PM, Katya Romanova <[email protected]> wrote:
> > -  __m128d __zero = _mm_setzero_pd();
> > -  return __builtin_shufflevector(__a, __zero, 0, 1, 2, 2);
> > +  return (__m256d)__builtin_ia32_pd256_pd((__v2df)__a);
> >
> > I think this is the wrong approach.
> >
> > Rather than switching these to use an x86-specific builtin, instead it
> > would be better to provide some generic form to produce an undef input to
> > a shufflevector. That is a generally useful and completely
> > target-independent concept.
> >
> > _______________________________________________
> > cfe-commits mailing list
> > cfe-commits <at> cs.uiuc.edu
> > http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
> >
> > -- ~Craig
> >
> > -- ~Craig
>
> I agree with Chandler that it's better to use a shuffle with an undef input
> (which is target independent), even though we are generating code for AVX
> intrinsics. The reason I initially ended up using an x86-specific builtin is
> that I couldn't find a generic way to create an "undef" input for a
> shuffle.
>
> I tried the following, but I didn't like it because the compiler gives a
> warning when compiling avxintrin.h:
>
> static __inline __m256d __attribute__((__always_inline__, __nodebug__))
> _mm256_castpd128_pd256(__m128d in)
> {
>   __m128d undef;
>   return __builtin_shufflevector(in, undef, 0, 1, 2, 2);
> }
>
> I tried this as well and I didn't like it either:
>
> static __inline __m256d __attribute__((__always_inline__, __nodebug__))
> _mm256_castpd128_pd256(__m128d in)
> {
>   __v2df __in = (__v2df)in;
>   __v4df ret;
>   ret[0] = __in[0];
>   ret[1] = __in[1];
>   return (__m256d)ret;
> }
>
> So I ended up introducing an x86_64 builtin and lowering it later to a
> shuffle with undef (not a target-independent solution).
>
> static __inline __m256d __attribute__((__always_inline__, __nodebug__))
> _mm256_castpd128_pd256(__m128d __a)
> {
>   return (__m256d)__builtin_ia32_pd256_pd((__v2df)__a);
> }
>
>
> I've read Craig's proposal about using the shuffle builtin with negative
> indices (-1) to indicate a shuffle with undef. This solution looks good.
> However, a "-1" shuffle index is presently considered invalid. We need to
> discuss extending the shuffle syntax/semantics and then implement this
> extension before I can use a shuffle with negative indices for the AVX
> typecast builtins. It looks like it will take some time...
>
> I was wondering if it's possible to check in my current fix, which uses
> x86_64 builtins (instead of a shuffle) for the AVX typecast intrinsics, for
> now. Once the shuffle builtin understands negative indices, I can easily
> replace my changes with something like this:
>
> __builtin_shufflevector(__a, __a, 0, 1, -1, -1)
>
> If this interim solution sounds acceptable, we should start a discussion
> about extending the shuffle builtin's functionality to understand negative
> indices.
>
> Here are several ideas:
>
> We could use the "unary" form of __builtin_shufflevector when negative
> indices are used. The "binary" form could be used with negative indices as
> well, but semantic analysis should ensure that the first and second
> parameters are actually the same vector. Here is the reason for this
> limitation:
>
> If negative indices specify "undef" and the binary form of
> __builtin_shufflevector is used with different first and second
> parameters, e.g. __builtin_shufflevector(a, b, 0, 1, 7, -1),
> then we are in fact shuffling 3 vectors (a, b, and undef). I don't think
> it's a good idea to extend the __builtin_shufflevector semantics to do
> that.
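The three-vector concern can be checked mechanically. Below is a minimal plain-C sketch, assuming 4-lane operands as in the (a, b, 0, 1, 7, -1) example; `mask_sources`, `count_sources`, and the `SRC_*` flags are hypothetical helpers for illustration, not part of clang's semantic analysis. They report which inputs a mask reads: for that mask, the first operand, the second operand, and undef.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit flags for the inputs a shuffle mask can draw from (hypothetical). */
enum {
    SRC_FIRST  = 1 << 0, /* index in [0, n)  */
    SRC_SECOND = 1 << 1, /* index in [n, 2n) */
    SRC_UNDEF  = 1 << 2  /* index == -1      */
};

/*
 * Hypothetical helper (not clang's Sema): returns the set of sources
 * referenced by a mask over two n-lane operands, or -1 if any index is
 * out of range.
 */
static int mask_sources(const int *mask, int mask_len, int n) {
    assert(n > 0 && mask_len >= 0);
    int sources = 0;
    for (int k = 0; k < mask_len; k++) {
        int i = mask[k];
        if (i == -1)
            sources |= SRC_UNDEF;
        else if (i >= 0 && i < n)
            sources |= SRC_FIRST;
        else if (i >= n && i < 2 * n)
            sources |= SRC_SECOND;
        else
            return -1;
    }
    return sources;
}

/* Counts the distinct inputs in a non-negative result of mask_sources. */
static int count_sources(int sources) {
    assert(sources >= 0);
    int count = 0;
    for (int bit = 0; bit < 3; bit++)
        count += (sources >> bit) & 1;
    return count;
}
```

By contrast, the typecast mask 0, 1, -1, -1 over 2-lane operands reads only the first operand plus undef, so it draws on two sources even when written in the binary form with the same vector passed twice.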
>
>
> Which solution is preferred?
> (1) Support negative indices for the unary form of __builtin_shufflevector
> only.
> (2) Support negative indices for the binary form of __builtin_shufflevector
> only, and ensure that the first and second parameters are the same vector.
> (3) Support both (1) and (2).
> (4) Another possible (though very different from those proposed above)
> solution that allows using "undef" in shuffles would be to add a
> target-independent builtin (e.g. __builtin_undef(vector a)) that creates an
> "undef" vector with the same type and number of elements as its vector
> argument. With this "undef" builtin, I could implement the AVX typecast
> builtins like this:
>
> static __inline __m256d __attribute__((__always_inline__, __nodebug__))
> _mm256_castpd128_pd256(__m128d in)
> {
>   __m128d undef = __builtin_undef(in);
>   return __builtin_shufflevector(in, undef, 0, 1, 2, 2);
> }
>
> Thoughts?
>
>
> Thank you!
> Katya.
>
>



-- 
~Craig

shuffle_undef.patch

