Re: [PATCH] Inefficient code generation for 128-bit-≥256-bittypecast intrinsics (BZ #15712)

Katya Romanova Fri, 02 Aug 2013 18:23:38 -0700

 

Craig Topper <craig.topper@...> writes:

> 
> 
> Ok so -1 isn't valid for indices, and i have even more questions about
__builtin_shufflevector the more i look at it. See my message in cfe-dev.
> 
> 
> On Thu, Jul 18, 2013 at 6:12 PM, Chandler Carruth
<[email protected]> wrote:
> 
> On Thu, Jul 18, 2013 at 6:11 PM, Craig Topper
<[email protected]> wrote:
> 
> 
> 
> 
> 
> 
> Would __builtin_shufflevector(__a, __a, 0, 1, -1, -1)  work?
> 
> 
> 
> 
> 
> Personally, I would prefer a defined way to produce an undef input in
general... but if folks are worried about exposing such an interface, then
sure, we could just allow the shuffle builtin itself to designate an "undef"
input with goofy indices.
> 
>  
> 
> 
> 
> 
> On Thu, Jul 18, 2013 at 5:42 PM, Chandler Carruth
<[email protected]> wrote:
> 
> 
> 
> 
> 
> 
> 
> On Thu, Jul 18, 2013 at 5:32 PM, Katya Romanova
<[email protected]> wrote:-
 __m128d __zero = _mm_setzero_pd();
> -  return __builtin_shufflevector(__a, __zero, 0, 1, 2, 2);
> +  return (__m256d)__builtin_ia32_pd256_pd((__v2df)__a);
> 
> 
> I think this is the wrong approach.
> 
> Rather than switching these to use an x86-specific builtin, instead it
would be better to provide some generic form to produce an undef input to a
shufflevector. That is a generally useful and completely target independent
concept.
> 
> 
> 
> 
> 
> _______________________________________________
> cfe-commits mailing listcfe-commits <at>
cs.uiuc.eduhttp://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
> 
> 
> -- ~Craig
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> -- ~Craig
> 
> 
> 
> _______________________________________________
> cfe-commits mailing list
> cfe-commits@...
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
> 

I agree with Chandler that it's better to use a shuffle with undef input
(which is target independent), even though we generate code for AVX
intrinsics. The reason I initially ended up using a x86-specific builtin is
because there I couldn't find a generic way to create "undef" input for a
shuffle. 

I tried the following, but I didn't like it, because the compiler gives a
warning when compiling avxintrin.h

static __inline __m256d __attribute__((__always_inline__, __nodebug__))
_mm256_castpd128_pd256(__m128d in)
{
  __m128d undef;
  return __builtin_shufflevector(in, undef, 0, 1, 2, 2);
}

I tried this as well and I didn't like it either:

static __inline __m256d __attribute__((__always_inline__, __nodebug__))
_mm256_castpd128_pd256(__m128d in)
{
  __v2df __in = (__v2df) in;
  __v4df ret;
  ret[0]=in[0];
  ret[1]=in[1];
  return (__m256d)ret;
}

So, I ended up introducing a x86_64 builtin and lowered it later to a
shuffle with undef (not a target-independent solution).

static __inline __m256d __attribute__((__always_inline__, __nodebug__)) 
 _mm256_castpd128_pd256(__m128d __a)  {
  return (__m256d)__builtin_ia32_pd256_pd((__v2df)__a);
}

I've read Craig's proposal about using shuffle builtin with negative indeces
(-1) to indicate shuffle with undef. This solution looks good. However, "-1"
shuffle index is presently considered invalid. We need to discuss extending
shuffle syntax/semantics and then implement this extension before I could
use a shuffle with negative indices for AVX typecast builtins. It looks like
it will take some time...

I was wondering if it's possible to check in my current fix that is using
x86_86 builtins (instead of a shuffle) for AVX typecast intrinsics for now.
When shuffle learns to understand negative indices, I could easily replaces
my changes with something like that: 

__builtin_shufflevector(__a, __a, 0, 1, -1, -1)

If this interim solution doesn’t sound inappropriate, we should start a
discussion about extending shuffle builtin functionality to understand
negative indexes.

Here are several ideas:

We could use "unary" form of __builtin_shufflevector when negative indices
are used.
A "binary" form could be used with negive indexes as well, but semantic
analysis should ensure that the first and the second parameter is actually
the same vector. Here is the reason for this limitation:

If negative indices specify "undef" and a binary form of
__builtin_shufflevector is used with different first and second parameter, 
e.g. __builtin_shufflevector(a, b, 0, 1, 7, -1)
then, in fact, we will be shuffling 3 vectors (a, b and undef). I don’t
think that it’s a good idea to extend __builtin_shufflevector semantic to do
that.

 Which solution is preferred? 
(1) Support negative indices for unary form of __builtin_shufflevector only.
(2) Support negative indices for binary form of __builtin_shufflevector only
and ensure that the first and the second parameter is the same vector.
(3) Support both (1) and (2).
(4) Another possible (though very different from proposed above) solution
that allows to use "undef" in shuffles would be adding a target-independent
builtin (e.g __builtin_undef(vector a)), which creates an “undef” vector
with the same type and the same number of elements as its vector argument.
With this "undef" builtin, I could implement AVX typecast builtins like that:

static __inline __m256d __attribute__((__always_inline__, __nodebug__))
_mm256_castpd128_pd256(__m128d in)
{
  __m128d undef = __builtin_undef(in);
  return __builtin_shufflevector(in, undef, 0, 1, 2, 2);
}

Thoughts?

Thank you!
Katya. 

_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits

Re: [PATCH] Inefficient code generation for 128-bit-≥256-bittypecast intrinsics (BZ #15712)

Reply via email to