Re: [PATCH] Inefficient code generation for 128-bit-≥256-bittypecast intrinsics (BZ #15712)

Craig Topper Sat, 03 Aug 2013 10:53:39 -0700

Commited in r187694. And I'll fix the avxintrin header in just a moment.


On Sat, Aug 3, 2013 at 1:08 AM, Craig Topper <[email protected]> wrote:

> Option (2) directly matches the capabilities of the shufflevector
> instruction in the LLVM IR. I have attached a patch that will allow -1 to
> become undef in the IR.
>
> So
>
> __builtin_shufflevector( x, y, 0, 4, -1, 5 );
>
> becomes
>
> shufflevector <4 x float> %x, <4 x float> %y, <4 x i32> <i32 0, i32 4, i32
> undef, i32 5>
>
>
> On Fri, Aug 2, 2013 at 6:15 PM, Katya Romanova <
> [email protected]> wrote:
>
>>
>>
>> Craig Topper <craig.topper@...> writes:
>>
>> >
>> >
>> > Ok so -1 isn't valid for indices, and i have even more questions about
>> __builtin_shufflevector the more i look at it. See my message in cfe-dev.
>> >
>> >
>> > On Thu, Jul 18, 2013 at 6:12 PM, Chandler Carruth
>> <[email protected]> wrote:
>> >
>> > On Thu, Jul 18, 2013 at 6:11 PM, Craig Topper
>> <[email protected]> wrote:
>> >
>> >
>> >
>> >
>> >
>> >
>> > Would __builtin_shufflevector(__a, __a, 0, 1, -1, -1)  work?
>> >
>> >
>> >
>> >
>> >
>> > Personally, I would prefer a defined way to produce an undef input in
>> general... but if folks are worried about exposing such an interface, then
>> sure, we could just allow the shuffle builtin itself to designate an
>> "undef"
>> input with goofy indices.
>> >
>> >
>> >
>> >
>> >
>> >
>> > On Thu, Jul 18, 2013 at 5:42 PM, Chandler Carruth
>> <[email protected]> wrote:
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> > On Thu, Jul 18, 2013 at 5:32 PM, Katya Romanova
>> <[email protected]> wrote:-
>>  __m128d __zero = _mm_setzero_pd();
>> > -  return __builtin_shufflevector(__a, __zero, 0, 1, 2, 2);
>> > +  return (__m256d)__builtin_ia32_pd256_pd((__v2df)__a);
>> >
>> >
>> > I think this is the wrong approach.
>> >
>> > Rather than switching these to use an x86-specific builtin, instead it
>> would be better to provide some generic form to produce an undef input to
>> a
>> shufflevector. That is a generally useful and completely target
>> independent
>> concept.
>> >
>> >
>> >
>> >
>> >
>> > _______________________________________________
>> > cfe-commits mailing listcfe-commits <at>
>> cs.uiuc.eduhttp://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
>> >
>> >
>> > -- ~Craig
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> > -- ~Craig
>> >
>> >
>> >
>> > _______________________________________________
>> > cfe-commits mailing list
>> > cfe-commits@...
>> > http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
>> >
>>
>>
>>
>> I agree with Chandler that it's better to use a shuffle with undef input
>> (which is target independent), even though we generate code for AVX
>> intrinsics. The reason I initially ended up using a x86-specific builtin
>> is
>> because there I couldn't find a generic way to create "undef" input for a
>> shuffle.
>>
>> I tried the following, but I didn't like it, because the compiler gives a
>> warning when compiling avxintrin.h
>>
>> static __inline __m256d __attribute__((__always_inline__, __nodebug__))
>> _mm256_castpd128_pd256(__m128d in)
>> {
>>   __m128d undef;
>>   return __builtin_shufflevector(in, undef, 0, 1, 2, 2);
>> }
>>
>> I tried this as well and I didn't like it either:
>>
>> static __inline __m256d __attribute__((__always_inline__, __nodebug__))
>> _mm256_castpd128_pd256(__m128d in)
>> {
>>   __v2df __in = (__v2df) in;
>>   __v4df ret;
>>   ret[0]=in[0];
>>   ret[1]=in[1];
>>   return (__m256d)ret;
>> }
>>
>> So, I ended up introducing a x86_64 builtin and lowered it later to a
>> shuffle with undef (not a target-independent solution).
>>
>> static __inline __m256d __attribute__((__always_inline__, __nodebug__))
>>  _mm256_castpd128_pd256(__m128d __a)  {
>>   return (__m256d)__builtin_ia32_pd256_pd((__v2df)__a);
>> }
>>
>>
>> I've read Craig's proposal about using shuffle builtin with negative
>> indeces
>> (-1) to indicate shuffle with undef. This solution looks good. However,
>> "-1"
>> shuffle index is presently considered invalid. We need to discuss
>> extending
>> shuffle syntax/semantics and then implement this extension before I could
>> use a shuffle with negative indices for AVX typecast builtins. It looks
>> like
>> it will take some time...
>>
>> I was wondering if it's possible to check in my current fix that is using
>> x86_86 builtins (instead of a shuffle) for AVX typecast intrinsics for
>> now.
>> When shuffle learns to understand negative indices, I could easily
>> replaces
>> my changes with something like that:
>>
>> __builtin_shufflevector(__a, __a, 0, 1, -1, -1)
>>
>> If this interim solution doesn’t sound inappropriate, we should start a
>> discussion about extending shuffle builtin functionality to understand
>> negative indexes.
>>
>> Here are several ideas:
>>
>> We could use "unary" form of __builtin_shufflevector when negative indices
>> are used.
>> A "binary" form could be used with negive indexes as well, but semantic
>> analysis should ensure that the first and the second parameter is actually
>> the same vector. Here is the reason for this limitation:
>>
>> If negative indices specify "undef" and a binary form of
>> __builtin_shufflevector is used with different first and second parameter,
>> e.g. __builtin_shufflevector(a, b, 0, 1, 7, -1)
>> then, in fact, we will be shuffling 3 vectors (a, b and undef). I don’t
>> think that it’s a good idea to extend __builtin_shufflevector semantic to
>> do
>> that.
>>
>>
>>  Which solution is preferred?
>> (1) Support negative indices for unary form of __builtin_shufflevector
>> only.
>> (2) Support negative indices for binary form of __builtin_shufflevector
>> only
>> and ensure that the first and the second parameter is the same vector.
>> (3) Support both (1) and (2).
>> (4) Another possible (though very different from proposed above) solution
>> that allows to use "undef" in shuffles would be adding a
>> target-independent
>> builtin (e.g __builtin_undef(vector a)), which creates an “undef” vector
>> with the same type and the same number of elements as its vector argument.
>> With this "undef" builtin, I could implement AVX typecast builtins like
>> that:
>>
>> static __inline __m256d __attribute__((__always_inline__, __nodebug__))
>> _mm256_castpd128_pd256(__m128d in)
>> {
>>   __m128d undef = __builtin_undef(in);
>>   return __builtin_shufflevector(in, undef, 0, 1, 2, 2);
>> }
>>
>> Thoughts?
>>
>>
>> Thank you!
>> Katya.
>>
>>
>>
>> _______________________________________________
>> cfe-commits mailing list
>> [email protected]
>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
>>
>
>
>
> --
> ~Craig
>



-- 
~Craig

_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits

Re: [PATCH] Inefficient code generation for 128-bit-≥256-bittypecast intrinsics (BZ #15712)

Reply via email to