Craig Topper <craig.topper@...> writes:
> > > Ok so -1 isn't valid for indices, and i have even more questions about __builtin_shufflevector the more i look at it. See my message in cfe-dev. > > > On Thu, Jul 18, 2013 at 6:12 PM, Chandler Carruth <[email protected]> wrote: > > On Thu, Jul 18, 2013 at 6:11 PM, Craig Topper <[email protected]> wrote: > > > > > > > Would __builtin_shufflevector(__a, __a, 0, 1, -1, -1) work? > > > > > > Personally, I would prefer a defined way to produce an undef input in general... but if folks are worried about exposing such an interface, then sure, we could just allow the shuffle builtin itself to designate an "undef" input with goofy indices. > > > > > > > On Thu, Jul 18, 2013 at 5:42 PM, Chandler Carruth <[email protected]> wrote: > > > > > > > > On Thu, Jul 18, 2013 at 5:32 PM, Katya Romanova <[email protected]> wrote:- __m128d __zero = _mm_setzero_pd(); > - return __builtin_shufflevector(__a, __zero, 0, 1, 2, 2); > + return (__m256d)__builtin_ia32_pd256_pd((__v2df)__a); > > > I think this is the wrong approach. > > Rather than switching these to use an x86-specific builtin, instead it would be better to provide some generic form to produce an undef input to a shufflevector. That is a generally useful and completely target independent concept. > > > > > > _______________________________________________ > cfe-commits mailing listcfe-commits <at> cs.uiuc.eduhttp://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits > > > -- ~Craig > > > > > > > > > > > -- ~Craig > > > > _______________________________________________ > cfe-commits mailing list > cfe-commits@... > http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits > I agree with Chandler that it's better to use a shuffle with undef input (which is target independent), even though we generate code for AVX intrinsics. The reason I initially ended up using a x86-specific builtin is because there I couldn't find a generic way to create "undef" input for a shuffle. I tried the following, but I didn't like it, because the compiler gives a warning when compiling avxintrin.h static __inline __m256d __attribute__((__always_inline__, __nodebug__)) _mm256_castpd128_pd256(__m128d in) { __m128d undef; return __builtin_shufflevector(in, undef, 0, 1, 2, 2); } I tried this as well and I didn't like it either: static __inline __m256d __attribute__((__always_inline__, __nodebug__)) _mm256_castpd128_pd256(__m128d in) { __v2df __in = (__v2df) in; __v4df ret; ret[0]=in[0]; ret[1]=in[1]; return (__m256d)ret; } So, I ended up introducing a x86_64 builtin and lowered it later to a shuffle with undef (not a target-independent solution). static __inline __m256d __attribute__((__always_inline__, __nodebug__)) _mm256_castpd128_pd256(__m128d __a) { return (__m256d)__builtin_ia32_pd256_pd((__v2df)__a); } I've read Craig's proposal about using shuffle builtin with negative indeces (-1) to indicate shuffle with undef. This solution looks good. However, "-1" shuffle index is presently considered invalid. We need to discuss extending shuffle syntax/semantics and then implement this extension before I could use a shuffle with negative indices for AVX typecast builtins. It looks like it will take some time... I was wondering if it's possible to check in my current fix that is using x86_86 builtins (instead of a shuffle) for AVX typecast intrinsics for now. When shuffle learns to understand negative indices, I could easily replaces my changes with something like that: __builtin_shufflevector(__a, __a, 0, 1, -1, -1) If this interim solution doesn’t sound inappropriate, we should start a discussion about extending shuffle builtin functionality to understand negative indexes. Here are several ideas: We could use "unary" form of __builtin_shufflevector when negative indices are used. A "binary" form could be used with negive indexes as well, but semantic analysis should ensure that the first and the second parameter is actually the same vector. Here is the reason for this limitation: If negative indices specify "undef" and a binary form of __builtin_shufflevector is used with different first and second parameter, e.g. __builtin_shufflevector(a, b, 0, 1, 7, -1) then, in fact, we will be shuffling 3 vectors (a, b and undef). I don’t think that it’s a good idea to extend __builtin_shufflevector semantic to do that. Which solution is preferred? (1) Support negative indices for unary form of __builtin_shufflevector only. (2) Support negative indices for binary form of __builtin_shufflevector only and ensure that the first and the second parameter is the same vector. (3) Support both (1) and (2). (4) Another possible (though very different from proposed above) solution that allows to use "undef" in shuffles would be adding a target-independent builtin (e.g __builtin_undef(vector a)), which creates an “undef” vector with the same type and the same number of elements as its vector argument. With this "undef" builtin, I could implement AVX typecast builtins like that: static __inline __m256d __attribute__((__always_inline__, __nodebug__)) _mm256_castpd128_pd256(__m128d in) { __m128d undef = __builtin_undef(in); return __builtin_shufflevector(in, undef, 0, 1, 2, 2); } Thoughts? Thank you! Katya. _______________________________________________ cfe-commits mailing list [email protected] http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
