| Issue |
97694
|
| Summary |
[AVX512] Replace x86_avx512_mask_pmov_* register intrinsics with generic expansion
|
| Labels |
backend:X86
|
| Assignees |
|
| Reporter |
RKSimon
|
We already expand basic truncation intrinsics to the trunc+shuffle sequence:
```cpp
static __inline__ __m128i __DEFAULT_FN_ATTRS128
_mm_cvtepi64_epi8 (__m128i __A)
{
  return (__m128i)__builtin_shufflevector(
      __builtin_convertvector((__v2di)__A, __v2qi), (__v2qi){0, 0}, 0, 1, 2, 3,
      3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3);
}
```
But the predicated (mask/maskz) variants are not expanded, as it's tricky to correctly merge the predicate select of the lower elements with the zeroing of the upper elements:
```cpp
static __inline__ __m128i __DEFAULT_FN_ATTRS128
_mm_mask_cvtepi64_epi8 (__m128i __O, __mmask8 __M, __m128i __A)
{
  return (__m128i) __builtin_ia32_pmovqb128_mask ((__v2di) __A,
                                                  (__v16qi) __O, __M);
}

static __inline__ __m128i __DEFAULT_FN_ATTRS128
_mm_maskz_cvtepi64_epi8 (__mmask8 __M, __m128i __A)
{
  return (__m128i) __builtin_ia32_pmovqb128_mask ((__v2di) __A,
                                                  (__v16qi) _mm_setzero_si128 (),
                                                  __M);
}
```
So this is likely to require some improvements to the DAG backend as well as the Clang frontend.
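
For reference, here is a minimal scalar sketch of the semantics any generic expansion would have to preserve for the 128-bit `vpmovqb` case: a per-lane select between the truncated value and the passthrough for the two low bytes, with bytes 2..15 always zeroed. The type aliases and the `ref_*` function names are hypothetical and only for illustration; this is a portable model of the behaviour, not the proposed header or backend implementation.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for __v2di and __v16qi in this scalar model.
using V2x64 = std::array<std::uint64_t, 2>;
using V16x8 = std::array<std::uint8_t, 16>;

// Models _mm_mask_cvtepi64_epi8: for each of the two source lanes, take the
// truncated low byte if the mask bit is set, otherwise the passthrough byte.
// Bytes 2..15 of the result are zero regardless of the passthrough value.
inline V16x8 ref_mask_cvtepi64_epi8(const V16x8 &src, std::uint8_t mask,
                                    const V2x64 &a) {
  V16x8 dst{}; // value-initialised: the upper bytes stay zero
  for (std::size_t i = 0; i < a.size(); ++i) {
    const bool selected = ((mask >> i) & 1u) != 0;
    dst[i] = selected ? static_cast<std::uint8_t>(a[i]) : src[i];
  }
  return dst;
}

// Models _mm_maskz_cvtepi64_epi8: same as above with an all-zero passthrough.
inline V16x8 ref_maskz_cvtepi64_epi8(std::uint8_t mask, const V2x64 &a) {
  return ref_mask_cvtepi64_epi8(V16x8{}, mask, a);
}
```

The point of the model is that the passthrough only contributes to the low two bytes, so a plain 16-lane select against `__O` would be wrong for the upper lanes; the select has to be done at the narrow width (or the upper passthrough lanes masked off) before widening with zeros.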