https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66369

--- Comment #4 from Marcus Kool <marcus.kool at urlfilterdb dot com> ---
> The intrinsic returns "int", and from the above tree dump, the compiler
> won't even consider to combine the sign-extension with vpmovmskb.
> 
> So, why not:
> 
>    unsigned int v;
> 
>    v = (unsigned int) _mm256_movemask_epi8( ... );
>    if (v != 0)
>       return (long) __builtin_ctz( v );

Because that will produce the extra and unnecessary sign extension instructions
if the result is used to index an array of structs.
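
To make the indexing case concrete, here is a minimal sketch. The names
`struct entry` and `first_match` are hypothetical and not from the bug report,
and `mask` is a plain int standing in for the value returned by
_mm256_movemask_epi8(), so the snippet builds without AVX2. Whether an
extension instruction is actually emitted depends on the compiler version and
flags.

```c
#include <stddef.h>

/* Hypothetical record type, used only for illustration. */
struct entry {
    long key;
    long value;
};

/* `mask` stands in for the int returned by _mm256_movemask_epi8(). */
static struct entry *first_match(struct entry *table, int mask)
{
    if (mask == 0)
        return NULL;

    /* __builtin_ctz takes an unsigned int and returns an int. */
    int idx = __builtin_ctz((unsigned int) mask);

    /* Indexing with a 32-bit int: on a 64-bit target the index has to be
       widened to pointer width before the address can be computed. */
    return &table[idx];
}
```

Comparing the output of `gcc -O2 -S` for this function with a variant whose
index is a long should show whether the widening costs an instruction on a
given compiler.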

Can this issue be resolved by simply always letting the intrinsic produce a
64-bit result, and hence always emitting the 64-bit instruction
   vpmovmskb YMM,R64  ?
It may well be that a 64-bit result is not desired and a conversion to 32 bits
is then required, but an (implicit) conversion from a long to an int does _not_
need an instruction, while a conversion from int to long does need an extra
instruction.
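
The asymmetry between the two conversions can be sketched in plain C. The
function names `widen` and `narrow` are hypothetical, and the fixed-width types
stand in for int and long on an LP64 target. The instruction named in the
comment is what x86-64 compilers typically emit, not a guarantee.

```c
#include <stdint.h>

/* int -> long: the upper 32 bits must be filled with copies of the sign
   bit, which on x86-64 typically takes a movslq (or cltq). */
static int64_t widen(int32_t x)
{
    return x;
}

/* long -> int: only the low 32 bits are kept, so the compiler can
   simply use the lower half of the 64-bit register. */
static int32_t narrow(int64_t x)
{
    return (int32_t) x;
}
```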
