On Thu, Feb 19, 2026 at 08:09:11AM +0800, H.J. Lu wrote:
> bswap: Handle VEC_PACK_TRUNC_EXPR [PR120233]
>
> compiles
>
> void
> foo2 (char* a, short* __restrict b)
> {
> a[0] = b[0] >> 8;
> a[1] = b[0];
> a[2] = b[1] >> 8;
> a[3] = b[1];
> }
>
> into
>
> movl (%rsi), %eax
> bswap %eax
> roll $16, %eax
> movl %eax, (%rdi)
> ret
>
> instead of
>
> movzwl (%rsi), %eax
> movzwl 2(%rsi), %edx
> movl %eax, %ecx
> sall $16, %eax
> sarw $8, %cx
> movzwl %cx, %ecx
> orl %ecx, %eax
> movd %eax, %xmm0
> movl %edx, %eax
> sall $16, %edx
> sarw $8, %ax
> movdqa %xmm0, %xmm2
> movzwl %ax, %eax
> orl %eax, %edx
> movd %edx, %xmm1
> punpcklbw %xmm1, %xmm2
> punpcklbw %xmm1, %xmm0
> pshufd $65, %xmm2, %xmm2
> punpcklbw %xmm2, %xmm0
> movd %xmm0, (%rdi)
> ret
Yeah, I wrote in the commit message that this fixes the regression
in one function, but doesn't fix the regression the same commit
caused in the other function, where we used to emit a 64-bit bswap
and a 64-bit rotate and now emit two 32-bit bswaps instead.
I think we should try to fix that regression too.
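For reference, here is a rough sketch of why the quoted foo2 reduces to
a load, bswap, 16-bit rotate and store. It assumes a little-endian
target and GCC's __builtin_bswap32; the names foo2_ref, foo2_bswap and
foo2_variants_agree are made up for illustration and are not part of
the patch or the testsuite.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The quoted foo2: four byte stores that swap the two bytes
   inside each 16-bit element of b.  */
static void
foo2_ref (char *a, const short *b)
{
  a[0] = b[0] >> 8;
  a[1] = b[0];
  a[2] = b[1] >> 8;
  a[3] = b[1];
}

/* Hypothetical hand-written equivalent (little-endian only):
   one 32-bit load, a full byte swap, a rotate by 16 and one
   32-bit store, mirroring the movl/bswap/roll/movl sequence.  */
static void
foo2_bswap (char *a, const short *b)
{
  uint32_t x;
  memcpy (&x, b, sizeof x);      /* 32-bit load of b[0] and b[1] */
  x = __builtin_bswap32 (x);     /* reverse all four bytes */
  x = (x << 16) | (x >> 16);     /* rotate by 16 to restore element order */
  memcpy (a, &x, sizeof x);      /* 32-bit store to a[0..3] */
}

/* Returns 1 if both variants store the same four bytes for a few
   sample inputs, 0 otherwise.  */
static int
foo2_variants_agree (void)
{
  static const short inputs[][2] = {
    { 0x1234, 0x5678 },
    { -1, 0 },
    { -32768, 32767 },
    { 0x00ff, -256 },
  };
  for (size_t i = 0; i < sizeof inputs / sizeof inputs[0]; i++)
    {
      char r1[4], r2[4];
      foo2_ref (r1, inputs[i]);
      foo2_bswap (r2, inputs[i]);
      if (memcmp (r1, r2, sizeof r1) != 0)
        return 0;
    }
  return 1;
}
```

The bswap reverses all four bytes, which also swaps the order of the
two 16-bit elements; the rotate by 16 puts the elements back in their
original order, leaving only the bytes within each element swapped.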
> Update gcc.target/i386/pr108938-3.c to also scan 3 bswaps for x86-64.
>
> PR target/120233
> * gcc.target/i386/pr108938-3.c: Also scan 3 bswaps for x86-64.
>
> I am checking it in.
Though once that regression is fixed, this testsuite change can simply be reverted.
Jakub