https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115021
Roger Sayle <roger at nextmovesoftware dot com> changed:

           What    |Removed                      |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org |roger at nextmovesoftware dot com
   Last reconfirmed|                             |2024-05-10
                 CC|                             |roger at nextmovesoftware dot com
     Ever confirmed|0                            |1
             Status|UNCONFIRMED                  |NEW

--- Comment #1 from Roger Sayle <roger at nextmovesoftware dot com> ---
I have a patch for x86 ternlog handling that changes the output for this
testcase (without the pending change to optimize V8QI shifts) to:

foo:    movl    $67372036, %eax
        vpsraw  $5, %xmm0, %xmm0
        vpbroadcastd    %eax, %xmm1
        vpternlogd      $108, .LC0(%rip), %xmm1, %xmm0
        vpsubb  %xmm1, %xmm0, %xmm0
        ret
        .align 16
.LC0:
        .byte   7
        .byte   7
        .byte   7
        .byte   7
        .byte   7
        .byte   7
        .byte   7
        .byte   7
        .byte   7
        .byte   7
        .byte   7
        .byte   7
        .byte   7
        .byte   7
        .byte   7
        .byte   7

which at least doesn't construct the vector with a broadcast, and then "spill"
it to the stack before reading it back from memory. I've no idea if this is
optimal, but it's certainly better than the current "spill".

I'm curious about what has changed to make this code (register allocation)
regress since GCC 13. It was a patch of mine that changed broadcastb to
broadcastd, but that shouldn't have affected reload/register preferencing.
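
For readers decoding the listing above, here is a small Python model of what
the five instructions appear to compute. It is only a sketch under stated
assumptions: the testcase is not quoted in this comment, so the model assumes
it is a per-byte arithmetic right shift by 5, and the helper names (ternlog,
shift_lane, reference) are hypothetical, not GCC or binutils names. It relies
on the usual reading of the AT&T operand order, where %xmm0 is the first
truth-table input, %xmm1 the second, and .LC0 the third, so immediate 108
(0x6c) selects B ^ (A & C).

```python
MASK = 0x07  # every byte of .LC0
BIAS = 0x04  # every byte of the broadcast constant 67372036 (0x04040404)


def ternlog(imm, a, b, c, width=8):
    """Bitwise ternary logic: bit i of the result is bit (a_i, b_i, c_i) of imm."""
    out = 0
    for i in range(width):
        idx = (((a >> i) & 1) << 2) | (((b >> i) & 1) << 1) | ((c >> i) & 1)
        out |= ((imm >> idx) & 1) << i
    return out


def shift_lane(word):
    """Model the sequence on one 16-bit lane holding two packed bytes."""
    # vpsraw $5: arithmetic right shift of the whole 16-bit lane.
    signed = word - 0x10000 if word & 0x8000 else word
    shifted = (signed >> 5) & 0xFFFF
    out = 0
    for k in (0, 8):
        byte = (shifted >> k) & 0xFF
        t = ternlog(108, byte, BIAS, MASK)  # (byte & 7) ^ 4
        out |= ((t - BIAS) & 0xFF) << k     # vpsubb: sign-extend the 3-bit field
    return out


def reference(word):
    """Assumed intent: each byte treated as signed char and shifted right by 5."""
    out = 0
    for k in (0, 8):
        b = (word >> k) & 0xFF
        s = b - 256 if b & 0x80 else b
        out |= ((s >> 5) & 0xFF) << k
    return out
```

Under these assumptions the mask keeps the three surviving bits of each byte,
and the xor-then-subtract with 4 sign-extends that 3-bit field, which is why
the same 0x04 vector in %xmm1 feeds both vpternlogd and vpsubb.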