https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115021

Roger Sayle <roger at nextmovesoftware dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org      |roger at nextmovesoftware dot com
   Last reconfirmed|                            |2024-05-10
                 CC|                            |roger at nextmovesoftware dot com
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW

--- Comment #1 from Roger Sayle <roger at nextmovesoftware dot com> ---
I have a patch for x86 ternlog handling that changes the output for this
testcase (without the pending change to optimize V8QI shifts) to:
foo:    movl    $67372036, %eax
        vpsraw  $5, %xmm0, %xmm0
        vpbroadcastd    %eax, %xmm1
        vpternlogd      $108, .LC0(%rip), %xmm1, %xmm0
        vpsubb  %xmm1, %xmm0, %xmm0
        ret
        .align 16
.LC0:
        .byte   7
        .byte   7
        .byte   7
        .byte   7
        .byte   7
        .byte   7
        .byte   7
        .byte   7
        .byte   7
        .byte   7
        .byte   7
        .byte   7
        .byte   7
        .byte   7
        .byte   7
        .byte   7

which at least doesn't construct the vector with a broadcast and then "spill"
it to the stack, only to read it back from memory. I've no idea if this is
optimal, but it's certainly better than the current "spill".
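
For readers decoding the listing above, here is a small C sketch of what the
ternlog immediate appears to select. It assumes the usual AT&T operand order
(destination last), so that %xmm0 supplies the most significant index bit,
%xmm1 the middle one and .LC0 the least significant one. Under that reading
$108 (0x6C) is B ^ (A & C), i.e. xmm1 ^ (xmm0 & .LC0), and the whole sequence
looks like a per-byte arithmetic right shift by 5. The function names are
hypothetical helpers for illustration, not anything in GCC.

```c
#include <assert.h>
#include <stdint.h>

/* Bitwise model of VPTERNLOG on one byte: for each bit position, the
   three input bits form an index (a is the most significant, c the
   least) that selects one bit of the 8-bit immediate. */
uint8_t ternlog8(uint8_t imm, uint8_t a, uint8_t b, uint8_t c)
{
    uint8_t r = 0;
    for (int bit = 0; bit < 8; bit++) {
        unsigned idx = (((a >> bit) & 1u) << 2)
                     | (((b >> bit) & 1u) << 1)
                     | ((c >> bit) & 1u);
        r |= (uint8_t)(((imm >> idx) & 1u) << bit);
    }
    return r;
}

/* Per-byte model of the listing (hypothetical helper, not GCC code).
   Assumption: after vpsraw $5 the low three bits of each byte hold that
   byte's original top three bits; the upper five bits are cleared by
   the AND with 7 inside the ternlog, so they are modelled as zero. */
uint8_t model_byte(uint8_t x)
{
    uint8_t a = (uint8_t)(x >> 5);        /* vpsraw $5 (low 3 bits only) */
    uint8_t b = 0x04;                     /* one byte of 67372036 == 0x04040404 */
    uint8_t c = 0x07;                     /* one byte of .LC0 */
    uint8_t t = ternlog8(108, a, b, c);   /* vpternlogd $108 */
    return (uint8_t)(t - b);              /* vpsubb */
}

/* Checks both readings: the truth table and the per-byte result. */
void self_check(void)
{
    /* VPTERNLOG is bitwise, so all-zero/all-one inputs cover the table. */
    static const uint8_t v[2] = { 0x00, 0xFF };
    for (int i = 0; i < 8; i++) {
        uint8_t a = v[(i >> 2) & 1], b = v[(i >> 1) & 1], c = v[i & 1];
        assert(ternlog8(108, a, b, c) == (uint8_t)(b ^ (a & c)));
    }
    for (unsigned x = 0; x < 256; x++) {
        /* Arithmetic shift right by 5 of a signed byte, written without
           relying on implementation-defined shifts of negative values. */
        uint8_t ref = (uint8_t)((x >> 5) | ((x & 0x80u) ? 0xF8u : 0u));
        assert(model_byte((uint8_t)x) == ref);
    }
}
```

Calling self_check() from a test harness exercises all 256 byte values. The
XOR with 4 followed by the subtraction of 4 is the usual trick for
sign-extending a 3-bit field, which is why the same broadcast register can
feed both the vpternlogd and the vpsubb.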

I'm curious about what has changed to make this code (register allocation)
regress since GCC 13.  It was a patch of mine that changed broadcastb to
broadcastd, but that shouldn't have affected reload/register preferencing.
