https://bugs.llvm.org/show_bug.cgi?id=42550

            Bug ID: 42550
           Summary: Byte+shift loads are not autovectorized to movdqu on
                    SSE4.1
           Product: libraries
           Version: trunk
          Hardware: PC
                OS: All
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: Backend: X86
          Assignee: [email protected]
          Reporter: [email protected]
                CC: [email protected], [email protected],
                    [email protected], [email protected]

Take the following code:

#include <stddef.h>
#include <stdint.h>
#include <string.h>

#ifdef BYTE_SHIFT
static uint32_t read32(uint8_t const *data, size_t offset)
{
    return (uint32_t) data[offset + 0]
        | ((uint32_t) data[offset + 1] << 8)
        | ((uint32_t) data[offset + 2] << 16)
        | ((uint32_t) data[offset + 3] << 24);
}
#else
static uint32_t read32(uint8_t const *data, size_t offset)
{
    uint32_t ret;
    memcpy(&ret, data + offset, sizeof(ret));
    return ret;
}
#endif

Both are perfectly valid ways to perform an unaligned little-endian load, and when
SSE is disabled, they generate identical code when used.

However, when SSE4.1 is enabled and loops using these loads are autovectorized,
the memcpy version compiles to a movdqu, while the byte-shift version is instead
expanded literally into a string of pslld, pinsrb, and pmovzxbd instructions.

Demo: https://godbolt.org/z/jCAm2o

These types of loads should be converted to movdqu as well.
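For reference, a driver loop of the kind the vectorizer targets might look like
the sketch below (the loop body is an assumption for illustration; the actual
reproducer is behind the Godbolt link above). Both read32 variants compute the
same value on little-endian targets, so only the generated code should differ:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte+shift variant: endianness-independent unaligned load. */
static uint32_t read32_shift(uint8_t const *data, size_t offset)
{
    return (uint32_t) data[offset + 0]
        | ((uint32_t) data[offset + 1] << 8)
        | ((uint32_t) data[offset + 2] << 16)
        | ((uint32_t) data[offset + 3] << 24);
}

/* memcpy variant: compiles to a plain unaligned load (movdqu when vectorized). */
static uint32_t read32_memcpy(uint8_t const *data, size_t offset)
{
    uint32_t ret;
    memcpy(&ret, data + offset, sizeof(ret));
    return ret;
}

/* Hypothetical reduction loop of the shape the autovectorizer handles:
 * consecutive 4-byte loads accumulated into a scalar. */
uint32_t sum32(uint8_t const *data, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i + 4 <= n; i += 4)
        sum += read32_shift(data, i); /* swap in read32_memcpy to compare */
    return sum;
}
```

With -O2 -msse4.1, swapping read32_shift for read32_memcpy in sum32 is what
flips the generated loop body between the pinsrb/pmovzxbd expansion and movdqu.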
