https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88027

--- Comment #3 from acsawdey at gcc dot gnu.org ---
This appears to have to do with alignment. In this test case,
expand_block_clear() sees alignment of only 8 bits for the pointer p. If you
declare a local struct st and pass that to __builtin_memset, it sees alignment
of 128 bits and generates 4 stxv or stvx.
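
The two situations can be sketched as below. This is not the test case from
the bug report: struct st's layout (64 bytes), the char * parameter, and the
consume() helper are assumptions made only for illustration. The code
generated for each function depends on the target and options, so the way to
compare them is to compile with -S and inspect the stores.

```c
/* Hypothetical 64-byte struct; the real test case's layout is not shown
   in this comment.  */
struct st { long long a[8]; };

/* Hypothetical helper: keeps the compiler from deleting the local object
   (GCC-style empty asm with a memory clobber).  */
static void consume (const void *p)
{
  __asm__ volatile ("" : : "r" (p) : "memory");
}

/* Destination reached through a pointer: the expander may only be able to
   assume byte (8-bit) alignment here.  */
void clear_via_pointer (char *p)
{
  __builtin_memset (p, 0, sizeof (struct st));
}

/* Destination is a local object: the compiler chooses its placement, so it
   can know a larger alignment.  Returns 0 once the struct is cleared.  */
long long clear_local (void)
{
  struct st s;
  __builtin_memset (&s, 0, sizeof s);
  consume (&s);
  return s.a[0] | s.a[7];
}
```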

There is a bug here though:

  for (offset = 0; bytes > 0; offset += clear_bytes, bytes -= clear_bytes)
    {
      machine_mode mode = BLKmode;
      rtx dest;

      if (TARGET_ALTIVEC
          && ((bytes >= 16 && align >= 128)
              || (bytes >= 32 && TARGET_EFFICIENT_UNALIGNED_VSX)))

The intention here was to use unaligned VSX stores only if there were at least
32 bytes to clear. However, because bytes is decremented on each iteration, what
this actually does is always clear the last 16 bytes using std when the
destination is unaligned. This doesn't make a lot of sense and would be easy to
fix.
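
The effect can be seen in a small stand-alone model of the loop. This is a
sketch under assumptions, not the rs6000 code: model_clear, struct
store_counts and the hoist_check flag are hypothetical names, and the fallback
to 8-byte and smaller stores is simplified. With hoist_check set, the 32-byte
test is evaluated once before the loop, which is one possible way to get the
intended behaviour and not necessarily the fix that was applied.

```c
#include <stdbool.h>

/* Hypothetical tally of the stores the model would emit.  */
struct store_counts
{
  int vec16;    /* 16-byte vector stores (stxv/stvx) */
  int dword8;   /* 8-byte stores (std) */
  int smaller;  /* 4-, 2- or 1-byte stores */
};

/* Simplified model of the clearing loop.  align_bits is the known alignment
   of the destination in bits.  When hoist_check is false, the condition
   matches the one quoted above and re-tests bytes >= 32 on every iteration;
   when true, that test is made once on the initial byte count.  */
struct store_counts
model_clear (int bytes, int align_bits, bool efficient_unaligned_vsx,
             bool hoist_check)
{
  struct store_counts c = { 0, 0, 0 };
  bool unaligned_vsx_ok = bytes >= 32 && efficient_unaligned_vsx;

  while (bytes > 0)
    {
      int clear_bytes;
      bool vsx = hoist_check
                 ? unaligned_vsx_ok
                 : (bytes >= 32 && efficient_unaligned_vsx);

      if (bytes >= 16 && (align_bits >= 128 || vsx))
        {
          clear_bytes = 16;
          c.vec16++;
        }
      else if (bytes >= 8)
        {
          clear_bytes = 8;
          c.dword8++;
        }
      else
        {
          clear_bytes = bytes >= 4 ? 4 : bytes >= 2 ? 2 : 1;
          c.smaller++;
        }
      bytes -= clear_bytes;
    }
  return c;
}
```

For a 64-byte clear with 8-bit alignment and efficient unaligned VSX,
model_clear (64, 8, true, false) counts three vector stores followed by two
8-byte stores, while model_clear (64, 8, true, true) counts four vector
stores.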
