https://bugs.kde.org/show_bug.cgi?id=429354

--- Comment #3 from Julian Seward <jsew...@acm.org> ---
(In reply to Julian Seward from comment #2)

>    uint16_t getMSBs_8x16(vec128)
>    {
>       let hiHalf = vec128[127:64]  // LE numbering
>       let loHalf = vec128[ 63:0]
>       // In each byte lane, copy the MSB to all bit positions
>       hiHalf = shift_right_signed_8x8(hiHalf, 7);
>       loHalf = shift_right_signed_8x8(loHalf, 7);
>       // Now each byte lane is either 0x00 or 0xFF
>       // Make (eg) lane 7 contain either 0x00 or 0x80, lane 6 contain
>       // either 0x00 or 0x40, etc
>       hiHalf &= 0x8040201008040201;
>       loHalf &= 0x8040201008040201;
>       let hi8msbs = add_across_lanes_8x8(hiHalf);
>       let lo8msbs = add_across_lanes_8x8(loHalf);
>       return (hi8msbs << 8) | lo8msbs;
>    }
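For reference, here is a plain-C model of the quoted steps. It is a sketch under stated assumptions, not the proposed implementation. vec128 is assumed to arrive as two uint64_t halves. The lane helpers are hypothetical scalar stand-ins written as simple loops, so they are obviously correct rather than fast. The point of the model is to see that the weight mask 0x8040201008040201 turns each byte's MSB into bit j of an 8-bit result.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar stand-in for shift_right_signed_8x8(x, 7):
   each byte lane becomes 0xFF if its MSB was set, otherwise 0x00. */
static uint64_t shift_right_signed_8x8_by7(uint64_t x) {
    uint64_t r = 0;
    for (int j = 0; j < 8; j++)
        if ((x >> (8 * j + 7)) & 1)
            r |= (uint64_t)0xFF << (8 * j);
    return r;
}

/* Plain reference version of add_across_lanes_8x8(): the sum of the
   eight byte lanes (at most 8 * 255, so nothing overflows). */
static uint64_t add_across_lanes_8x8(uint64_t a) {
    uint64_t sum = 0;
    for (int j = 0; j < 8; j++)
        sum += (a >> (8 * j)) & 0xFF;
    return sum;
}

/* vec128 is assumed to arrive as two 64-bit halves:
   hi = bits 127:64, lo = bits 63:0 (LE numbering). */
static uint16_t getMSBs_8x16(uint64_t hi, uint64_t lo) {
    const uint64_t weights = 0x8040201008040201ULL; /* lane j -> 1 << j */
    hi = shift_right_signed_8x8_by7(hi) & weights;
    lo = shift_right_signed_8x8_by7(lo) & weights;
    uint64_t hi8msbs = add_across_lanes_8x8(hi); /* 0..255 */
    uint64_t lo8msbs = add_across_lanes_8x8(lo); /* 0..255 */
    return (uint16_t)((hi8msbs << 8) | lo8msbs);
}

/* Usage: only byte lanes 15 and 0 have their MSB set. */
static void example(void) {
    assert(getMSBs_8x16(0x8000000000000000ULL, 0x0000000000000080ULL) == 0x8001);
}
```

Because each masked lane holds either 0 or a bit that no other lane uses, the per-half sum is at most 0xFF. Shifting hi8msbs left by 8 therefore cannot collide with lo8msbs.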

One more thought, regarding add_across_lanes_8x8(). You can in fact do this
semi-reasonably using standard 64-bit scalar code, because of the nature of
the values involved. Specifically, we are adding together 8 bytes, each of
which is either zero or has a single 1 bit in a position that no other byte
uses (byte j can only be 0x00 or 1 << j). Hence no carry ever propagates in
the addition, neither within a byte nor from one byte into the next, and so,
for this particular use case only, it can be implemented as

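   uint64_t add_across_lanes_8x8(uint64_t a) {
      // Fold the bytes downwards; since no lane can carry into another,
      // after these three steps byte 0 holds the sum of all eight bytes.
      a += (a >> 8);
      a += (a >> 16);
      a += (a >> 32);
      // Bytes 1..7 still hold partial sums, so keep only byte 0;
      // otherwise they would pollute (hi8msbs << 8) | lo8msbs.
      return a & 0xFF;
   }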

(I *think*)
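Since each lane can only be 0 or 1 << j, there are just 256 possible inputs, so the claim can be checked exhaustively. The sketch below does that. spread(), add_across_lanes_mul() and verify_all_patterns() are hypothetical helper names. The shift-and-add version masks its result to byte 0, because the upper bytes hold partial sums.

It also checks an alternative that is not from the original comment: the common SWAR idiom of multiplying by 0x0101010101010101. That multiply adds every byte into byte 7, which is valid here for the same no-carry reason.

```c
#include <assert.h>
#include <stdint.h>

static const uint64_t WEIGHTS = 0x8040201008040201ULL;

/* Build the lanes the sign-extend-then-mask steps would produce for an
   8-bit MSB pattern p: lane j holds 0 or (1 << j). */
static uint64_t spread(unsigned p) {
    uint64_t lanes = 0;
    for (int j = 0; j < 8; j++)
        if (p & (1u << j))
            lanes |= (uint64_t)0xFF << (8 * j);
    return lanes & WEIGHTS;
}

/* Shift-and-add reduction from the comment; the sum ends up in byte 0. */
static uint64_t add_across_lanes_8x8(uint64_t a) {
    a += (a >> 8);
    a += (a >> 16);
    a += (a >> 32);
    return a & 0xFF;
}

/* Alternative: a * 0x0101... adds all eight bytes into byte 7
   (overflow past bit 63 is discarded, which is harmless here). */
static uint64_t add_across_lanes_mul(uint64_t a) {
    return (a * 0x0101010101010101ULL) >> 56;
}

/* Both reductions must recover every 8-bit pattern exactly. */
static void verify_all_patterns(void) {
    for (unsigned p = 0; p < 256; p++) {
        uint64_t lanes = spread(p);
        assert(add_across_lanes_8x8(lanes) == p);
        assert(add_across_lanes_mul(lanes) == p);
    }
}
```

The multiply form trades three shift/add pairs for a single 64-bit multiply. Which form is cheaper depends on the host's multiply latency.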
