https://bugs.kde.org/show_bug.cgi?id=429354
--- Comment #3 from Julian Seward <jsew...@acm.org> ---
(In reply to Julian Seward from comment #2)
> uint16_t getMSBs_8x16(vec128)
> {
>    let hiHalf = vec128[127:64]  // LE numbering
>    let loHalf = vec128[ 63:0]
>    // In each byte lane, copy the MSB to all bit positions
>    hiHalf = shift_right_signed_8x8(hiHalf, 7);
>    loHalf = shift_right_signed_8x8(loHalf, 7);
>    // Now each byte lane is either 0x00 or 0xFF
>    // Make (eg) lane 7 contain either 0x00 or 0x80, lane 6 contain
>    // either 0x00 or 0x40, etc
>    hiHalf &= 0x8040201008040201;
>    loHalf &= 0x8040201008040201;
>    hi8msbs = add_across_lanes_8x8(hiHalf)
>    lo8msbs = add_across_lanes_8x8(loHalf)
>    return (hi8msbs << 8) | lo8msbs;
> }

One more thought, regarding add_across_lanes_8x8().  In fact you can do
this semi-reasonably using standard 64-bit scalar code, because of the
nature of the values involved.  Specifically, we are adding together 8
bytes, each of which is either zero or has a single 1 bit in a different
location.  Hence there will never be any carry-bit propagation at all in
the addition, and so it can be implemented -- for this particular use case
only -- as

uint64_t add_across_lanes_8x8(uint64_t a)
{
   a += (a >> 8);
   a += (a >> 16);
   a += (a >> 32);
   // The full sum ends up in the low byte; the higher bytes hold
   // partial sums, so mask them off before returning.
   return a & 0xFF;
}

(I *think*)
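
The no-carry argument can be checked with a small scalar sketch in plain C. This is only an illustration under some assumptions, not the code proposed for the fix. The 128-bit vector is passed as two uint64_t halves, and smear_msb_8x8() is a hypothetical scalar stand-in for shift_right_signed_8x8(x, 7). get_msbs_reference() and check_examples() are likewise made-up helpers that compare against a naive bit-by-bit loop.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar stand-in for shift_right_signed_8x8(x, 7):
   each byte lane becomes 0xFF if its top bit was set, else 0x00. */
static uint64_t smear_msb_8x8(uint64_t x)
{
    uint64_t msbs = (x >> 7) & 0x0101010101010101ULL; /* 0 or 1 per lane */
    return msbs * 0xFF; /* 0x00 or 0xFF per lane; no carry between lanes */
}

/* Sum the 8 byte lanes. Only valid when no two lanes share a set bit,
   so that no addition can carry into a neighbouring lane. */
static uint64_t add_across_lanes_8x8(uint64_t a)
{
    a += a >> 8;
    a += a >> 16;
    a += a >> 32;
    return a & 0xFF; /* higher bytes only hold partial sums */
}

/* hi is bits 127:64 of the vector, lo is bits 63:0 (LE numbering). */
static uint16_t get_msbs_8x16(uint64_t hi, uint64_t lo)
{
    const uint64_t lane_bits = 0x8040201008040201ULL;
    uint64_t hi8msbs = add_across_lanes_8x8(smear_msb_8x8(hi) & lane_bits);
    uint64_t lo8msbs = add_across_lanes_8x8(smear_msb_8x8(lo) & lane_bits);
    return (uint16_t)((hi8msbs << 8) | lo8msbs);
}

/* Naive reference: pick out the top bit of each byte one at a time. */
static uint16_t get_msbs_reference(uint64_t hi, uint64_t lo)
{
    uint16_t r = 0;
    for (int i = 0; i < 8; i++) {
        r |= (uint16_t)(((lo >> (8 * i + 7)) & 1) << i);
        r |= (uint16_t)(((hi >> (8 * i + 7)) & 1) << (i + 8));
    }
    return r;
}

/* A few spot checks of the scalar version against the reference. */
static void check_examples(void)
{
    assert(get_msbs_8x16(0, 0) == get_msbs_reference(0, 0));
    assert(get_msbs_8x16(~0ULL, ~0ULL) == get_msbs_reference(~0ULL, ~0ULL));
    assert(get_msbs_8x16(0x8000000000000000ULL, 0x80ULL) ==
           get_msbs_reference(0x8000000000000000ULL, 0x80ULL));
}
```

The "& 0xFF" matters because after the three shift-and-add steps only the low byte holds the sum of all eight lanes. Without the mask, the leftover partial sums in the higher bytes would leak into the final "(hi8msbs << 8) | lo8msbs".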