On Tue, Apr 7, 2020 at 7:51 AM Jeffrey Walton <[email protected]> wrote:
>
> On Tue, Apr 7, 2020 at 5:51 AM Jeffrey Walton <[email protected]> wrote:
> >
> > Hi Everyone,
> >
> > I'm porting a 64-bit algorithm to 32-bit PowerPC (an old PowerMac).
> > The algorithm is simple when 64-bit is available, but it gets a little
> > ugly under 32-bit.
> > ...
> >
> > Here's what an "add with carry" looks like. The addc simply adds the
> > carry into the result after transposing the carry bits from columns 1
> > and 3 to columns 0 and 2.
> >
> > typedef __vector unsigned char uint8x16_p;
> > typedef __vector unsigned int uint32x4_p;
> > ...
> >
> > inline uint32x4_p VecAdd64(const uint32x4_p& vec1, const uint32x4_p& vec2)
> > {
> >     // 64-bit elements available at POWER7 with VSX, but addudm requires POWER8
> > #if defined(_ARCH_PWR8)
> >     return (uint32x4_p)vec_add((uint64x2_p)vec1, (uint64x2_p)vec2);
> > #else
> >     const uint8x16_p cmask = {4,5,6,7, 16,16,16,16, 12,13,14,15, 16,16,16,16};
> >     const uint32x4_p zero = {0, 0, 0, 0};
> >
> >     uint32x4_p cy = vec_addc(vec1, vec2);
> >     cy = vec_perm(cy, zero, cmask);
> >     return vec_add(vec_add(vec1, vec2), cy);
> > #endif
> > }
>
> I think I found it... The complement of the carry was throwing me off.
> Subtract with borrow needs an extra vec_andc to un-complement the
> borrow:
>
>     const uint8x16_p bmask = {4,5,6,7, 16,16,16,16, 12,13,14,15, 16,16,16,16};
>     const uint32x4_p amask = {1, 1, 1, 1};
>     const uint32x4_p zero = {0, 0, 0, 0};
>
>     uint32x4_p bw = vec_subc(vec1, vec2);
>     bw = vec_andc(amask, bw);
>     bw = vec_perm(bw, zero, bmask);
>     return vec_sub(vec_sub(vec1, vec2), bw);
Sorry to dig up an old thread...

I've been working with Steven Munroe, a retired IBM engineer and the
maintainer of pveclib (https://github.com/munroesj52/pveclib). Munroe
recommended avoiding the load and permute, and using a shift instead.

Here is an updated VecSub64 routine.

    typedef __vector unsigned int uint32x4_p;
    ...

    // The mask keeps only the low 32-bit word of each 64-bit element
    #if defined(__BIG_ENDIAN__)
        const uint32x4_p zero = {0, 0, 0, 0};
        const uint32x4_p mask = {0, 1, 0, 1};
    #else
        const uint32x4_p zero = {0, 0, 0, 0};
        const uint32x4_p mask = {1, 0, 1, 0};
    #endif

        // vec_subc returns the carry-out, which is the complement of the borrow
        uint32x4_p bw = vec_subc(vec1, vec2);
        uint32x4_p res = vec_sub(vec1, vec2);
        // Un-complement the borrow and drop the high-word lanes
        bw = vec_andc(mask, bw);
        // Shift the borrow by 4 bytes into the adjacent high word
        bw = vec_sld (bw, zero, 4);
        return vec_sub(res, bw);

Jeff
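
For anyone who wants to check the borrow logic without PowerPC hardware, here is a minimal scalar sketch of the big-endian case above. It is not AltiVec code. `Lanes` and `ModelSub64` are hypothetical names made up for illustration, and each step only imitates, per 32-bit lane, what vec_sub, vec_subc, vec_andc and vec_sld are understood to do in the routine. The little-endian path is not modeled.

```cpp
#include <array>
#include <cstdint>

// Hypothetical stand-in for a 4 x 32-bit vector in big-endian lane order:
// lanes 0 and 2 hold the high words, lanes 1 and 3 hold the low words.
using Lanes = std::array<uint32_t, 4>;

// Scalar model of the shift-based VecSub64 (illustrative only).
inline Lanes ModelSub64(const Lanes& a, const Lanes& b)
{
    const Lanes mask = {0, 1, 0, 1};  // big-endian mask from the routine
    Lanes res{}, bw{};

    for (int i = 0; i < 4; ++i) {
        res[i] = a[i] - b[i];                          // like vec_sub: wraps modulo 2^32
        const uint32_t carry = (a[i] >= b[i]) ? 1 : 0; // like vec_subc: 1 means "no borrow"
        bw[i] = mask[i] & ~carry;                      // like vec_andc(mask, bw)
    }

    // Like vec_sld(bw, zero, 4): every lane moves one slot toward lane 0
    // and a zero fills lane 3. Lanes 0 and 2 of bw were already masked to
    // zero, so only the low-word borrows land on the high words.
    const Lanes moved = {bw[1], bw[2], bw[3], 0};

    for (int i = 0; i < 4; ++i)
        res[i] -= moved[i];                            // final vec_sub(res, bw)

    return res;
}
```

Feeding it {1, 0, 5, 3} minus {0, 1, 2, 1} exercises both cases: the first 64-bit element borrows from its high word, the second does not.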

