On Wednesday, March 4th, 2026 at 06:18, [email protected] <[email protected]> wrote: >> But... something's niggling me. Is it okay to strip off an odd >> number of lsbits in general? > > Of course it is not. > That's why the proposed code keeps track of parity of the stripped bits: > int off = 0; > for (; UNLIKELY ((lo = *up) == CNST_LIMB (0)); ++up, --usize) > off ^= GMP_NUMB_BITS & 1; > > off is 0 if an even number of bits was skipped, it is 1 if we skipped an odd > number of zeros. > > The two following alternative codes are adapted accordingly, > on the "count_trailing_zeros" side, adjusting cnt with > cnt += cnt+off+1 & 1 > (but I now realize that it might reach GMP_NUMB_BITS, > should one use something more clever? )
Yes, but shifting by GMP_NUMB_BITS < GMP_LIMB_BITS is safe. And GMP_LIMB_BITS is always even, so this is always true if GMP_NUMB_BITS is odd. But yes, I saw and approve of this code... > and shifting the mask on the COUNT_TRAILING_ZEROS_SLOW side, with > lsbit = lsbit*3 & MP_LIMB_T_MAX/3 << 1-off; > > Moreover, before falling back to the square root computation, a limb is added > back, if needed, > to restore parity. > usize += off; > up -= off; Ah! I missed this part, which fixes the error I was thinking of. I was imagining we stripped off the zero limbs and then fell straight through to the square root. Adding back this limb fixes everything. My apologies! >> If count_trailing_zeros(0) is fully nasal-demons Undefined >> (in practice, might divide by 0 or something), then we have >> a problem. But if it returns a value, even an unpredictable >> one, then it can be masked. > > The current "plain C" implementation that we have in longlong.h > loops forever if the operand is zero. > https://gmplib.org/repo/gmp/file/tip/longlong.h#l2256 Good point. I was looking at all the asm implementations and forgot about that one! GCC's __builtin_ctz might do the same. >> Just FYI, I get "403 Forbidden" for this, so I'm not sure what the >> ".1" and ".2" option differences actually are. > > With .1 the argument of the function has a trailing zero limb, with .2 it has > two... > If the code skips them, then the actual computation of the square root is > faster. Thank you! _______________________________________________ gmp-devel mailing list [email protected] https://gmplib.org/mailman/listinfo/gmp-devel
