On Fri, Jul 28, 2023 at 06:03:33PM +0000, Joseph Myers wrote: > You could e.g. have a table up to 10^(N-1) for some N, and 10^N, 10^2N > etc. up to 10^6144 (or rather up to 10^6111, which can then be multiplied > by a 34-digit integer significand), so that only one multiplication is > needed to get the power of 10 and then a second multiplication by the > significand. (Or split into three parts at the cost of an extra > multiplication, or multiply the significand by 1, 10, 100, 1000 or 10000 > as a multiplication within 128 bits and so only need to compute 10^k for k > a multiple of 5, or any number of variations on those themes.)
So, I've done some quick counting, if we want at most one multiplication to get 10^X for X in 0..6111 (plus another to multiply mantissa by that), having one table with 10^1..10^(N-1) and another with 10^YN for Y 1..6111/N, I get for 64-bit limbs S1 - size of 10^1..10^(N-1) table in bytes S2 - size of 10^YN table N S1 S2 S 20 152 388792 388944 32 344 241848 242192 64 1104 121560 122664 128 3896 60144 64040 255 14472 29320 43792 256 14584 29440 44024 266 15704 28032 43736 384 32072 19192 51264 512 56384 14080 70464 where 266 seems to be the minimum, though the difference from 256 is minimal and having N a power of 2 seems cheaper. Though, the above is just counting the bytes of the 64-bit limb arrays concatenated together, I think it will be helpful to have also an unsigned short table with the indexes into the limb array (so another 256*2 + 24*2 bytes). For something not in libgcc_s.so but in libgcc.a I guess 43.5KiB of .rodata might be acceptable to make it fast. Jakub