On Fri, Jul 28, 2023 at 06:03:33PM +0000, Joseph Myers wrote:
> You could e.g. have a table up to 10^(N-1) for some N, and 10^N, 10^2N
> etc. up to 10^6144 (or rather up to 10^6111, which can then be multiplied
> by a 34-digit integer significand), so that only one multiplication is
> needed to get the power of 10 and then a second multiplication by the
> significand. (Or split into three parts at the cost of an extra
> multiplication, or multiply the significand by 1, 10, 100, 1000 or 10000
> as a multiplication within 128 bits and so only need to compute 10^k for k
> a multiple of 5, or any number of variations on those themes.)
So, I've done some quick counting, if we want at most one multiplication
to get 10^X for X in 0..6111 (plus another to multiply mantissa by that),
having one table with 10^1..10^(N-1) and another with 10^YN for Y 1..6111/N,
I get for 64-bit limbs
S1 - size of 10^1..10^(N-1) table in bytes
S2 - size of 10^YN table
N S1 S2 S
20 152 388792 388944
32 344 241848 242192
64 1104 121560 122664
128 3896 60144 64040
255 14472 29320 43792
256 14584 29440 44024
266 15704 28032 43736
384 32072 19192 51264
512 56384 14080 70464
where 266 seems to be the minimum, though the difference from 256 is minimal
and having N a power of 2 seems cheaper. Though, the above is just counting
the bytes of the 64-bit limb arrays concatenated together, I think it will
be helpful to have also an unsigned short table with the indexes into the
limb array (so another 256*2 + 24*2 bytes).
For something not in libgcc_s.so but in libgcc.a I guess 43.5KiB of .rodata
might be acceptable to make it fast.
Jakub