On Fri, Jul 28, 2023 at 06:03:33PM +0000, Joseph Myers wrote:
> You could e.g. have a table up to 10^(N-1) for some N, and 10^N, 10^2N 
> etc. up to 10^6144 (or rather up to 10^6111, which can then be multiplied 
> by a 34-digit integer significand), so that only one multiplication is 
> needed to get the power of 10 and then a second multiplication by the 
> significand.  (Or split into three parts at the cost of an extra 
> multiplication, or multiply the significand by 1, 10, 100, 1000 or 10000 
> as a multiplication within 128 bits and so only need to compute 10^k for k 
> a multiple of 5, or any number of variations on those themes.)

So, I've done some quick counting, if we want at most one multiplication
to get 10^X for X in 0..6111 (plus another to multiply mantissa by that),
having one table with 10^1..10^(N-1) and another with 10^YN for Y 1..6111/N,
I get for 64-bit limbs
S1 - size of 10^1..10^(N-1) table in bytes
S2 - size of 10^YN table
N       S1      S2      S
20      152     388792  388944
32      344     241848  242192
64      1104    121560  122664
128     3896    60144   64040
255     14472   29320   43792
256     14584   29440   44024
266     15704   28032   43736
384     32072   19192   51264
512     56384   14080   70464
where 266 seems to be the minimum, though the difference from 256 is minimal
and having N a power of 2 seems cheaper.  Though, the above is just counting
the bytes of the 64-bit limb arrays concatenated together, I think it will
be helpful to have also an unsigned short table with the indexes into the
limb array (so another 256*2 + 24*2 bytes).
For something not in libgcc_s.so but in libgcc.a I guess 43.5KiB of .rodata
might be acceptable to make it fast.

        Jakub

Reply via email to