On 07/10/2018 05:09 AM, colinb2 . wrote:
> Feel free to copy this email and attachment to anyone who might be interested.
> I'm very happy to answer any questions anyone has.
> The program can be compiled and run like this on Linux with GNU GCC:
> gcc -O2 -o expmed2.exe expmed2.c
> ./expmed2.exe
> 
> This email deals with making part of the GNU GCC compiler - integer division
> by a constant divisor - faster. (The calculation of the parameters for the
> machine code will be faster; compiled programs won't run faster.)
> Further down I mention inequality (1) which can be used to make the LLVM
> compiler somewhat faster, because that currently uses code based on (2).
> I don't know what - if anything - the Java JVM uses for this, or how other
> compilers do this, but these ideas may be useful there.
> 
> By significantly faster I mean I have benchmarked alternative versions of
> choose_multiplier which on my low specification netbook can take maybe less 
> than
> half the time the current version takes. Time saved in compilation is much 
> less
> important than time saved in running compiled programs, but the code for the
> faster versions is about the same length as the code for the current version,
> and is only a bit more complicated, so is worth considering?
Improvements to compile time behavior are always worth considering.


> 
> Licencing: in Fractint people's words "Don't want money, want recognition!"
> The version choose_multiplier_v2 is based - but improves - on what's in
> the GCC choose_multiplier function in file expmed.c, so the GCC licence.
> The version choose_multiplier_v4 is based - but improves - on magicu2 in
> "Hacker's Delight", so the licence is you needn't acknowledge the source
> but it would be nice to credit the code as from
> magicu2 in "Hacker's Delight" by Henry S Warren http://hackersdelight.org
> with improvements by Colin Bartlett <coli...@gmail.com>.
> This latter also applies to choose_multiplier_power2_divrem because that
> is also an obvious (?) idea from magicu2 in "Hacker's Delight".  */
So a key issue with GCC is that for nontrivial code to be included in
GCC, the code's author must assign their copyright ownership to the FSF.


> 
> The idea is using "wide int" seems slow compared to "unsigned HOST_WIDE_INT",
> so it makes sense to avoid using "wide int" as much as possible.
Sure, but the use of wide_int types has certain advantages and if
possible we'd like that class to be just as efficient as a HOST_WIDE_INT
for common operations on 32 and 64 bit types.

In fact, for 32 and 64 bit types, wide_int is supposed to generate the
same code as HOST_WIDE_INT.  At least that was the state in 2013 when
wide_int was introduced.  So it may be worth spending some time figuring
out why changing the types changes the performance.

If you're going to argue to use HOST_WIDE_INT, then you'll have to think
about how you're going to support 128bit or wider target data types
which is the whole point behind the introduction of wide_int.

Anyway, I think these higher level questions/concerns need to be
addressed before we can reasonably discuss replacing wide_int with
HOST_WIDE_INT.

WRT algorithmic changes.   ISTM it would be best to address those
separately from the wide_int vs HOST_WIDE_INT discussion.  An
algorithmic change should have the same impact regardless of whether
we're using wide_int or HOST_WIDE_INT.




 We can easily
> rewrite choose_multiplier to only use "wide int" to calculate the initial 
> mlow;
> this is choose_multiplier_v2. An alternative for choose_multiplier_v2 
> completely
> avoids using "wide int" by iterating upwards for the initial mlow, but if that
> benchmarks as faster than using "wide int" even once (I suspect it might) then
> just iterating upwards may even be a bit faster; this is choose_multiplier_v4.
So the most obvious take-away from this is to answer the question of
whether or not iterating upwards for the initial mlow is a win or not.
If so, that might be able to go forward independent of any other changes.

> 
> The attachment is self-contained, and I have checked that the values produced
> agree with a "Hacker's Delight" table of M/2^s for small d and n=precision=32.
Rather than attaching new implementations, the generally preferred
packaging is diffs against the trunk.


Jeff

Reply via email to