Well, I think you might be right on this one, Jonas!
I've tested my algorithm against the one used in the compiler. It's 5
times faster when used with small divisors (so loop iterations are
minimal)... but that amounts to about 15 nanoseconds compared to 75
nanoseconds! Additionally, it treats the "magic add" differently (the
reason why it randomly failed).
Gareth aka. Kit
On 16/10/2020 23:14, J. Gareth Moreton via fpc-devel wrote:
On 16/10/2020 10:47, Jonas Maebe via fpc-devel wrote:
On 16/10/2020 10:14, J. Gareth Moreton via fpc-devel wrote:
Before I go optimising the wrong thing, I have a question to ask.
What's the policy on platform-specific assembly language in the
compiler, or any code designed to run on a specific (source) platform
(and using a more generic implementation otherwise via $ifdef)? I ask
because I have a faster algorithm for "calc_divconst_magic_unsigned" in
'compiler/cgutils.pas', but it's only able to work because it can take
advantage of the x86 DIV instruction using RDX:RAX (or EDX:EAX) as a
double-wide dividend. It is somewhat faster than what currently exists
because of the lack of a loop whose iteration count is proportional to
log2(d), where d is the desired divisor (in other words, it's slower
the
bigger the divisor is, whereas my algorithm is constant speed).
In general, there should be no assembly language in the compiler. Ialso
don't think that's worth it in this case. Unless (or maybe "even if")
your code contains nothing but divisions by constants, I doubt this code
has a significant effect on the total compile time.
Division by constants has a fairly frequent occurrance in code. For
example, dividing by 10000 whenever Currency is used, and 1000 often
appears in timing measurements (e.g. if t is in milliseconds, then t
div 1000 is seconds and t mod 1000 is the leftover milliseconds).
The existing code contains two divisions by a variable (so they can't
be optimised) and a loop that has, at most, N iterations, where N is
the bit size (often 32 or 64). The loop contains only addition,
subtraction and multiplication, and 3 branches to contend with (not
including the repeat...until jump). My code contains a single DIV,
but also a BSR which is effectively used to get the base-2 logarithm
of the divisor (also throws an internal error if the divisor is zero,
since this should have been caught already).
Granted, you may be right and the saving won't be worth it, not to
mention the additional complexity (and my function currently fails on
certain divisors unexpectedly, so I'll have to do some deeper testing
if just for my own peace of mind!) - only a lot of timing tests will
determine that. Nevertheless, thanks for providing the article to
calculating the reciprocal though - that's definitely helpful in
understanding what's going on.
Gareth aka. Kit
_______________________________________________
fpc-devel maillist - fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
--
This email has been checked for viruses by Avast antivirus software.
https://www.avast.com/antivirus
_______________________________________________
fpc-devel maillist - fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel