Re: [fpc-devel] Policy on platform-specific compiler code

J. Gareth Moreton via fpc-devel Sat, 17 Oct 2020 17:13:57 -0700

Well, I think you might be right on this one, Jonas!

I've tested my algorithm against the one used in the compiler. It's 5times faster when used with small divisors (so loop iterations areminimal)... but that amounts to about 15 nanoseconds compared to 75nanoseconds! Additionally, it treats the "magic add" differently (thereason why it randomly failed).


Gareth aka. Kit

On 16/10/2020 23:14, J. Gareth Moreton via fpc-devel wrote:

On 16/10/2020 10:47, Jonas Maebe via fpc-devel wrote:
On 16/10/2020 10:14, J. Gareth Moreton via fpc-devel wrote:
Before I go optimising the wrong thing, I have a question to ask.
What's the policy on platform-specific assembly language in the
compiler, or any code designed to run on a specific (source) platform
(and using a more generic implementation otherwise via $ifdef)?  I ask
because I have a faster algorithm for "calc_divconst_magic_unsigned" in
'compiler/cgutils.pas', but it's only able to work because it can take
advantage of the x86 DIV instruction using RDX:RAX (or EDX:EAX) as a
double-wide dividend. It is somewhat faster than what currently exists
because of the lack of a loop whose iteration count is proportional to
log2(d), where d is the desired divisor (in other words, it's slowerthe
bigger the divisor is, whereas my algorithm is constant speed).
In general, there should be no assembly language in the compiler. Ialso
  don't think that's worth it in this case. Unless (or maybe "even if")
your code contains nothing but divisions by constants, I doubt this code
has a significant effect on the total compile time.
Division by constants has a fairly frequent occurrance in code. Forexample, dividing by 10000 whenever Currency is used, and 1000 oftenappears in timing measurements (e.g. if t is in milliseconds, then tdiv 1000 is seconds and t mod 1000 is the leftover milliseconds).
The existing code contains two divisions by a variable (so they can'tbe optimised) and a loop that has, at most, N iterations, where N isthe bit size (often 32 or 64). The loop contains only addition,subtraction and multiplication, and 3 branches to contend with (notincluding the repeat...until jump). My code contains a single DIV,but also a BSR which is effectively used to get the base-2 logarithmof the divisor (also throws an internal error if the divisor is zero,since this should have been caught already).
Granted, you may be right and the saving won't be worth it, not tomention the additional complexity (and my function currently fails oncertain divisors unexpectedly, so I'll have to do some deeper testingif just for my own peace of mind!) - only a lot of timing tests willdetermine that. Nevertheless, thanks for providing the article tocalculating the reciprocal though - that's definitely helpful inunderstanding what's going on.
Gareth aka. Kit

_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel


--
This email has been checked for viruses by Avast antivirus software.
https://www.avast.com/antivirus

_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel

Re: [fpc-devel] Policy on platform-specific compiler code

Reply via email to