On 2023-10-10 12:19, Marco van de Voort via fpc-devel wrote:
On 10-10-2023 at 11:13, J. Gareth Moreton via fpc-devel wrote:
Thanks Tomas,

Nothing is broken, but the timing measurement isn't precise enough.

Normally I have a much higher iteration count (e.g. 1,000,000), but I had reduced it to 10,000 because the higher count, coupled with the 1,000 iterations in the subroutines themselves, would have led to 1,000,000,000 passes and hence would take in the region of five to ten minutes to complete on a 16 MHz 386, for example.  Rika's suggestion of running as many iterations as needed until, say, 5 seconds elapse would help, but the timing measurements would cause a lot of latency and would be imprecise on very slow routines.  Still, let's see if 100,000 gives better results for you.
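
For illustration only, the time-budget idea can be sketched as follows. This is not the benchmark program from this thread; it is a minimal Python sketch, and the names ns_per_call, min_seconds, batch and add_one_thousand_times are all made up for the example. The clock is read once per batch rather than once per call, which keeps the timing overhead out of the measured loop.

```python
import time


def ns_per_call(fn, min_seconds=5.0, batch=10_000):
    """Call fn in batches until min_seconds have elapsed.

    Returns (nanoseconds per call, total number of calls).
    All parameter names are hypothetical, chosen for this sketch.
    """
    calls = 0
    start = time.perf_counter()
    while True:
        for _ in range(batch):  # no clock reads inside the batch
            fn()
        calls += batch
        elapsed = time.perf_counter() - start
        if elapsed >= min_seconds:
            return elapsed * 1e9 / calls, calls


def add_one_thousand_times():
    # Hypothetical stand-in for a subroutine with 1,000 inner iterations.
    x = 0
    for _ in range(1000):
        x += 1
    return x


if __name__ == "__main__":
    ns, calls = ns_per_call(add_one_thousand_times, min_seconds=0.5, batch=1_000)
    print(f"{ns:.1f} ns/call over {calls} calls")
```

The batch size is the trade-off: a small batch reads the clock often and adds latency, while a large batch on a very slow routine can overshoot the time budget by up to one whole batch.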

I had the same problem, and now it is stable. Ryzen 5700X (Zen 3):

   Pascal control case: 0.7 ns/call
 Using LEA instruction: 0.4 ns/call
Using ADD instructions: 0.7 ns/call

Indeed, it's much more consistent now; I have attached a new log for both the 32-bit and 64-bit versions from the Intel machine with Windows. Apparently, ADD is still somewhat faster on such "newer" Intel machines (at least if not considering the potential parallelism of LEA discussed previously). I can try this version on my AMD machines later tonight if considered useful. Please let me know which results would be relevant for you in that case (out of the ancient AMD DX4, the only slightly less ancient 1 GHz AMD Athlon, and the still rather reasonable AMD A9).

Tomas
32-bit version, 10 runs in a row using a command shell for loop:

   Pascal control case: 0.85 ns/call
 Using LEA instruction: 1.11 ns/call
Using ADD instructions: 0.74 ns/call
   Pascal control case: 0.95 ns/call
 Using LEA instruction: 0.95 ns/call
Using ADD instructions: 0.81 ns/call
   Pascal control case: 0.91 ns/call
 Using LEA instruction: 0.98 ns/call
Using ADD instructions: 0.83 ns/call
   Pascal control case: 0.90 ns/call
 Using LEA instruction: 1.12 ns/call
Using ADD instructions: 0.78 ns/call
   Pascal control case: 0.87 ns/call
 Using LEA instruction: 1.03 ns/call
Using ADD instructions: 0.71 ns/call
   Pascal control case: 0.87 ns/call
 Using LEA instruction: 1.03 ns/call
Using ADD instructions: 0.79 ns/call
   Pascal control case: 0.81 ns/call
 Using LEA instruction: 1.20 ns/call
Using ADD instructions: 0.92 ns/call
   Pascal control case: 0.97 ns/call
 Using LEA instruction: 1.01 ns/call
Using ADD instructions: 0.74 ns/call
   Pascal control case: 0.92 ns/call
 Using LEA instruction: 0.99 ns/call
Using ADD instructions: 0.81 ns/call
   Pascal control case: 0.90 ns/call
 Using LEA instruction: 1.00 ns/call
Using ADD instructions: 0.77 ns/call


64-bit version, 10 runs in a row using a command shell for loop:

CPU = Intel(R) Core(TM) i5-10310U CPU @ 1.70GHz
-----------------------------------------------
   Pascal control case: 1.04 ns/call
 Using LEA instruction: 1.09 ns/call
Using ADD instructions: 0.82 ns/call
CPU = Intel(R) Core(TM) i5-10310U CPU @ 1.70GHz
-----------------------------------------------
   Pascal control case: 1.07 ns/call
 Using LEA instruction: 1.07 ns/call
Using ADD instructions: 0.71 ns/call
CPU = Intel(R) Core(TM) i5-10310U CPU @ 1.70GHz
-----------------------------------------------
   Pascal control case: 0.98 ns/call
 Using LEA instruction: 1.07 ns/call
Using ADD instructions: 0.80 ns/call
CPU = Intel(R) Core(TM) i5-10310U CPU @ 1.70GHz
-----------------------------------------------
   Pascal control case: 1.11 ns/call
 Using LEA instruction: 1.09 ns/call
Using ADD instructions: 0.75 ns/call
CPU = Intel(R) Core(TM) i5-10310U CPU @ 1.70GHz
-----------------------------------------------
   Pascal control case: 0.98 ns/call
 Using LEA instruction: 1.02 ns/call
Using ADD instructions: 0.78 ns/call
CPU = Intel(R) Core(TM) i5-10310U CPU @ 1.70GHz
-----------------------------------------------
   Pascal control case: 1.09 ns/call
 Using LEA instruction: 1.13 ns/call
Using ADD instructions: 0.69 ns/call
CPU = Intel(R) Core(TM) i5-10310U CPU @ 1.70GHz
-----------------------------------------------
   Pascal control case: 0.98 ns/call
 Using LEA instruction: 1.11 ns/call
Using ADD instructions: 0.81 ns/call
CPU = Intel(R) Core(TM) i5-10310U CPU @ 1.70GHz
-----------------------------------------------
   Pascal control case: 0.95 ns/call
 Using LEA instruction: 1.07 ns/call
Using ADD instructions: 0.71 ns/call
CPU = Intel(R) Core(TM) i5-10310U CPU @ 1.70GHz
-----------------------------------------------
   Pascal control case: 1.04 ns/call
 Using LEA instruction: 1.01 ns/call
Using ADD instructions: 0.70 ns/call
CPU = Intel(R) Core(TM) i5-10310U CPU @ 1.70GHz
-----------------------------------------------
   Pascal control case: 1.05 ns/call
 Using LEA instruction: 0.99 ns/call
Using ADD instructions: 0.71 ns/call
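
To compare runs like these without reading them line by line, a small script can collect the "ns/call" figures per label and report the minimum, mean and maximum. The following is only a sketch in Python; the function name summarize and the regular expression are assumptions made for this example, and SAMPLE holds just the first two 32-bit runs from the log in this message.

```python
import re
from collections import defaultdict

# Matches lines such as "   Pascal control case: 0.85 ns/call".
# CPU banner lines and dashed separators do not match and are skipped.
LINE = re.compile(r"^\s*(.+?):\s*(\d+(?:\.\d+)?)\s*ns/call\s*$")

# First two 32-bit runs, copied from the log in this message.
SAMPLE = """\
   Pascal control case: 0.85 ns/call
 Using LEA instruction: 1.11 ns/call
Using ADD instructions: 0.74 ns/call
   Pascal control case: 0.95 ns/call
 Using LEA instruction: 0.95 ns/call
Using ADD instructions: 0.81 ns/call
"""


def summarize(log_text):
    """Return {label: (min, mean, max)} for every ns/call line in log_text."""
    samples = defaultdict(list)
    for line in log_text.splitlines():
        match = LINE.match(line)
        if match:
            samples[match.group(1)].append(float(match.group(2)))
    return {
        label: (min(values), sum(values) / len(values), max(values))
        for label, values in samples.items()
    }


if __name__ == "__main__":
    for label, (lo, mean, hi) in summarize(SAMPLE).items():
        print(f"{label:>24}: min {lo:.2f}  mean {mean:.2f}  max {hi:.2f} ns/call")
```

Feeding it the complete 32-bit or 64-bit log instead of SAMPLE gives the spread over all 10 runs.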
_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
