Hi Bastiaan,
On Sun, 10 Aug 2025, Bastiaan Braams wrote:
> The third loop in the appended code iterates over the 32-bit integers
> and at each iteration a simple arithmetic operation is performed (that
> cannot be optimized away). The fourth loop iterates over the 32-bit
> reals from -huge(0.0_real32) to huge(0.0_real32) by way of the
> ieee_next_after function. The timings are reported and again we observe
> the factor of about 200 difference. Compiling with `gfortran -O5` I get
> 3.9 seconds for the third loop and 883 seconds for the fourth loop on my
> Intel i7-1165G7.
Very odd.
I have a bit of C code which uses nextafterf to step through every single
REAL*4 (float in C) from -INFINITY to +INFINITY. For each value it
computes two exponential functions and compares them: one is a single
precision routine which needs 2 branches, 3 comparisons and just shy of
30 (super-scalar) multiplications and additions; the other is the
double/REAL*8 exp() routine from the system library. It takes 46 seconds
as a single thread on a Xeon E5-2650v4, which is 40% slower than your CPU
(its technology is 4 years older than yours).
Why your code takes nearly 900 seconds for doing a lot less work is
beyond me. Maybe somebody else can shed light on it? Sorry, I do not know
these routines in Fortran intimately enough to address your problem. If I
have any brilliant ideas, I will let you know.
Regards - Damian