It's certainly possible, but it feels a little finicky, since floor64 is not an internal function, unlike, say, the trigonometric functions. It will also break if the original code is changed. It feels like a kludge, especially if another programmer down the line rewrites the function and is then confused when it runs slower because the node pattern is no longer identical.
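To make the concern concrete, here is a minimal sketch, not taken from the RTL. Form A mirrors the floor64 body as I understand it from the Math unit; Form B is a hypothetical later rewrite. Both names are made up for illustration, and Double stands in for the Math unit's "float" type so the program needs no extra units. The two forms return the same values, but they produce different node trees, so a matcher keyed on Form A's exact shape would presumably stop firing on Form B.

```pascal
program FloorPatternSketch;

{$mode objfpc}

{ Form A: the Trunc/Frac expression shape (assumed to match the Math unit). }
function Floor64A(x: Double): Int64;
begin
  Result := Trunc(x) - Ord(Frac(x) < 0);
end;

{ Form B: hypothetical rewrite; same results, different node tree. }
function Floor64B(x: Double): Int64;
begin
  Result := Trunc(x);
  { Trunc rounds towards zero, so only negative non-integers need adjusting. }
  if x < Result then
    Dec(Result);
end;

begin
  { Each line prints the two results side by side for comparison. }
  WriteLn(Floor64A(-1.5), ' ', Floor64B(-1.5));
  WriteLn(Floor64A(2.5), ' ', Floor64B(2.5));
  WriteLn(Floor64A(-3.0), ' ', Floor64B(-3.0));
end.
```

Whether a real pattern matcher would be this strict is an assumption on my part; the point is only that equivalent source does not guarantee an equivalent node tree.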
My intention, though, was to put the improved code, with pre-processor directives to detect the FPU switches, in the platform-specific include file and wrap the original procedure in a "{$ifndef FPC_MATH_HAS_FLOOR64}", similar to how other functions in the Math unit are programmed (e.g. DivMod). To reassure you, I'm aware that "float" is normally "extended" outside of x86_64, and I would keep my changes constrained to that platform.

Regarding Trunc, I'm aware that it's just "cvttsd2si %xmm0,%rax", but because it is written in assembly language, it's currently impossible to inline. Admittedly, this is something I would like to develop and implement at some point: the ability to inline at least simple assembler routines, where temporary registers can be replaced with virtual registers and the compiler can detect registers that map onto parameters and return values. It would be very platform-specific, but since "inline" is simply ignored if it can't be used, it wouldn't be an error.

Gareth aka. Kit

P.S. The documentation specifically states that the Floor function rounds towards negative infinity, unlike Trunc, which rounds towards zero.

On Sun 03/02/19 13:11, Florian Klämpfl flor...@freepascal.org sent:

On 03.02.19 at 06:26, J. Gareth Moreton wrote:
> Hi everyone,
>
> So I'm looking to improve some of the mathematical routines. However,
> not all of them are internal functions and are stored in the Math
> unit. Some of them are written in assembly language but use the old
> floating-point stack, or use a slow hack when there's a good alternative
> available in SSE 4.1, for example, and I would like to see about
> rewriting some of these functions for x86_64. However, while I can
> safely assume the presence of SSE2 on this architecture, what's the best
> way to detect if "-iCOREAVX" etc are specified? Also, if "-iCOREAVX",
> does it automatically set "-fAVX" as well? I'd rather make sure I'm not
> making incorrect assumptions before I start writing assembly language
> routines.
>
> As an example of a function that can benefit from a speed-up under
> x86_64... the floor() and floor64() functions:
>
> function floor64(x: float): Int64;
> begin
>   Result:=Trunc(x)-ord(Frac(x)<0);
> end;
>
> For time-critical code, this is not ideal because, besides being a
> function itself, it calls Trunc, Frac, has a subtraction, and another
> implicit subtraction and assignment due to the condition. Under SSE4.1,
> this could be optimised to something like the following:

Better to make it inline, detect the node pattern and then generate the right instructions depending on the FPU switches. While this is still a "micro" optimization, this way it has its maximum benefit and does not clutter RTL units with assembler, and user code using similar sequences benefits from it as well.

_______________________________________________
fpc-devel maillist - fpc-devel@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel