On 03.02.19 at 06:26, J. Gareth Moreton wrote:
Hi everyone,

So I'm looking to improve some of the mathematical routines.  However, not all of them are internal functions; some are stored in the Math unit.  Some of them are written in assembly language but use the old floating-point stack, or use a slow hack when there's a good alternative available in SSE 4.1, for example, and I would like to see about rewriting some of these functions for x86_64.  However, while I can safely assume the presence of SSE2 on this architecture, what's the best way to detect if "-iCOREAVX" etc. are specified?  Also, if "-iCOREAVX" is specified, does it automatically set "-fAVX" as well?  I'd rather make sure I'm not making incorrect assumptions before I start writing assembly language routines.

As an example of a function that can benefit from a speed-up under x86_64... the floor() and floor64() functions:

function floor64(x: float): Int64;
   begin
     Result:=Trunc(x)-ord(Frac(x)<0);
   end;

For time-critical code, this is not ideal because, besides being a function itself, it calls Trunc, Frac, has a subtraction, and another implicit subtraction and assignment due to the condition.  Under SSE4.1, this could be optimised to something like the following:

Better to make it inline, detect the node pattern, and then generate the right instructions depending on the FPU switches. While this is still a "micro" optimization, this way it has its maximum benefit, it does not clutter the RTL units with assembler, and user code using similar sequences benefits from it as well.
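
For illustration only, here is a minimal sketch of just the "inline" part of that suggestion. It assumes Double instead of the RTL's float type, and Floor64Sketch is a made-up name, not an existing RTL routine. The node pattern detection would live in the compiler and is not shown here.

```pascal
program FloorInlineSketch;
{$mode objfpc}

{ Hypothetical name; same expression as the floor64 quoted above,
  but declared inline so the compiler may expand it at the call site. }
function Floor64Sketch(x: Double): Int64; inline;
begin
  { Trunc rounds toward zero, so subtract 1 when the fraction is negative. }
  Result := Trunc(x) - Ord(Frac(x) < 0);
end;

begin
  WriteLn(Floor64Sketch(2.5));   { prints 2 }
  WriteLn(Floor64Sketch(-2.5));  { prints -3 }
  WriteLn(Floor64Sketch(-3.0));  { prints -3 }
end.
```

With -2.5, Trunc gives -2 and Frac gives -0.5, so the comparison contributes 1 and the result is -3. With -3.0 the fraction is not negative, so nothing is subtracted.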
_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
