On 05/19/2017 02:11 PM, Nikolay Nikolov wrote:
On 05/19/2017 03:54 AM, Ryan Joseph wrote:
On May 18, 2017, at 10:40 PM, Jon Foster
<jon-li...@jfpossibilities.com> wrote:
  %    cumulative   self
 time    seconds   seconds  name
 62.44      1.33      1.33  fpc_frac_real
 26.76      1.90      0.57  MATH_$$_FLOOR$EXTENDED$$LONGINT
 10.33      2.12      0.22  FPC_DIV_INT64
Thanks for profiling this.
Floor is there, as I expected, and 26% is pretty extreme, but are the others
floating-point division? How does Java handle this so much better than FPC,
and what are the workarounds? Just curious. As it stands, I can only conclude
that I need to avoid dividing floats in FPC like the plague.
Java compiles to bytecode for the JVM. The bytecode isn't CPU-specific, and
the JVM comes with a JIT compiler that compiles it to native code when the
program is run, so it can always make use of the instruction set supported
by the CPU you're using. But, of course, launching the application becomes
much slower. FPC generates native code ahead of time, so if you want to use
SSE and avoid the x87 FPU, you have to compile with specific compiler
options and give up the ability of your executable to run on CPUs without
SSE. If you want to keep compatibility and still support modern instruction
set extensions, you need to compile different executables for different
instruction sets and make a launcher .exe, which detects the CPU type and
runs the appropriate executable. The default for the i386 compiler is to
target the Pentium CPU, which does not have SSE. This gives the most
compatibility and the least performance, but that's what's appropriate for
most users, because for most desktop applications CPU speed is no longer an
issue. Only very specific tasks, such as software 3D rendering, need high
CPU performance, and people doing that kind of work usually know their
compiler options well, including how to enable support for modern
instruction set extensions for maximum performance. Of course, people coming
from a Java background might not be used to having to do this kind of thing
at all, but it's really not that hard.
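As a rough sketch of the "two executables plus a launcher" approach, assuming a Linux system and a hypothetical program `myapp.pas`: `-Cp` (CPU type), `-Cf` (FPU instruction set), `-O2` and `-o` are FPC command-line options, but the exact CPU/FPU values accepted depend on the compiler version (`fpc -i` lists them). The file names and the `/proc/cpuinfo` check are assumptions for illustration; a Windows launcher .exe would need a different way to detect SSE2.

```shell
# --- Build time (i386 compiler); myapp.pas is a hypothetical program ---
# Default target (Pentium, x87 FPU): runs on any i386-class CPU.
fpc -O2 -omyapp_x87 myapp.pas
# SSE2 floating point: will not run on CPUs without SSE2.
fpc -O2 -CpPENTIUM4 -CfSSE2 -omyapp_sse2 myapp.pas

# --- Run time: minimal Linux launcher script ---
# Pick the build that matches the CPU flags reported by the kernel.
if grep -qw sse2 /proc/cpuinfo; then
    exec ./myapp_sse2 "$@"
else
    exec ./myapp_x87 "$@"
fi
```

Compiling with the x86_64 compiler instead removes the need for the x87 variant entirely, since every x86_64 CPU has SSE2.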
With all that said, I'm not saying that FPC doesn't still have room for
optimization, only that the difference shown shouldn't be this huge if you
use the capabilities of modern CPUs. fpc_frac_real is slow on modern CPUs
because it uses slow x87 code instead of SSE. FPC_DIV_INT64 is slow because
it does 64-bit division on 32-bit CPUs using an algorithm that uses only
32-bit instructions. The fact that this procedure is a bottleneck in your
code means that your code will benefit immensely if compiled for x86_64,
which has a native 64-bit division instruction.
Nikolay
_______________________________________________
fpc-pascal maillist - fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal