Re: [fpc-pascal] FPC Graphics options?

2017-05-20 Thread Ryan Joseph

> On May 21, 2017, at 2:34 AM, Jonas Maebe  wrote:
> The Pascal test program that was benchmarked here contains a number of 
> bugs/wrong translations from the C code (some stem from the original version, 
> another one was added):

Thanks for looking this over. I’m personally a little worried when I see this
kind of thing, because I don’t know the causes or how they affect my code.
Despite all the noise, I think we finally got down to bedrock though.
Unfortunately, as a person who doesn’t understand compilers well, all I can
conclude from this is to avoid floating point math in tight loops. That’s
probably not accurate enough, but that’s the only way I can understand it right
now.

What I’m hearing is that there are some bad C translations and some missing FPC
features. I’m not sure how much is down to the translations and how much to FPC,
but I think it’s mainly on the compiler side.

> 
> Then, there's one thing that can be done to optimize the Pascal version 
> (after removing the bugs above):
> 1) Compile with SSE3 or higher, in particular because SSE3 can be used to 
> implement trunc() with a single instruction (otherwise we pass via a helper 
> that uses the x87 fpu, which moreover has to reconfigure it to change the 
> rounding mode and restore it afterwards). However, there does seem to be a 
> bug in FPC 3.0.2 whereby compiling this program for -O2 -Cfsse3 causes it to 
> crash, because then it loads data from an 8-byte aligned location on the 
> stack. It works fine when compiled with trunk and -O2 -Cfsse3 though (at 
> least for 64 bit).

I just compiled with ppcx64 3.1.1 (from 3.0.2) and went from 8fps to 22fps
without optimizations and 28fps with them (I got some divide by zero errors, but
that’s just the translations). What is that about? What changed?

Just curious, why isn’t -Cfsse3 always enabled with optimizations? It seems like
we’d want this on all the time.

> 
> There's at least one minor twist of the classic "C compiler evaluates 
> constant stuff at compile time":
> 1) oy and oz are constant. The "floor" function is a standard C library 
> function, and hence C compilers know what it does and can evaluate it at 
> compile time. Therefore, the oy-floor(oy) and oz-floor(oz) expressions are 
> (equal) constants for C compilers.

How are those constants? I see them defined as “float oy = 32.5;” in the C
version.


Regards,
Ryan Joseph

___
fpc-pascal maillist  -  fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal

Re: [fpc-pascal] FPC Graphics options?

2017-05-20 Thread Sven Barth via fpc-pascal
On 20.05.2017 21:34, Jonas Maebe wrote:
> There's at least one minor twist of the classic "C compiler evaluates
> constant stuff at compile time":
> 1) oy and oz are constant. The "floor" function is a standard C library
> function, and hence C compilers know what it does and can evaluate it at
> compile time. Therefore, the oy-floor(oy) and oz-floor(oz) expressions
> are (equal) constants for C compilers.

Would it help here if we declared suitable overloads of Floor() for
the various floating point types instead of only the "Float" one,
declared them as inline, and had the inline nodes for Frac() and Trunc()
handle constant values?
At least if the compiler also recognizes that oy and oz are constant...
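A minimal sketch of what such overloads could look like. The name FloorF is hypothetical (chosen so as not to clash with the math unit's integer-returning Floor), and nothing like it is claimed to exist in the RTL. Each overload returns a floating point value, like C's floor(), and is built on Int(), which truncates toward zero. Whether calls with constant arguments then get folded at compile time is exactly the open question above.

```pascal
program FloorOverloads;
{$mode objfpc}{$H+}

// Hypothetical float-returning floor overloads (not part of the RTL).
// Int() truncates toward zero, so negative non-integral values
// need one extra step down to reach the floor.
function FloorF(x: Single): Single; overload; inline;
begin
  Result := Int(x);
  if Result > x then
    Result := Result - 1;
end;

function FloorF(x: Double): Double; overload; inline;
begin
  Result := Int(x);
  if Result > x then
    Result := Result - 1;
end;

var
  oy: Double = 32.5;   // same value as in the C benchmark
  neg: Double = -2.5;
begin
  // oy - FloorF(oy) is the fractional part, as in the C code
  WriteLn(oy - FloorF(oy):0:2);   // 0.50
  WriteLn(FloorF(neg):0:1);       // -3.0
end.
```

Typed variables are passed rather than literal constants so that overload resolution between the Single and Double versions is unambiguous.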

Regards,
Sven

Re: [fpc-pascal] FPC Graphics options?

2017-05-20 Thread Florian Klämpfl
Am 20.05.2017 um 21:34 schrieb Jonas Maebe:
> Also in summary, very little was learned from this. We have known for a long
> time that FPC needs SSA for better code generation for loops (and Florian has
> been working on it for a long time too).

Actually, this is not completely true :) What FPC needs to generate better code
in this case (on SysV ABI targets) is live range splitting around call nodes.
This needs no SSA; SSA might actually not help. I have a patch for it, but it is
not finished, as another patch is needed to make it work well: spill coalescing
(nodes/registers that are spilled are spilled to the same memory location if
they do not interfere but are connected by a move). I also have a half-baked
patch for this, but never finished it nor committed it to the official trunk.
Combined, both patches result in much better code for the example regarding
register usage, as variables can go into xmm registers which are
stored/restored around call nodes.

Re: [fpc-pascal] FPC Graphics options?

2017-05-20 Thread Jonas Maebe

On 19/05/17 02:54, Ryan Joseph wrote:
> On May 18, 2017, at 10:40 PM, Jon Foster  wrote:
>>
>> 62.44  1.33 1.33 fpc_frac_real
>> 26.76  1.90 0.57 MATH_$$_FLOOR$EXTENDED$$LONGINT
>> 10.33  2.12 0.22 FPC_DIV_INT64
>
> Thanks for profiling this.
>
> Floor is there as I expected and 26% is pretty extreme but the others are
> floating point division?
> How does Java handle this so much better than FPC and what are the workarounds?

The Pascal test program that was benchmarked here contains a number of 
bugs/wrong translations from the C code (some stem from the original 
version, another one was added):
1) casting a floating point number to an int in C does not round, but 
truncates (I think this may have been mentioned earlier in the thread, I 
didn't read everything)
2) The usage of floor in the test program is wrong. C's floor takes a 
floating point number and returns one. The math unit's floor function 
takes a floating point number and returns an integer. In the Pascal 
version, this integer is then converted back to a floating point number 
because the rest of that expression also uses floating point.
3) The Pascal version uses longword instead of int32 for a number of 
variables (that are "int" in the C version). This results in one 
expression getting evaluated as 64 bit on 32 bit systems, which is where 
the FPC_DIV_INT64 calls come from (that's a routine to perform 64 bit 
*integer* divisions on 32 bit platforms)
4) frac() is only used to get a monotonically increasing value as part of
the data input for the test program. The C code (and the original Pascal
version) uses a tick count and multiplies/divides that, which is much
faster.
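To make points 1) and 3) concrete, here is a small sketch of how the C constructs map to Pascal. It is not taken from the benchmark; the variable names are hypothetical.

```pascal
program TranslationSketch;
{$mode objfpc}{$H+}

var
  x: Double = 7.75;
  w, px: LongInt;  // C "int" maps to LongInt (signed 32 bit), not LongWord
begin
  // C's (int)x truncates toward zero: that is Trunc() in Pascal, not Round()
  WriteLn(Trunc(x));   // 7
  WriteLn(Round(x));   // 8: what a rounding translation would produce
  WriteLn(Trunc(-x));  // -7, same as (int)(-7.75) in C

  // With both operands LongInt this stays 32 bit arithmetic. If w were
  // declared LongWord instead, mixing it with a signed LongInt would make
  // FPC evaluate the division as Int64 (on 32 bit targets that means a
  // helper call such as FPC_DIV_INT64).
  w := 320;
  px := 12345;
  WriteLn(px div w, ' ', px mod w);  // 38 185
end.
```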


Then, there's one thing that can be done to optimize the Pascal version 
(after removing the bugs above):
1) Compile with SSE3 or higher, in particular because SSE3 can be used 
to implement trunc() with a single instruction (otherwise we pass via a 
helper that uses the x87 fpu, which moreover has to reconfigure it to 
change the rounding mode and restore it afterwards). However, there does 
seem to be a bug in FPC 3.0.2 whereby compiling this program for -O2 
-Cfsse3 causes it to crash, because then it loads data from an 8-byte 
aligned location on the stack. It works fine when compiled with trunk 
and -O2 -Cfsse3 though (at least for 64 bit).


There's at least one minor twist of the classic "C compiler evaluates 
constant stuff at compile time":
1) oy and oz are constant. The "floor" function is a standard C library 
function, and hence C compilers know what it does and can evaluate it at 
compile time. Therefore, the oy-floor(oy) and oz-floor(oz) expressions 
are (equal) constants for C compilers.
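Until FPC folds such expressions itself, the usual manual workaround is to hoist the loop-invariant part out of the loop. A minimal sketch, assuming oy is never modified (as in the C version); the loop is a hypothetical stand-in for the benchmark's inner loop, not its actual code.

```pascal
program HoistSketch;
{$mode objfpc}{$H+}

const
  oy = 32.5;  // never modified, like "float oy = 32.5;" in the C code

var
  fy, acc: Double;
  i: Integer;
begin
  // Loop invariant: computed once, before the loop, instead of on every
  // iteration. For a non-negative oy, Int() gives the same result as floor().
  fy := oy - Int(oy);
  acc := 0;
  for i := 1 to 1000 do
    acc := acc + fy * i;  // the loop body only reads the precomputed value
  WriteLn(fy:0:2, ' ', acc:0:1);  // 0.50 250250.0
end.
```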


Finally, there are two things FPC definitely is missing:
1) an SSE version of the int() function (which is the basis of a 
floating point version of floor()) (fairly specific to this program)
2) SSA support in loops (to make better use of SSE registers; related to 
Florian's note about the calling conventions). However, without the 
previous changes, even FPC code compiled to LLVM IR and then compiled to 
machine code with Clang (and hence with full SSA support) results in 
even worse performance than the code directly compiled with FPC.


There are definitely more things (as I did not manage to get FPC's LLVM 
IR to compile to a version that's equally fast as the LLVM IR generated 
from the C program), but I already spent more time than is reasonable on 
this. I hope the "the sky is falling" comments will stop though.


In summary, as has been mentioned by several people in this thread: you
(not directed at you personally, Ryan) always have to check where your
program's slowness comes from, otherwise your test/benchmark is worse than
useless (because it just creates confusion, and wastes other people's time
when they get tired of the mailing list getting flooded by the same
information-less statements over and over again).


Also in summary, very little was learned from this. We have known for a 
long time that FPC needs SSA for better code generation for loops (and 
Florian has been working on it for a long time too).



Jonas

Re: [fpc-pascal] Ignoring function results

2017-05-20 Thread Mark Morgan Lloyd

On 20/05/17 12:30, Bart wrote:
> On 5/20/17, Mark Morgan Lloyd  wrote:
>
>> According to the Programmer's Guide 1.3.41, {$EXTENDEDSYNTAX OFF} has
>> the effect of permitting the result of a function to be ignored.
>
> Isn't that just the other way around?
> "Extended syntax allows you to drop the result of a function. This means
> that you can use a function call as if it were a procedure. By default
> this feature is on. You can switch it off using the {$X-} or
> {$EXTENDEDSYNTAX OFF} directive."


Just a mo, let me have another shot at that in case I was doing 
something stupid...


It's definitely got to be on for optional parameters to be accepted, and
that appears to be the default state if {$mode objfpc}{$H+} is at the
top of the unit.


The curious thing is that in the cold light of day I can't get 
$EXTENDEDSYNTAX to have any effect on the function result. I'll admit 
what I'm doing:


operator <= (var a: TDateTimeArray; const s: TDateTime): boolean;

begin
  result := Length(a) > 0;
  SetLength(a, Length(a) + 1);
  a[High(a)] := s
end { <= } ;

operator + (const a: TDateTimeArray; const s: TDateTime): TDateTimeArray;

var b: boolean;

begin
  result := a;
  if Length(result) = 0 then
    { b := } result <= s
  else
    result[High(result)] += s
end { + } ;

If I uncomment the boolean assignment it works. Where I appeared to be 
last night was that setting $EXTENDEDSYNTAX OFF had the above working, 
but I'm now having trouble duplicating it. And I hadn't touched a drop :-)


--
Mark Morgan Lloyd
markMLl .AT. telemetry.co .DOT. uk

[Opinions above are the author's, not those of his employers or colleagues]

Re: [fpc-pascal] FPC Graphics options?

2017-05-20 Thread Nikolay Nikolov



On 05/19/2017 06:13 PM, Jon Foster wrote:
> On 05/19/2017 04:11 AM, Nikolay Nikolov wrote:
>> On 05/19/2017 03:54 AM, Ryan Joseph wrote:
>>> On May 18, 2017, at 10:40 PM, Jon Foster  wrote:
>>>>
>>>> 62.44  1.33 1.33 fpc_frac_real
>>>> 26.76  1.90 0.57 MATH_$$_FLOOR$EXTENDED$$LONGINT
>>>> 10.33  2.12 0.22 FPC_DIV_INT64
>>>
>>> Thanks for profiling this.
>>>
>>> Floor is there as I expected and 26% is pretty extreme but the
>>> others are floating point division? How does Java handle this so
>>> much better than FPC and what are the workarounds? Just curious. As
>>> it stands I can only reason that I need to avoid dividing floats in
>>> FPC like the plague.
>> [...] The default option for the i386 compiler is to target the
>> Pentium CPU, which does not have SSE. This gives the most compatibility
>> and the least performance, but that's what's appropriate for most users,
>> because for most desktop applications, CPU speed is no longer an
>> issue. Only very specific tasks, such as software 3D rendering, need
>> high CPU performance, and people doing that stuff usually know their
>> compiler options very well and how to enable support for modern
>> instruction extensions for maximum performance. Of course, people
>> coming from a Java background might not be used to having to do this
>> kind of stuff at all, but it's really not that hard.
>
> As stated I tried *ALL* of the FPU settings and received the same
> result or an "access violation", which I assumed meant my FPU did not
> support that feature set.
An access violation usually means accessing memory that is way out of
bounds. You can try turning range and overflow checking on, but there's
no guarantee that it is going to catch it. However, you should try to
narrow it down to find the offending location. It could be a bug in your
code, or a bug in the code generator (which produces an invalid result
from a given calculation).
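A minimal sketch of what range checking does, assuming the SysUtils unit is used so that the run-time error is turned into an ERangeError exception. The directives {$R+} and {$Q+} correspond to the -Cr and -Co command line options. As said above, this is no guarantee: accesses through raw pointers, for instance, are not checked.

```pascal
program RangeCheckSketch;
{$mode objfpc}{$H+}
{$R+}  // range checking (same as -Cr)
{$Q+}  // overflow checking (same as -Co)
uses SysUtils;

var
  buf: array[0..3] of Integer;
  idx: Integer;
  caught: Boolean = False;
begin
  idx := ParamCount + 4;   // run-time value just past the end of buf
  try
    buf[idx] := 1;         // out of bounds: raises ERangeError under {$R+}
  except
    on E: ERangeError do
    begin
      caught := True;
      WriteLn('caught: ', E.ClassName);  // caught: ERangeError
    end;
  end;
  WriteLn(caught);  // TRUE
end.
```

Without {$R+} the same store would silently write past the array, which is the kind of bug that later shows up as an unrelated access violation.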
> I even tried to enable emulation, to see what the difference would be,
> but ppc386 said it was an invalid switch even though it lists it in
> the help output.
Emulation is only supported under go32v2 (the 32-bit DOS target) and is
only needed on 486SX and 386 CPUs without an FPU, so it's very unlikely
you would need it. 486DX and above all have a built-in FPU and need no
emulation. Newer instruction set extensions such as SSE2 and SSE3 are
never emulated, because emulation would defeat the purpose of making your
code faster. However, it is very likely that your CPU has SSE2 and SSE3
support, unless it is very ancient. Btw, what CPU do you have?
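On Linux, one quick way to answer that is to look at the flags line of /proc/cpuinfo. A minimal sketch, assuming a Linux system on x86; note that the kernel lists SSE3 under the name pni (Prescott New Instructions), while SSE2 appears as sse2. HasCpuFlag is a hypothetical helper, not an RTL function.

```pascal
program CpuFlagsSketch;
{$mode objfpc}{$H+}

// Hypothetical helper: True if Flag appears as a whole word
// in the space-separated list Flags.
function HasCpuFlag(const Flags, Flag: string): Boolean;
begin
  Result := Pos(' ' + Flag + ' ', ' ' + Flags + ' ') > 0;
end;

var
  f: TextFile;
  line, flags: string;
begin
  flags := '';
  AssignFile(f, '/proc/cpuinfo');
  {$I-} Reset(f); {$I+}
  if IOResult <> 0 then
  begin
    WriteLn('/proc/cpuinfo not available (not Linux?)');
    Halt(1);
  end;
  while not EOF(f) do
  begin
    ReadLn(f, line);
    // The first "flags : ..." line describes the first CPU core.
    if (flags = '') and (Pos('flags', line) = 1) then
      flags := Copy(line, Pos(':', line) + 1, Length(line));
  end;
  CloseFile(f);
  WriteLn('SSE2: ', HasCpuFlag(flags, 'sse2'));
  WriteLn('SSE3: ', HasCpuFlag(flags, 'pni'));
end.
```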


Nikolay

Re: [fpc-pascal] Ignoring function results

2017-05-20 Thread Bart
On 5/20/17, Mark Morgan Lloyd  wrote:

> According to the Programmer's Guide 1.3.41, {$EXTENDEDSYNTAX OFF} has
> the effect of permitting the result of a function to be ignored.

Isn't that just the other way around?

"Extended syntax allows you to drop the result of a function. This
means that you can use a function call as if it were a procedure.
By default this feature is on. You can switch it off using the {$X-}
or {$EXTENDEDSYNTAX OFF} directive."
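A minimal sketch of the documented behaviour, using a hypothetical function Bump. With {$X+} (the default) the first call statement compiles and its Boolean result is simply discarded; with {$X-} the compiler should reject that line, while the assignment form compiles either way. Whether an operator expression used as a statement, as in Mark's code, is governed by the same switch is what this thread is unsure about, so it is deliberately left out.

```pascal
program ExtSyntaxSketch;
{$mode objfpc}{$H+}
{$X+}  // extended syntax on (the default); try {$X-} to see the difference

// Hypothetical function with a side effect and a result.
function Bump(var n: Integer): Boolean;
begin
  Inc(n);
  Result := n > 1;
end;

var
  counter: Integer = 0;
  ok: Boolean;
begin
  Bump(counter);        // result dropped: only accepted with {$X+}
  ok := Bump(counter);  // result used: accepted with {$X+} and {$X-}
  WriteLn(counter, ' ', ok);  // 2 TRUE
end.
```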

Bart

Re: [fpc-pascal] FPC Graphics options?

2017-05-20 Thread Jon Foster


On 05/19/2017 04:11 AM, Nikolay Nikolov wrote:
> On 05/19/2017 03:54 AM, Ryan Joseph wrote:
>> On May 18, 2017, at 10:40 PM, Jon Foster  wrote:
>>>
>>> 62.44  1.33 1.33 fpc_frac_real
>>> 26.76  1.90 0.57 MATH_$$_FLOOR$EXTENDED$$LONGINT
>>> 10.33  2.12 0.22 FPC_DIV_INT64
>>
>> Thanks for profiling this.
>>
>> Floor is there as I expected and 26% is pretty extreme but the others
>> are floating point division? How does Java handle this so much better
>> than FPC and what are the workarounds? Just curious. As it stands I can
>> only reason that I need to avoid dividing floats in FPC like the plague.
> [...] The default option for the i386 compiler is to target the Pentium
> CPU, which does not have SSE. This gives the most compatibility and the
> least performance, but that's what's appropriate for most users, because
> for most desktop applications, CPU speed is no longer an issue. Only very
> specific tasks, such as software 3D rendering, need high CPU performance,
> and people doing that stuff usually know their compiler options very well
> and how to enable support for modern instruction extensions for maximum
> performance. Of course, people coming from a Java background might not be
> used to having to do this kind of stuff at all, but it's really not that
> hard.


As stated I tried *ALL* of the FPU settings and received the same result or 
an "access violation", which I assumed meant my FPU did not support that 
feature set. I even tried to enable emulation, to see what the difference 
would be, but ppc386 said it was an invalid switch even though it lists it 
in the help output.


--
Sent from my Debian Linux laptop -- http://www.debian.org/intro/about

Jon Foster
JF Possibilities, Inc.
j...@jfpossibilities.com
541-410-2760
Making computers work for you!


Re: [fpc-pascal] Best way to check SimpleIPC for messages

2017-05-20 Thread Michael Schnell

On 17.05.2017 07:08, nore...@z505.com wrote:
> what happens when the application is not idle, but sort of idle?


A new Queue event is also only serviced when no earlier events are
present, i.e. when the application becomes "idle".


I don't know exactly when "OnIdle" is called. It can't be called in a
closed loop, otherwise any application would always use 100% CPU.


Hence "OnIdle" is bound to work with an even greater latency than a 
decent queue entry like TThread.Queue or Application.QueueAsyncCall.


-Michael