[fpc-devel] DWARF CIEs and FDEs on Linux x86_64

2018-02-04 Thread Markus Beth
Hi, I am using FPC (3.0.4 and the fixes_3_0 branch) to create a shared library for Linux x86_64. When debugging (with gdb) or profiling, I have always had problems getting useful stack traces from within the Pascal functions. I have now tracked this problem down to the .debug_frame

Re: [fpc-devel] FPC 3.0.4 released!

2017-12-18 Thread Markus Beth
Hello, I want to share my experience with FreePascal 3.0.4 so far. Maybe this can save some time for someone else who comes across the same problems: fpcbuild-3.0.4.tar.gz: I tried to build RPM packages for openSUSE from fpcbuild-3.0.4.tar.gz (downloaded from sourceforge.net). But that failed

Re: [fpc-devel] x86_64.inc CompareByte

2017-10-23 Thread Markus Beth
why this is so. On 23.10.2017 00:25, Markus Beth wrote: I used 2 different benchmarks. One for (very) short buffers [1] and one for rather large buffers [2]. [1]: var key, key2: string; res: LongWord; i: SizeInt; begin key := 'A'; key2 := 'A'; for i := 0 to 10 do begin

Re: [fpc-devel] x86_64.inc CompareByte

2017-10-22 Thread Markus Beth
ge CPU tomorrow. On 22.10.2017 20:55, Florian Klämpfl wrote: On 21.10.2017 at 01:24, Markus Beth wrote: Find attached the already announced version of CompareByte. What benchmark did you use? In my tests it is slightly slower than the one in fpc 3.0.x? I used the following test program: var

Re: [fpc-devel] x86_64.inc CompareByte

2017-10-20 Thread Markus Beth
Find attached the already announced version of CompareByte. BTW: If you would really like to see a PCMPSTR-based implementation, have a look at Agner Fog's subroutine library asmlib.zip (http://agner.org/optimize/). On 16.10.2017 23:08, Markus Beth wrote: On 16.10.2017 22:41, Florian Klämpfl wrote

Re: [fpc-devel] x86_64.inc CompareByte

2017-10-16 Thread Markus Beth
On 16.10.2017 22:41, Florian Klämpfl wrote: P.S.: I am currently working on another version of CompareByte that might have a slightly higher latency for very small len but a higher throughput (2 cycles per iteration vs. 3 cycles on an Intel Arrandale CPU (Westmere microarchitecture)). But this

Re: [fpc-devel] x86_64.inc CompareByte

2017-10-16 Thread Markus Beth
Sorry for the late reply. I had a weekend off(line). The instructions were chosen on purpose, and Sergey already cited the part of the Intel documentation that explains why this is correct. You can find a similar part in AMD's "AMD64 Architecture Programmer’s Manual Volume 1: Application

[fpc-devel] comments in ipc.pp

2017-10-05 Thread Markus Beth
Some comments in packages/rtl-extra/src/unix/ipc.pp seem to be wrong/misplaced. I propose the attached patch to fix them. Markus Index: trunk/packages/rtl-extra/src/unix/ipc.pp === --- trunk/packages/rtl-extra/src/unix/ipc.pp

[fpc-devel] x86_64.inc CompareByte

2017-09-30 Thread Markus Beth
I made some changes to CompareByte in rtl/x86_64/x86_64.inc to reduce the code size and make it run faster (see attached patch). I was successful with the code size reduction (47 bytes vs. 62 bytes) and also with the speed (according to a micro benchmark [1] run on an Ivy Bridge desktop). To

[fpc-devel] MM_MaskInvalidOp in mxcsr

2010-09-03 Thread Markus Beth
Is there a reason why MM_MaskInvalidOp is not set in mxcsr in rtl/x86_64/x86_64.inc? I have a 3rd party application that depends on InvalidOp being masked. I want to extend this application with an FPC library. But the initialization code of the library (SysInitFPU) unmasks the InvalidOp

[fpc-devel] Patch for rtl/inc/rtti.inc

2008-09-14 Thread Markus Beth
Hi, this patch rewrites the code of ArrayRTTI and fpc_Copy to allow the compiler to generate faster code (at least on i386). The change in ArrayRTTI yields a performance gain of ~4% for our real-world application on a 3 GHz Intel Xeon. The change in fpc_Copy is completely untested because I don't

[fpc-devel] patch for ReadInteger in sysformt.inc

2008-09-07 Thread Markus Beth
Hi, the latest profiling of one of our FPC applications showed ReadInteger (in rtl/objpas/sysutils/sysformt.inc) to be a performance bottleneck. Using Pos(Fmt[chpos],'1234567890')<>0 to check if Fmt[chpos] is a digit is somewhat time-consuming. I replaced it with two char comparisons with '0' and
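
The digit check described in this snippet can be sketched roughly as follows. This is only an illustration, not the code from the patch: the helper names IsDigitPos and IsDigitCmp are hypothetical, and since the snippet is cut off after '0', the upper bound '9' is an assumption.

```pascal
program DigitCheck;

{$mode objfpc}{$H+}

// Hypothetical helper names; neither function is taken from sysformt.inc.

// Pos-style check: searches a 10-character string for the character.
function IsDigitPos(c: Char): Boolean;
begin
  Result := Pos(c, '1234567890') <> 0;
end;

// Comparison-style check: two comparisons against the range bounds.
// The upper bound '9' is an assumption; the snippet is truncated there.
function IsDigitCmp(c: Char): Boolean;
begin
  Result := (c >= '0') and (c <= '9');
end;

var
  c: Char;
begin
  // Verify that both checks agree for every possible character value.
  for c := Low(Char) to High(Char) do
    if IsDigitPos(c) <> IsDigitCmp(c) then
    begin
      WriteLn('Mismatch at #', Ord(c));
      Halt(1);
    end;
  WriteLn('Both checks agree for all characters.');
end.
```

The comparison form avoids a function call and a string scan per character, which is presumably where the time went in the profiled case.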