Oops, small mistake caused by a last-minute change (I replaced rol with shl): it needs to be shr (or ror or rol, they all perform about the same on my CPU).
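
To illustrate why shl cannot work here, below is a minimal sketch in plain Free Pascal (assuming {$mode objfpc}). It models what ends up in edx after each variant and which branch the two compares would then take. The helper names (EdxAfterShl32, EdxAfterShr32, EdxAfterRol32, ClassifyPath) are made up for this illustration and are not part of any RTL; only the three constants are taken from the asm in this thread.

```pascal
program ExponentCheckDemo;
{$mode objfpc}

// Hypothetical helpers: each returns the low 32 bits of rdx (that is, edx)
// after the named operation has been applied to the raw bits of X.

function EdxAfterShl32(const X: Double): LongWord;
var
  Bits: QWord;
begin
  Move(X, Bits, SizeOf(Bits));
  // shl moves the low half upwards and fills the low half with zeros
  Result := LongWord((Bits shl 32) and $FFFFFFFF);
end;

function EdxAfterShr32(const X: Double): LongWord;
var
  Bits: QWord;
begin
  Move(X, Bits, SizeOf(Bits));
  // shr brings the high half (sign, exponent, top mantissa bits) down
  Result := LongWord((Bits shr 32) and $FFFFFFFF);
end;

function EdxAfterRol32(const X: Double): LongWord;
var
  Bits: QWord;
begin
  Move(X, Bits, SizeOf(Bits));
  // rotating by 32 swaps the two halves, so edx again holds the high half
  Result := LongWord(((Bits shl 32) or (Bits shr 32)) and $FFFFFFFF);
end;

// Mirrors the two cmp/branch pairs from the asm, in plain Pascal.
function ClassifyPath(const Edx: LongWord): string;
var
  Exponent: LongWord;
begin
  Exponent := Edx and $7FF00000;
  if Exponent >= $43300000 then
    Result := 'zero'      // exponent >= 1023+52: no fractional bits left
  else if Exponent <= $3FE00000 then
    Result := 'skip'      // exponent <= 1022: |X| < 1, return X unchanged
  else
    Result := 'convert';  // cvttsd2si / cvtsi2sd / subsd path
end;

begin
  WriteLn('shl, 1e15+0.5: ', ClassifyPath(EdxAfterShl32(1e15 + 0.5)));
  WriteLn('shl, 1e16+0.5: ', ClassifyPath(EdxAfterShl32(1e16 + 0.5)));
  WriteLn('shr, 1e15+0.5: ', ClassifyPath(EdxAfterShr32(1e15 + 0.5)));
  WriteLn('shr, 1e16+0.5: ', ClassifyPath(EdxAfterShr32(1e16 + 0.5)));
  WriteLn('shr, 0.5:      ', ClassifyPath(EdxAfterShr32(0.5)));
  WriteLn('rol, 1e15+0.5: ', ClassifyPath(EdxAfterRol32(1e15 + 0.5)));
end.
```

With shl the low half of rdx is always zero, so the masked exponent is zero and every input takes the skip branch, meaning the function would hand back its argument unchanged. With shr or rol the exponent field lands in edx and the three test values from the timings take the convert, zero and skip paths respectively.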

And in case anyone wonders, the first cmp and branch returns 0 for numbers that would cause an integer overflow (in fact for any magnitude of 2^52 or more, which has no fractional part anyway), and the second cmp and branch skips the whole thing if the input is between -1 and 1 (so it is already just a fraction).

Also, at least on my CPU (AMD Phenom II X6, so not exactly the newest) the effect of code alignment on performance is huge. (It affects the branch predictor, I think.) I’m not sure what the best alignment is for all CPUs.

.align 16 forces alignment of the function entry point to a multiple of 16. If I add between 0 and 3 nops at the start of the function, the timings for calling it 10 million times are:

In range (1e15+0.5):
  0 nop  1266149
  1 nop  4260343
  2 nop  1369745
  3 nop  4469482

Out of range (1e16+0.5):
  0 nop  881536
  1 nop  896240
  2 nop  890805
  3 nop  871582

Only fraction (0.5):
  0 nop  894850
  1 nop  2219469
  2 nop  955618
  3 nop  2303233

Leaving out the check if it’s already a fraction decreases the time for in-range numbers and increases it for ones that are already a fraction:

In range (1e15+0.5):
  do skip  1306063
  no skip  1121395

Out of range (1e16+0.5):
  do skip  887081
  no skip  888925

Only fraction (0.5):
  do skip  903330
  no skip  1124026

function FracDoSkip(const X: Double): Double;
asm
  .align 16
  .noframe
  movq rdx, xmm0          // raw bits of X
  rol rdx, 32             // high dword (sign, exponent) into edx
  and edx, $7FF00000      // keep only the exponent field
  cmp edx, $43300000      // exponent >= 1023+52: no fractional bits left
  jge @@zero
  cmp edx, $3FE00000      // exponent <= 1022: |X| < 1, already a fraction
  jbe @@skip
  cvttsd2si rax, xmm0     // truncate to integer
  cvtsi2sd xmm4, rax      // back to double
  subsd xmm0, xmm4        // X - Trunc(X)
  jmp @@skip
@@zero:
  xorpd xmm0, xmm0
@@skip:
end;

function FracNoSkip(const X: Double): Double;
asm
  .align 16
  .noframe
  movq rdx, xmm0
  rol rdx, 32
  and edx, $7FF00000
  cmp edx, $43300000
  jge @@zero
  // cmp edx, $3FE00000
  // jbe @@skip
  cvttsd2si rax, xmm0
  cvtsi2sd xmm4, rax
  subsd xmm0, xmm4
  jmp @@skip
@@zero:
  xorpd xmm0, xmm0
@@skip:
end;

Cheers,
Thorsten

From: fpc-devel <fpc-devel-boun...@lists.freepascal.org> On Behalf Of Thorsten Engler
Sent: Saturday, 28 April 2018 15:37
To: 'FPC developers' list' <fpc-devel@lists.freepascal.org>
Subject: Re: [fpc-devel] *** GMX Spamverdacht *** Re: Broken frac function in FPC3.1.1 / Windows x86_64

I’ve only tested it in Delphi, so you’ll have to convert it to the right syntax for fpc, but either of these should do:

function Frac1(const X: Double): Double;
asm
  .align 16
  .noframe
  movq rdx, xmm0
  shl rdx, 32
  and edx, $7FF00000
  cmp edx, $43300000
  jge @@zero
  cmp edx, $3FE00000
  jbe @@skip
  cvttsd2si rax, xmm0
  cvtsi2sd xmm4, rax
  subsd xmm0, xmm4
  jmp @@skip
@@zero:
  xorpd xmm0, xmm0
@@skip:
end;

function Frac2(const X: Double): Double;
asm
  .align 16
  .noframe
  movq rdx, xmm0
  shl rdx, 48
  and dx, $7FF0
  cmp dx, $4330
  jge @@zero
  cmp dx, $3FE0
  jbe @@skip
  cvttsd2si rax, xmm0
  cvtsi2sd xmm4, rax
  subsd xmm0, xmm4
  jmp @@skip
@@zero:
  xorpd xmm0, xmm0
@@skip:
end;

From: fpc-devel <fpc-devel-boun...@lists.freepascal.org> On Behalf Of Sven Barth via fpc-devel
Sent: Friday, 27 April 2018 23:47
To: FPC developers' list <fpc-devel@lists.freepascal.org>
Cc: Sven Barth <pascaldra...@googlemail.com>
Subject: *** GMX Spamverdacht *** Re: [fpc-devel] Broken frac function in FPC3.1.1 / Windows x86_64

Bart <bartjun...@gmail.com> wrote on Fri, 27 Apr 2018, 13:42:

On Wed, Apr 25, 2018 at 11:04 AM, <i...@wolfgang-ehrhardt.de> wrote:
> If you compile and run this 64-bit program on Win 64 you get a crash

And AFAICS your analysis of the cause (see bugtracker) is correct as well.

function fpc_frac_real(d: ValReal) : ValReal;compilerproc; assembler; nostackframe;
asm
  cvttsd2si %xmm0,%rax
  { Windows defines %xmm4 and %xmm5 as first non-parameter volatile registers;
    on SYSV systems all are considered as such, so use %xmm4 }
  cvtsi2sd %rax,%xmm4
  subsd %xmm4,%xmm0
end;

CVTTSD2SI: Convert with Truncation Scalar Double-Precision Floating-Point Value to Signed Integer

This should not be used to get a ValReal result.

The code essentially does the following (instruction by instruction):

=== code begin ===
tmpi := int64(trunc(d));
tmpd := double(tmpi);
Result := d - tmpd;
=== code end ===

Though why it fails with the given value is a different topic...

Regards,
Sven

_______________________________________________ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel