Re: [fpc-devel] FillWord, FillDWord and FillQWord are very poorly optimised on Win64 (not sure about x86-64 on Linux)

2017-11-01 Thread Sergei Gorelkin via fpc-devel
01.11.2017 10:46, Sven Barth via fpc-devel wrote: On 01.11.2017 05:58, "J. Gareth Moreton" wrote: Would it be worth opening up a bug report for this, with the attached assembler routines as suggestions? I haven't

Re: [fpc-devel] FillWord, FillDWord and FillQWord are very poorly optimised on Win64 (not sure about x86-64 on Linux)

2017-11-01 Thread Martok
On 01.11.2017 at 05:58, J. Gareth Moreton wrote: > So I've been doing some playing around recently, and noticed that while FillChar has some very fast internal code for initialising a block of memory, making use of non-temporal hints and memory fences, the versions for the larger

Re: [fpc-devel] FillWord, FillDWord and FillQWord are very poorly optimised on Win64 (not sure about x86-64 on Linux)

2017-11-01 Thread Florian Klämpfl
On 01.11.2017 at 05:58, J. Gareth Moreton wrote: > I also made versions that use memory fences and other checks such as memory alignment in order to gain speed - I've converted them to use the System V ABI of Linux as well, but are currently completely untested as I don't have the

Re: [fpc-devel] FillWord, FillDWord and FillQWord are very poorly optimised on Win64 (not sure about x86-64 on Linux)

2017-11-01 Thread Sven Barth via fpc-devel
On 01.11.2017 05:58, "J. Gareth Moreton" wrote: Would it be worth opening up a bug report for this, with the attached assembler routines as suggestions? I haven't worked out how to implement internal functions into the compiler yet, and I'd rather clear it with you guys

[fpc-devel] FillWord, FillDWord and FillQWord are very poorly optimised on Win64 (not sure about x86-64 on Linux)

2017-10-31 Thread J. Gareth Moreton
So I've been doing some playing around recently, and noticed that while FillChar has some very fast internal code for initialising a block of memory, making use of non-temporal hints and memory fences, the versions for the larger types fall back to slow Pascal code. To showcase this, I ran a