I should have figured. Thank you!
Kit
On 27/10/2023 01:51, Nikolay Nikolov via fpc-devel wrote:
On 10/11/23 11:21, Tomas Hajny via fpc-devel wrote:
On 2023-10-11 04:15, J. Gareth Moreton via fpc-devel wrote:
Sweet, thank you. Would you be willing to share your modified test's
source? I was worried that if CPUID wasn't present it would cause a
SIGILL.
It was a thought that crossed my mind when Stefan pointed out the
translated Google Benchmark, but given that it hasn't yet been adapted
to work outside of i386 and x86_64, you are right that it probably
shouldn't be used for the time being. The framework uses CPU timings to
decide how many
On 2023-10-13 17:08, J. Gareth Moreton via fpc-devel wrote:
Interesting! That's a bug report to send to the maintainers of the
framework. I'll need to have them fix it before I'd be willing to try
again with its use in FPC.
Removed the reference. Apologies - I'm rushing a bit.
BTW, it's
This one's for you Stefan!
https://github.com/spring4d/benchmark/issues/4
Kit
On 13/10/2023 16:03, Tomas Hajny via fpc-devel wrote:
On 2023-10-13 16:25, J. Gareth Moreton via fpc-devel wrote:
GetLogicalProcessorInformation returns a Boolean - if false, an error
occurred, and is handled as
Interesting! That's a bug report to send to the maintainers of the
framework. I'll need to have them fix it before I'd be willing to try
again with its use in FPC.
Removed the reference. Apologies - I'm rushing a bit.
Kit
On 2023-10-13 16:25, J. Gareth Moreton via fpc-devel wrote:
GetLogicalProcessorInformation returns a Boolean - if false, an error
occurred, and is handled as follows:
DiagnoseAndExit('Failed during call to GetLogicalProcessorInformation: '
+ GetLastError.ToString);
GetLastError = 8 indicates "out of memory", which I will say is odd.
Nevertheless,
Oops - that was a silly mistake of mine with R8. As for the other
error, that sounds like it's in the third party benchmark suite. I'll
do some investigating on my virtual machine.
In the meantime, here's the fixed test with the stray R8 call properly
filtered out on i386 (it's replaced
On 2023-10-13 09:26, Tomas Hajny wrote:
On 2023-10-12 20:02, J. Gareth Moreton via fpc-devel wrote:
So an update.
.
.
The latest version of blea.pp doesn't compile with a 32-bit compiler -
line 76 contains an unconditional reference to the R8 register, which
obviously doesn't exist in 32-bit mode.
Tomas
So an update.
I've added Spring.Benchmark to "tests/bench/spring" on my local branch,
along with its readme and licence file. It seems to work quite well
even if it feels a bit like overkill for this small a benchmark. Still,
I've attached the version with Stefan's translated Google
On 2023-10-11 04:15, J. Gareth Moreton via fpc-devel wrote:
Sweet, thank you. Would you be willing to share your modified test's
source? I was worried that if CPUID wasn't present it would cause a
SIGILL.
Sure, attached, but I didn't do anything special - I modified it in a
way allowing easy
The LEA and ADD times are close enough that I can consider them
identical. And Braswell (the architecture behind that brand of Celeron)
doesn't support AVX, I don't think, so that lines up with COREI having a
fast LEA instruction but not COREAVX.
Given the many different x86-compatible CPUs,
On Tue, Oct 10, 2023 at 11:13 AM J. Gareth Moreton via fpc-devel
wrote:
>
> Thanks Tomas,
>
> Nothing is broken, but the timing measurement isn't precise enough.
>
> Normally I have a much higher iteration count (e.g. 1,000,000), but I
> had reduced it to 10,000 because, coupled with the 1,000
Sweet, thank you. Would you be willing to share your modified test's
source? I was worried that if CPUID wasn't present it would cause a SIGILL.
Kit
On 11/10/2023 01:47, Tomas Hajny via fpc-devel wrote:
On 2023-10-10 13:24, J. Gareth Moreton via fpc-devel wrote:
I'm all for receiving results for all kinds of processor, as it helps me
to make more informed choices on flags as well as confirming that Agner
Fog's instruction tables are correct. Also, results for older
processors can be hard to come by sometimes.
Currently, most architectures have a
On 2023-10-10 12:19, Marco van de Voort via fpc-devel wrote:
On 10-10-2023 at 11:13, J. Gareth Moreton via fpc-devel wrote:
Thanks Tomas,
Nothing is broken, but the timing measurement isn't precise enough.
Normally I have a much higher iteration count (e.g. 1,000,000), but I
had reduced it to 10,000 because, coupled with the 1,000 iterations in
the
Ooo, that might be just what we need. Thank you Stefan.
Kit
On 10/10/2023 10:57, Stefan Glienke via fpc-devel wrote:
Be my guest making https://github.com/spring4d/benchmark compatible for all
platforms you need it for.
> On 10/10/2023 11:13 CEST J. Gareth Moreton via fpc-devel
> wrote:
>
>
> Thanks Tomas,
>
> Nothing is broken, but the timing measurement isn't precise enough.
>
> Normally I have a much
Looking at the text log, the results are a bit strange and I can't
easily explain them. Normally a system interrupt would increase the time
taken.
Let me know if increasing the iteration count fixes it or not.
Kit
On 10/10/2023 09:57, Tomas Hajny wrote:
On 2023-10-09 20:51, J. Gareth Moreton
Thanks Tomas,
Nothing is broken, but the timing measurement isn't precise enough.
Normally I have a much higher iteration count (e.g. 1,000,000), but I
had reduced it to 10,000 because, coupled with the 1,000 iterations in
the subroutines themselves, 1,000,000 would have led to 1,000,000,000 passes and
On 2023-10-09 20:51, J. Gareth Moreton via fpc-devel wrote:
Hi Kit,
I updated the "blea" test in the merge request so it now displays the
processor brand name on x86_64; however, it is not fetched under i386
because CPUID was not introduced until later 486 processors. I've
attached it to
My results on Windows:
E:\temp>C:\lazarus\fpc\3.2.2\bin\x86_64-win64\fpc.exe -MObjFPC -Scghi
-O1 -g -gl -l -vewnhibq -Fu. -FUlib\x86_64-win64 -FE. -oblea.exe blea.pp
Hint: (11030) Start of reading config file
C:\lazarus\fpc\3.2.2\bin\x86_64-win64\fpc.cfg
Hint: (11031) End of reading config
I updated the "blea" test in the merge request so it now displays the
processor brand name on x86_64; however, it is not fetched under i386
because CPUID was not introduced until later 486 processors. I've
attached it to this e-mail if anyone wants to take a look to ensure I
haven't broken
Thank you very much! That processor is built on the Excavator
architecture and lines up with the flag I put in the merge request (i.e.
it has the "fast LEA" hint).
I honestly didn't expect this much testing feedback, so thank you all!
Gareth aka. Kit
P.S. I'm tempted to extend the test
My results:
jean@First-Boss:~/temp$ cat /proc/cpuinfo | grep "model name"
model name : AMD A6-7480 Radeon R5, 8 Compute Cores 2C+6G
jean@First-Boss:~/temp$ /usr/bin/fpc blea.pp
Free Pascal Compiler version 3.2.2 [2021/07/09] for x86_64
Copyright (c) 1993-2021 by Florian Klaempfl and others
Thank you for the report.
According to Agner Fog's table, complex LEA instructions should have a
3-cycle latency on that architecture (Haswell). Optimisations with this
instruction are proving interesting because there's such a variety
between processor architectures. There are some that are
Hi Gareth
model name : Intel(R) Core(TM) i5-4200U CPU @ 1.60GHz
Regards
Nataraj S Narayan
Synergy Info Systems
Software & Technology Consultants
Ettumanoor, INDIA
Ph:+91 9443211326
On Sun, Oct 8, 2023 at 6:40 PM J. Gareth Moreton via fpc-devel <
fpc-devel@lists.freepascal.org> wrote:
> Hi
On 2023-10-08 13:45, J. Gareth Moreton via fpc-devel wrote:
Sorry, ignore last attachment - I forgot to change a line of assembly
(it was correct for x86_64-win64!!). Here is the corrected version.
Alright, results for this version for AMD A9 9425 under Linux (the same
trunk compiler as
Did some checking of the test I copied the code from, and I forgot that
Rika's original code only exited once a certain time period had elapsed
(e.g. 0.5 seconds). I had changed it to a standard iteration count
since I was concerned about fairness and accuracy, but I only changed
the loop
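Rika's time-bounded approach (run until a time period such as 0.5 seconds has elapsed) can be sketched as follows. This is a minimal Python illustration, not the code from the test: `run_for`, `min_seconds` and `batch` are hypothetical names, and it assumes that reading the clock once per batch is cheap enough not to distort the measurement.

```python
import time

def run_for(func, min_seconds=0.5, batch=1000):
    """Call func repeatedly until at least min_seconds have elapsed.

    The clock is read once per batch of calls rather than once per call,
    so its cost is spread over `batch` iterations.
    Returns (iterations, elapsed_seconds, checksum).
    """
    checksum = 0
    iterations = 0
    start = time.perf_counter()
    while True:
        for _ in range(batch):
            # Feed the previous result back in, mirroring how a compiled
            # benchmark keeps the optimiser from discarding the work.
            checksum = func(checksum)
        iterations += batch
        elapsed = time.perf_counter() - start
        if elapsed >= min_seconds:
            return iterations, elapsed, checksum
```

The per-call time is then `elapsed / iterations`. The trade-off is that each run executes a different number of iterations, which is the fairness concern that motivated switching to a fixed count.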
In the meantime, here's the merge request for the feature based on user
tests and studying of Agner Fog's instruction tables:
https://gitlab.com/freepascal.org/fpc/source/-/merge_requests/502
Kit
Hi Nataraj
Which processor is that run on? (although too close to call, it implies
LEA has a latency of 2 in that case)
Kit
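Inferring a latency from a timing means converting measured time into clock cycles. A rough Python sketch, assuming the core runs at a constant, known clock frequency (turbo boost and power-saving states make this approximate) and that the loop body is one chain of dependent instructions, so cycles per iteration approximate the instruction latency. `cycles_per_iteration` is a hypothetical helper.

```python
def cycles_per_iteration(elapsed_s: float, clock_hz: float, iterations: int) -> float:
    """Estimate clock cycles spent per loop iteration.

    Assumes the core ran at a constant clock_hz for the whole measurement.
    """
    if iterations <= 0:
        raise ValueError("iterations must be positive")
    return elapsed_s * clock_hz / iterations
```

For example, 400,000,000 iterations taking 0.5 seconds at a 1.6 GHz clock works out to 2.0 cycles per iteration; loop overhead and frequency scaling are why results near a boundary are "too close to call".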
On 08/10/2023 14:06, Nataraj S Narayan via fpc-devel wrote:
Hi
[nataraj@dflyHP ~]$ fpc ttt.pas
Free Pascal Compiler version 3.2.2 [2023/07/04] for x86_64
Copyright (c) 1993-2021 by Florian Klaempfl and others
Target OS: DragonFly for x86-64
Compiling ttt.pas
Linking ttt
/usr/local/bin/ld.bfd: warning:
Sorry, ignore last attachment - I forgot to change a line of assembly
(it was correct for x86_64-win64!!). Here is the corrected version.
Kit
On 08/10/2023 12:38, J. Gareth Moreton via fpc-devel wrote:
Sorry, I got careless and was in a rush: the Pascal code is wrong,
and I didn't store the result of the benchmark test, hence the
error check at the end returned a false negative.
The benchmark code was from Rika's SHA-1 test code, which I didn't
properly check, although I assumed the
1. Why do you leave "time:=..." in the benchmark loop? It adds 50% to the execution time of each call.
2. The Pascal version does not match the assembler version. I had to fix it:
//Result := X + Counter + $87654321;
Result:=Result + X + $87654321;
Result:=Result xor y;
3. Assembler functions can be
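Points 1 and 2 can be illustrated together: read the clock only outside the timed loop, and keep and return the computed result so the final check has something to compare against. This is a Python sketch, not the test itself: `kernel` is a hypothetical stand-in mirroring the two corrected Pascal lines under the assumption of 32-bit unsigned (wrapping) arithmetic, and `bench` is likewise hypothetical.

```python
import time

MASK32 = 0xFFFFFFFF

def kernel(result: int, x: int, y: int) -> int:
    # Result := Result + X + $87654321;  (assumed to wrap modulo 2**32)
    result = (result + x + 0x87654321) & MASK32
    # Result := Result xor Y;
    return result ^ y

def bench(iterations: int, y: int = 0x12345678):
    """Time `iterations` calls of kernel, reading the clock only twice."""
    result = 0
    start = time.perf_counter()
    for i in range(iterations):
        result = kernel(result, i, y)
    elapsed = time.perf_counter() - start
    # Returning the result lets the caller check it, e.g. against
    # the output of the assembler version.
    return elapsed, result
```

Comparing the returned result with the same computation done by the assembler routine is what would catch a mismatch like the one in point 2.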
I'm still slightly curious, but if full optimisations make better code,
then indeed it's probably not worth the effort.
Your timings are incredibly helpful - thank you! If I understand correctly,
AMD A9 is the Excavator architecture, which implies that AMD processors
don't suffer from the same latency
On 2023-10-07 18:09, J. Gareth Moreton via fpc-devel wrote:
That's interesting; I am interested to see the assembly output for the
Pascal control cases. As for the 64-bit version, that was my fault
since the assembly language is for Microsoft's ABI rather than the
System V ABI, so it was checking a register with an undefined value.
Find attached the
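The ABI mix-up described above comes down to which registers carry the first integer or pointer arguments. This Python table is a summary for illustration only (`arg_register` is a hypothetical name). The Microsoft x64 and System V AMD64 sequences are the standard ones; the i386 row assumes Free Pascal's default `register` calling convention (EAX, EDX, ECX). It also shows why an unconditional reference to R8 cannot compile for i386: the register does not exist there.

```python
# Integer/pointer argument registers, in order, for each convention.
# Assumes every argument is an integer or pointer; floating-point
# arguments go in XMM registers and are assigned differently.
ARG_REGISTERS = {
    "win64": ["RCX", "RDX", "R8", "R9"],               # Microsoft x64 ABI
    "sysv": ["RDI", "RSI", "RDX", "RCX", "R8", "R9"],  # System V AMD64 ABI
    "i386-register": ["EAX", "EDX", "ECX"],            # FPC 'register' (assumed)
}

def arg_register(abi: str, index: int) -> str:
    """Return the register holding argument `index` (0-based), or 'stack'."""
    regs = ARG_REGISTERS[abi]
    return regs[index] if index < len(regs) else "stack"
```

For the third argument (index 2), Win64 uses R8 while System V uses RDX, so a routine written for Win64 that reads R8 on Linux picks up the fifth argument slot, which holds an undefined value when fewer than five arguments are passed.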
On 2023-10-07 03:57, J. Gareth Moreton via fpc-devel wrote:
Hi Kit,
Do you think this should suffice? Originally it ran for 1,000,000
repetitions but I fear that will take way too long on a 486, so I
reduced it to 10,000.
OK, I tried it now. First of all, after turning on the old machine, I
Hi Tomas,
Do you think this should suffice? Originally it ran for 1,000,000
repetitions but I fear that will take way too long on a 486, so I
reduced it to 10,000.
Kit
On 03/10/2023 06:30, Tomas Hajny via fpc-devel wrote:
On October 3, 2023 03:32:34 +0200, "J. Gareth Moreton via fpc-devel"
What should I call a new sub-CPU option? Should it be "ICELAKE" or is
there a better name like "CORE10" or "COREX" (X being the Roman numeral
for 10, standing in for the 10th generation of Intel Core)?
Kit
On 03/10/2023 08:02, Florian Klämpfl via fpc-devel wrote:
On 03.10.2023 at 03:32
I don't think any of them currently fit. Zen 3 is later than Ice Lake,
but I'm not sure whether it has a faster LEA or not. I'll do some
investigation. I'll take up Tomas' offer on the 486 test though.
Personally I think the best test might actually be one of the
recently-optimised
> On 03.10.2023 at 03:32, J. Gareth Moreton via fpc-devel
> wrote:
>
> Hi everyone,
>
> This is mainly to Florian, but also to anyone else who can answer the
> question - at which point did a complex LEA instruction (using all three
> input operands and some other specific circumstances)
Hmmm, could be fun to attempt to test - I'll see what I can set up.
Kit
On October 3, 2023 03:32:34 +0200, "J. Gareth Moreton via fpc-devel"
wrote:
Hi Kit,
>This is mainly to Florian, but also to anyone else who can answer the question
>- at which point did a complex LEA instruction (using all three input operands
>and some other specific circumstances) get
(And I meant "Ice Lake", not "Icy Lake")
On 03/10/2023 02:32, J. Gareth Moreton via fpc-devel wrote:
Hi everyone,
This is mainly to Florian, but also to anyone else who can answer the
question - at which point did a complex LEA instruction (using all three
input operands and some other specific circumstances) get slow?
Preliminary research suggests the 486 was when it gained extra latency,