Re: [fpc-devel] x86_64 question

2020-10-15 Thread J. Gareth Moreton via fpc-devel
Hi Nikolay, I've simplified my test as much as I can, and hopefully I have something that properly tests whether TEST has a false dependency or not.  I'm willing to admit that I may have been mistaken and the slowdown was caused by something else. The test functions effectively do a

Re: [fpc-devel] x86_64 question

2020-10-05 Thread Nikolay Nikolov via fpc-devel
On 10/4/20 2:01 PM, J. Gareth Moreton via fpc-devel wrote: Hi Nikolay, I've got some good code to test, but I need to double-check with someone to see if the licensing agreements allow (the code is rather complex, but showcases the effect of the TEST instructions quite nicely). Is your

Re: [fpc-devel] x86_64 question

2020-10-04 Thread J. Gareth Moreton via fpc-devel
Hi Nikolay, I've got some good code to test, but I need to double-check with someone to see if the licensing agreements allow (the code is rather complex, but showcases the effect of the TEST instructions quite nicely). Is your platform a Windows or a Unix machine?  I ask because I don't

Re: [fpc-devel] x86_64 question

2020-10-02 Thread J. Gareth Moreton via fpc-devel
Sure, I can send you something.  It might have to be to a personal e-mail though depending on how big the attachments are. Watch this space. I may be a bit of a mad scientist when it comes to my testing and research (and sometimes I make a stupid mistake like with the recent nested function

Re: [fpc-devel] x86_64 question

2020-10-02 Thread Nikolay Nikolov via fpc-devel
On 10/2/20 2:13 PM, J. Gareth Moreton via fpc-devel wrote: Confirmed my suspicions.  If I zero the upper bits of the register (I used something akin to "AND RCX, $F"), there is no speed loss. Therefore, I can make the hypothesis, on my Intel(R) Core(TM) i7-10750H, that using TEST on a

Re: [fpc-devel] x86_64 question

2020-10-02 Thread J. Gareth Moreton via fpc-devel
Confirmed my suspicions.  If I zero the upper bits of the register (I used something akin to "AND RCX, $F"), there is no speed loss. Therefore, I can make the hypothesis, on my Intel(R) Core(TM) i7-10750H, that using TEST on a sub-register causes a false dependency if the bits outside of the

Re: [fpc-devel] x86_64 question

2020-10-02 Thread J. Gareth Moreton via fpc-devel
So... I've done some tests, replacing TEST RCX, $4 with TEST CL, $4 and the like in a number-crunching function, and it seems to cause a notable penalty, even though none of the instructions are in my critical loop.  So I think it's something that needs to be avoided in most cases.  I think

Re: [fpc-devel] x86_64 question

2020-10-02 Thread J. Gareth Moreton via fpc-devel
Ah brilliant, thank you. I have used Agner Fog's material before for cycle counting.  When I implemented my 3 MOV -> XCHG optimisation (https://bugs.freepascal.org/view.php?id=36511), I used Agner Fog's empirical results to determine when it's best to apply this optimisation where speed is

Re: [fpc-devel] x86_64 question

2020-10-01 Thread Nikolay Nikolov via fpc-devel
On 10/1/20 11:36 PM, J. Gareth Moreton via fpc-devel wrote: I thought that might be the case - thanks Nikolay.  And I meant to say lower bits of a REGISTER, not an instruction! Admittedly I'm cycle-counting and byte-counting again!  I was looking for ways to reduce 13 bytes of padding in one

Re: [fpc-devel] x86_64 question

2020-10-01 Thread J. Gareth Moreton via fpc-devel
I thought that might be the case - thanks Nikolay.  And I meant to say lower bits of a REGISTER, not an instruction! Admittedly I'm cycle-counting and byte-counting again!  I was looking for ways to reduce 13 bytes of padding in one of my pure assembly language routines and realised I could

Re: [fpc-devel] x86_64 question

2020-10-01 Thread Nikolay Nikolov via fpc-devel
On 10/1/20 8:17 PM, J. Gareth Moreton via fpc-devel wrote: Hi everyone, I have a small question with assembler size optimisation that maybe one of you guys can give me a second opinion on: If you are using the "test" instruction to test some of the lower bits of an instruction, e.g. TEST

[fpc-devel] x86_64 question

2020-10-01 Thread J. Gareth Moreton via fpc-devel
Hi everyone, I have a small question with assembler size optimisation that maybe one of you guys can give me a second opinion on: If you are using the "test" instruction to test some of the lower bits of an instruction, e.g. TEST RCX, $2, is there a penalty with calling TEST CL, $2 instead?
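
For readers following the byte-counting side of the question, here is a rough sketch in Python (not code from the thread) of the size difference being asked about. It assumes the generic register-direct encodings F6 /0 ib and F7 /0 id, with a REX.W prefix for the 64-bit form. The helper names encode_test_rcx and zero_flag are hypothetical, made up for this illustration. It covers only encoding size and the zero flag, and says nothing about timing or dependencies.

```python
RCX_INDEX = 1  # hardware register number shared by CL, ECX and RCX


def encode_test_rcx(width_bits: int, imm: int) -> bytes:
    """Return the machine code for TEST CL/ECX/RCX, imm (hypothetical helper).

    Uses the generic register-direct forms only:
      8-bit : F6 /0 ib
      32-bit: F7 /0 id
      64-bit: REX.W F7 /0 id (the imm32 is sign-extended to 64 bits)
    """
    modrm = 0xC0 | RCX_INDEX  # mod=11 (register), reg=/0, rm=RCX
    if width_bits == 8:
        if not 0 <= imm <= 0xFF:
            raise ValueError("imm8 out of range")
        return bytes([0xF6, modrm, imm])
    if width_bits in (32, 64):
        # Keep the 64-bit immediate non-negative so sign extension is a no-op.
        limit = 0xFFFFFFFF if width_bits == 32 else 0x7FFFFFFF
        if not 0 <= imm <= limit:
            raise ValueError("imm32 out of range for this sketch")
        prefix = [0x48] if width_bits == 64 else []
        return bytes(prefix + [0xF7, modrm]) + imm.to_bytes(4, "little")
    raise ValueError("width_bits must be 8, 32 or 64")


def zero_flag(value: int, width_bits: int, imm: int) -> bool:
    """Model only ZF after TEST: set when (register AND imm) is zero."""
    mask = (1 << width_bits) - 1
    return (value & mask & imm) == 0


if __name__ == "__main__":
    for width, name in ((8, "CL"), (32, "ECX"), (64, "RCX")):
        code = encode_test_rcx(width, 0x2)
        print(f"TEST {name}, $2 -> {code.hex(' ')} ({len(code)} bytes)")
```

Under these assumptions the CL form is 3 bytes against 7 for the RCX form, and for a mask that fits in the low byte such as $2 the zero flag comes out the same either way. Whether the shorter form carries a speed penalty is the open question of the thread.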