Hi Nikolay,
I've simplified my test as much as I can, and hopefully I have something
that properly tests whether TEST has a false dependency or not. I'm
willing to admit that I may have been mistaken and the slowdown was
caused by something else.
The test functions effectively do a
On 10/4/20 2:01 PM, J. Gareth Moreton via fpc-devel wrote:
Hi Nikolay,
I've got some good code to test, but I need to double-check with someone
to see if the licensing agreements allow it (the code is rather complex,
but showcases the effect of the TEST instructions quite nicely).
Is your platform a Windows or a Unix machine? I ask because I don't
Sure, I can send you something. It might have to be to a personal
e-mail though depending on how big the attachments are. Watch this space.
I may be a bit of a mad scientist when it comes to my testing and
research (and sometimes I make a stupid mistake like with the recent
nested function
On 10/2/20 2:13 PM, J. Gareth Moreton via fpc-devel wrote:
Confirmed my suspicions. If I zero the upper bits of the register (I
used something akin to "AND RCX, $F"), there is no speed loss.
Therefore, I can make the hypothesis, on my Intel(R) Core(TM) i7-10750H,
that using TEST on a sub-register causes a false dependency if the bits
outside of the
So... I've done some tests, replacing TEST RCX, $4 with TEST CL, $4 and
the like in a number-crunching function, and it seems to cause a notable
penalty, even though none of the instructions are in my critical loop.
So I think it's something that needs to be avoided in most cases. I
think
Ah brilliant, thank you.
I have used Agner Fog's material before for cycle counting. When I
implemented my 3 MOV -> XCHG optimisation
(https://bugs.freepascal.org/view.php?id=36511), I used Agner Fog's
empirical results to determine when it's best to apply this optimisation
where speed is
On 10/1/20 11:36 PM, J. Gareth Moreton via fpc-devel wrote:
I thought that might be the case - thanks Nikolay. And I meant to say
lower bits of a REGISTER, not an instruction!
Admittedly I'm cycle-counting and byte-counting again! I was looking
for ways to reduce 13 bytes of padding in one of my pure assembly
language routines and realised I could
On 10/1/20 8:17 PM, J. Gareth Moreton via fpc-devel wrote:
Hi everyone,
I have a small question with assembler size optimisation that maybe one
of you guys can give me a second opinion on:
If you are using the "test" instruction to test some of the lower bits
of an instruction, e.g. TEST RCX, $2, is there a penalty with calling
TEST CL, $2 instead?
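
For reference on the size side of the question only, here is a minimal
sketch that counts the bytes of each form. It is written in Python purely
as a byte counter; the byte sequences are hand-assembled from the standard
opcode forms (REX.W F7 /0 id, F7 /0 id and F6 /0 ib), and the names
ENCODINGS and encoded_size are made up for illustration. It says nothing
about speed or dependencies.

```python
# Hand-assembled x86-64 encodings (assumption: register-direct form,
# ModRM byte 0xC1 = mod 11, reg /0, rm 001, i.e. RCX / ECX / CL).
ENCODINGS = {
    # REX.W prefix + F7 /0 + 32-bit immediate (sign-extended to 64 bits)
    "TEST RCX, $2": bytes([0x48, 0xF7, 0xC1, 0x02, 0x00, 0x00, 0x00]),
    # F7 /0 + 32-bit immediate, no REX prefix needed
    "TEST ECX, $2": bytes([0xF7, 0xC1, 0x02, 0x00, 0x00, 0x00]),
    # F6 /0 + 8-bit immediate
    "TEST CL, $2": bytes([0xF6, 0xC1, 0x02]),
}


def encoded_size(mnemonic):
    """Return the byte length of one of the hand-assembled forms above."""
    return len(ENCODINGS[mnemonic])


for mnemonic, code in ENCODINGS.items():
    print(f"{mnemonic:<14} {len(code)} bytes  {code.hex(' ')}")
```

So the byte-register form saves 4 bytes over the 64-bit form and 3 bytes
over the 32-bit form, which is the saving being weighed against any
possible speed penalty.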