Re: [fpc-devel] x86_64 question

J. Gareth Moreton via fpc-devel Fri, 02 Oct 2020 03:58:48 -0700

So... I've done some tests, replacing TEST RCX, $4 with TEST CL, $4 and the like in a number-crunching function, and it seems to cause a notable penalty, even though none of the instructions are in my critical loop. So I think it's something that needs to be avoided in most cases. I think the reason it worked in my Int and Frac functions is that the processor knows the upper 48 bits of the register are zero.

Long story short... best not to do it unless you have some additional insight into what the registers contain.


Gareth aka. Kit


On 02/10/2020 08:15, J. Gareth Moreton via fpc-devel wrote:

Ah brilliant, thank you.
I have used Agner Fog's material before for cycle counting. When I implemented my 3 MOV -> XCHG optimisation (https://bugs.freepascal.org/view.php?id=36511), I used Agner Fog's empirical results to determine when it's best to apply this optimisation where speed is concerned. On a lot of older processors it's not worth it, because XCHG took 3 cycles and the 3 MOVs generally took only 2 (due to how the dependency chain is set up). Only when XCHG's cycle count dropped to 1 or 2, or when optimising for size, does it pay off.
So it looks like a partial read of the lower bits is absolutely fine, since you're not changing anything.
Gareth aka. Kit

On 02/10/2020 01:40, Nikolay Nikolov via fpc-devel wrote:
On 10/1/20 11:36 PM, J. Gareth Moreton via fpc-devel wrote:
I thought that might be the case - thanks Nikolay. And I meant to say lower bits of a REGISTER, not an instruction!
Admittedly I'm cycle-counting and byte-counting again! I was looking for ways to reduce 13 bytes of padding in one of my pure assembly language routines and realised I could make a saving there. The only thing I can think of that I have to watch out for logically is that if I change, say, TEST EAX, $80 to TEST AL, $80, the latter will set the sign flag if the most significant bit is 1 after the 'and' operation, while the former always clears the sign flag.
I have used such subregisters before in the FPC RTL, in fpc_int_real and fpc_frac_real in rtl/x86_64/math.inc, where I read AX instead of the larger RAX, but only after a "SHR RAX, 48" that guarantees that everything above the 16th bit is zero, and only after testing other implementation candidates in a kind of informal competition. (Surprisingly, I think "shr $48,%rax; and $0x7ff0,%ax; cmp $0x4330,%ax" runs faster than moving 64-bit constants into temporary registers and using 'and' and 'cmp' on %rax directly; the temporaries are needed because 64-bit immediates aren't supported outside of MOV.)
I think you always get a read penalty when using the high-byte registers because the processor has to do an implicit shift operation.
I don't remember the reason, but I recall reading in Agner Fog's optimization manual that they are less efficient. Here's the relevant quote:
"Any use of the high 8-bit registers AH, BH, CH, DH should be avoided because it can cause false dependences and less efficient code."
It's from the chapter "Partial registers" (page 61) of this document:

https://www.agner.org/optimize/optimizing_assembly.pdf
Highly recommended reading, as it addresses exactly the topic of partial registers. In general, it is partial register writes of 16-bit or 8-bit subregisters that cause problems: either false read dependencies (usually on AMD) or extra penalties for joining/splitting registers (on Intel, at least in the P6 era).
Best regards,

Nikolay

_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel

