Re: [fpc-devel] Double-checking an optimisation

Martin Frb via fpc-devel Sat, 08 Jan 2022 17:47:44 -0800

On 09/01/2022 01:37, J. Gareth Moreton via fpc-devel wrote:

Hi everyone,
So a merge request of mine was just approved that allows the peephole optimizer access to more registers when it needs one for temporary storage. It allows it to make an optimisation on x86_64-win64 that wasn't possible before due to the lack of available volatile registers. In packages\numlib\src\dsl.pas - before:
.Lj184:
    ...
    cmpl    $1,%ecx
    jng    .Lj188
    subl    $1,%ecx
.Lj188:
    ...

After:

.Lj184:
    ...
    cmpl    $1,%ecx
    setg    %bl
    movzbl    %bl,%ebx
    subl    %ebx,%ecx
    ...
%ebx is a non-volatile register, but the current subroutine preserves it and it's not currently in use, so the peephole optimizer can borrow it for a few instructions.
I need to double-check though... is this actually a good optimisation for speed? It removes a jump and a label, which might permit other long-range optimisations, but it's 3 instructions that are in a dependency chain.
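
For readers following along, the two sequences above can be modelled in a few lines of Python to confirm that they compute the same result. This is only a sketch of the semantics, not compiler code: the function names are made up for illustration, and plain Python integers stand in for the 32-bit %ecx and %ebx registers.

```python
def sub_with_branch(ecx):
    """Model of: cmpl $1,%ecx / jng .Lj188 / subl $1,%ecx."""
    if ecx > 1:          # jng skips the subtraction when ecx <= 1
        ecx -= 1
    return ecx


def sub_branchless(ecx):
    """Model of: cmpl $1,%ecx / setg %bl / movzbl %bl,%ebx / subl %ebx,%ecx."""
    ebx = int(ecx > 1)   # setg + movzbl leave 0 or 1 in the borrowed register
    return ecx - ebx     # each step depends on the one before it


# Both versions agree for values on either side of the comparison.
for value in range(-4, 5):
    assert sub_with_branch(value) == sub_branchless(value)
```

The model says nothing about speed. It only shows that the branchless form swaps a conditional jump for a short chain of dependent operations, which is the trade-off being asked about.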

I take it, it is also one (or two?) bytes longer? If that is in a loop which otherwise fits exactly within a 32-byte aligned block, then that could cause a slowdown too. (If the loop is 16 bytes long, but aligned to a 32-byte boundary + 16, then it may slow down if the loop code size goes from 16 to 17 bytes, because that is when it goes over the boundary of the 32-byte block.) This is a bit hard to predict. But within very small loops (even 2 or maybe 3 blocks of 32 bytes), size may be as important. (Actually a good question, what weighs more....)
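
The boundary arithmetic in that example can be checked with a small Python sketch. The helper name `blocks_touched` and the default 32-byte block size are assumptions made for illustration; which block size actually matters depends on the CPU's fetch and decode hardware. As a side note, by my count with the short-form encodings the branch version is 8 bytes and the branchless one 11, so the growth would be about 3 bytes, though that is worth confirming against the real disassembly.

```python
def blocks_touched(start, size, block=32):
    """Count the aligned blocks that code at [start, start + size) overlaps."""
    first = start // block
    last = (start + size - 1) // block
    return last - first + 1


# A 16-byte loop placed at a 32-byte boundary + 16 stays inside one block...
print(blocks_touched(16, 16))   # 1
# ...but growing it by a single byte spills it into the next block.
print(blocks_touched(16, 17))   # 2
```

Whether touching the extra block costs more than the removed branch saves is not something this arithmetic can answer; it would need measuring on the target hardware.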

_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
