On 19/03/17 21:28, Martok wrote:

>> It's called register spilling: once there are no registers left to hold
>> values, the compiler has to pick registers whose value will be kept in
>> memory instead.
>
> I thought it would be something like that...

> Still, my main issue was with the repeated fetches. I'd (naively!) say that it
> should be relatively easy for an assembly-level optimizer to detect that these
> are repeated loads of the same thing, with nothing that could affect the outcome
> in between. It's not even a CSE in the technical sense: not a sub-expression but
> the entire thing...

It is trivial to create a peephole optimization for that particular pattern, at least if it's just two loads: after you've optimized the second load into a register move, the third load no longer fits the pattern. You could work around that with a special peephole optimizer pass that goes over the code backwards to apply this specific optimization, or by first matching the pattern as many times as possible before changing anything. But then it will still fail if there is at least one other instruction in between.
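
To illustrate the limitation, here is a minimal Python sketch of such a pairwise peephole pass. The tuple-based instruction format, the register names, and the function `peephole_pairs` are all hypothetical and only serve the illustration; this is not FPC's internal representation or its peephole optimizer. Memory references are treated as opaque strings, assuming nothing changes the address they denote.

```python
# Toy instruction forms (hypothetical, not FPC's internal representation):
#   ("load", dst_reg, mem_ref)   dst_reg := [mem_ref]
#   ("move", dst_reg, src_reg)   dst_reg := src_reg
#   ("op",   dst_reg, src_reg)   any other instruction that overwrites dst_reg


def peephole_pairs(code):
    """Rewrite the second of two adjacent identical loads into a register move.

    The pass walks forward once and always inspects the already rewritten
    list, which is what makes it miss the third load in a run of three.
    """
    out = list(code)
    for i in range(len(out) - 1):
        a, b = out[i], out[i + 1]
        if a[0] == "load" and b[0] == "load" and a[2] == b[2] and a[1] != b[1]:
            out[i + 1] = ("move", b[1], a[1])
    return out


three_loads = [
    ("load", "eax", "[ebp-8]"),
    ("load", "ecx", "[ebp-8]"),
    ("load", "edx", "[ebp-8]"),
]
optimized = peephole_pairs(three_loads)
for ins in optimized:
    print(ins)
```

In this sketch the second load becomes a move, but the third load is then preceded by a move rather than a load and is left alone. A sequence with any unrelated instruction between two loads is not touched at all.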

So then you have to slightly generalise it, and in the end you do end up with a full-blown assembler CSE optimizer, like the one we removed for 3.0. I'm a staunch believer in not wasting time on stuff like that; it's just not worth it. Especially since a better register allocator, or SSA, can probably achieve the same thing in this case.
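
As a rough idea of what "slightly generalising" already involves, the following Python sketch tracks which register holds the value of which memory location inside one basic block. It is a toy under stated assumptions, not the optimizer that was removed: the instruction format and the name `forward_loads` are hypothetical, memory references are opaque strings whose address registers are assumed not to change, and every store is conservatively assumed to alias everything.

```python
# Toy instruction forms (hypothetical):
#   ("load",  dst, mem)   dst := [mem]
#   ("store", mem, src)   [mem] := src
#   ("move",  dst, src)   dst := src
#   ("op",    dst, src)   dst := dst <op> src (any instruction overwriting dst)


def forward_loads(block):
    """Replace redundant loads within one basic block by register moves."""
    holder = {}  # mem ref -> one register known to hold its current value
    out = []
    for ins in block:
        kind = ins[0]
        if kind == "store":
            # Assumption: no alias analysis, so a store invalidates everything.
            holder.clear()
            out.append(ins)
            continue
        dst = ins[1]
        if kind == "load" and ins[2] in holder and holder[ins[2]] != dst:
            new = ("move", dst, holder[ins[2]])
        else:
            new = ins
        # dst is overwritten, so forget whatever value it was tracking.
        holder = {m: r for m, r in holder.items() if r != dst}
        if kind == "load" and ins[2] not in holder:
            holder[ins[2]] = dst
        out.append(new)
    return out


block = [
    ("load", "eax", "[m]"),
    ("op", "ebx", "eax"),
    ("load", "ecx", "[m]"),
    ("load", "edx", "[m]"),
    ("op", "eax", "ebx"),
    ("load", "esi", "[m]"),
    ("store", "[m]", "edx"),
    ("load", "edi", "[m]"),
]
result = forward_loads(block)
for ins in result:
    print(ins)
```

Even this small version has to handle register overwrites and stores, and it still forgets that `ecx` and `edx` also hold the value once `eax` is clobbered. Closing such gaps is what leads towards a full CSE pass.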

>> E.g. those memory loads
>> are probably optimised by the processor itself (not necessarily coming
>> even from the L1 cache, but possibly from the write-back buffer).
>
> Not as well as one might believe, manually fixing (by forcing @CurrentHash into
> a register with a local variable) just those 4 lines gives a ~2% increase in
> MB/s for this hash. Which is quite a lot, given this is the part *without*
> actual computations.

You cannot attribute those 2% exclusively to keeping the values in registers. E.g. removing the loads can change branch target alignments. Even adding random nops can get you 10% due to changed code layout.

> And again, I've seen this happen more than once on i386 code, where it even
> creates "fake" register pressure (by using 2 or more registers to hold exactly
> the same temporary)

That's again something that needs to be solved at the register allocator level (with SSA). Freeing up registers afterwards is useless, since only the register allocator can keep stuff in them permanently.

fpc-devel maillist  -  fpc-devel@lists.freepascal.org
