Re: [fpc-devel] Optimization of redundant mov's

Jonas Maebe Mon, 20 Mar 2017 01:47:34 -0700

On 19/03/17 21:28, Martok wrote:

>> It's called register spilling: once there are no registers left to hold
>> values, the compiler has to pick registers whose value will be kept in
>> memory instead.
>
> I thought it would be something like that...
>
> Still, my main issue was with the repeated fetches. I'd (naively!) say that it
> should be relatively easy for an assembly-level optimizer to detect that these
> are repeated loads of the same thing, with nothing that could affect the outcome
> in between. It's not even a CSE in the technical sense, not a sub-expression but
> the entire thing...

It is trivial to create a peephole optimization for that particular
pattern. At least if it's just two loads, because after you've optimized
the second load into a register move, the third load no longer fits the
pattern... Unless you create a special peephole optimizer pass that goes
over the code backwards to apply this specific optimization, or you
first match the pattern as many times as possible before changing it.
But then it will still fail if there is at least one other instruction
in between.
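
To make the two-loads limitation concrete, here is a minimal sketch of such a peephole pass. It is not FPC's optimizer: the `(opcode, source, destination)` tuple format, the `is_load` and `peephole` names, and the substring test for "register used in the address" are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical toy instruction format: (opcode, source, destination).
# A memory operand is any string starting with "[", e.g. "[ebp-8]".
Instr = Tuple[str, str, str]


def is_load(ins: Instr) -> bool:
    """True for 'mov <memory>, <register>' in the toy format."""
    op, src, dst = ins
    return op == "mov" and src.startswith("[") and not dst.startswith("[")


def peephole(code: List[Instr], backwards: bool = False) -> List[Instr]:
    """Rewrite the second of two adjacent identical loads into a register move.

    The pattern is only 'load X -> rA' immediately followed by 'load X -> rB'.
    Rewrites are done in place, so a forward walk destroys the pattern for a
    third load; a backward walk does not.
    """
    out = list(code)
    n = len(out)
    order = range(n - 1, 0, -1) if backwards else range(1, n)
    for i in order:
        prev, cur = out[i - 1], out[i]
        if (
            is_load(prev)
            and is_load(cur)
            and prev[1] == cur[1]
            # Crude assumption: the first load must not overwrite a register
            # that appears in the address expression.
            and prev[2] not in cur[1]
        ):
            out[i] = ("mov", prev[2], cur[2])
    return out


if __name__ == "__main__":
    three_loads = [
        ("mov", "[ebp-8]", "eax"),
        ("mov", "[ebp-8]", "ebx"),
        ("mov", "[ebp-8]", "ecx"),
    ]
    print(peephole(three_loads))                  # forward walk
    print(peephole(three_loads, backwards=True))  # backward walk
```

On three consecutive loads, the forward walk rewrites only the second one, because by the time it reaches the third load its predecessor is already a register move. The backward walk rewrites both. Neither variant matches anything once an unrelated instruction sits between two of the loads.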

So then you have to slightly generalise it, and in the end you do end up
with a full-blown assembler CSE optimizer, like the one we removed for
3.0. I'm a staunch believer in not wasting time on stuff like that, it's
just not worth it. Especially since a better register allocator, or SSA,
can probably achieve the same thing in this case.
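
As a rough illustration of where that generalisation leads, the sketch below tolerates intervening instructions by tracking which register currently holds the value of each memory operand. It is not the CSE optimizer that was removed from FPC. The tuple format, the `forward_loads` name, and the simplifications (a single basic block, every instruction writes its destination operand, any store may alias any load) are assumptions for illustration only.

```python
from typing import Dict, List, Tuple

# Hypothetical toy instruction format: (opcode, source, destination).
Instr = Tuple[str, str, str]


def is_mem(operand: str) -> bool:
    return operand.startswith("[")


def forward_loads(code: List[Instr]) -> List[Instr]:
    """Replace reloads of a memory operand by a move from the register that
    still holds its value, within one basic block (assumed, not checked)."""
    available: Dict[str, str] = {}  # memory operand -> register holding it
    out: List[Instr] = []
    for op, src, dst in code:
        is_load = op == "mov" and is_mem(src) and not is_mem(dst)
        loaded_from = src if is_load else None
        if loaded_from is not None and loaded_from in available:
            src = available[loaded_from]  # reuse the cached register
        out.append((op, src, dst))

        if is_mem(dst):
            # Store: conservatively assume it may alias every tracked load.
            available.clear()
            continue

        # The destination register is overwritten: drop every entry that was
        # cached in it or whose address expression mentions it (crude
        # substring test, an assumption of this toy format).
        available = {
            mem: reg
            for mem, reg in available.items()
            if reg != dst and dst not in mem
        }
        if loaded_from is not None and dst not in loaded_from:
            available.setdefault(loaded_from, dst)
    return out
```

Even this toy version already needs invalidation on register writes and on stores. A real pass would also have to deal with control flow, partial registers, calls and instructions with implicit operands, which is how the simple pattern grows into a full assembler-level CSE.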

>> E.g. those memory loads
>> are probably optimised by the processor itself (not necessarily coming
>> even from the L1 cache, but possibly from the write-back buffer).
>
> Not as well as one might believe, manually fixing (by forcing @CurrentHash into
> a register with a local variable) just those 4 lines gives a ~2% increase in
> MB/s for this hash. Which is quite a lot, given this is the part *without*
> actual computations.

You cannot attribute those 2% exclusively to keeping the values in
registers. E.g. removing them can change branch target alignments. Even
adding random nops can get you 10% due to changed code layout.

> And again, I've seen this happen more than once on i386 code, where it even
> creates "fake" register pressure (by using 2 or more registers to hold exactly
> the same temporary)

That's again something that needs to be solved at the register allocator
level (with SSA). Freeing up registers anymore afterwards is useless,
since only the register allocator can keep stuff in them permanently.



Jonas
_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
