> It's called register spilling: once there are no registers left to hold
> values, the compiler has to pick registers whose value will be kept in
> memory instead.
I thought it would be something like that...
Still, my main issue was with the repeated fetches. I'd (naively!) say that it
should be relatively easy for an assembly-level optimizer to detect that these
are repeated loads of the same thing, with nothing that could affect the outcome
inbetween. It's not even a CSE in the technical sense, not a sub-expression but
the entire thing...
> E.g. those memory loads
> are probably optimised by the processor itself (not necessarily coming
> even from the L1 cache, but possibly from the write-back buffer).
Not as well as one might believe, manually fixing (by forcing @CurrentHash into
a register with a local variable) just those 4 lines gives a ~2% increase in
MB/s for this hash. Which is quite a lot, given this is the part *without*
And again, I've seen this happen more than once on i386 code, where it even
creates "fake" register pressure (by using 2 or more registers to hold exactly
the same temporary) that makes the rest of the code worse than it could be.
As a ballpark: the same change as above results in a 10% speedup by freeing up 2
registers (all-int64 operations on i386, so 2 regs needed for everything, having
one more is very noticeable...)
It just strikes me as odd to have some rather good local code but then just
pointlessly add the second-most expensive operation in between ;-)
fpc-devel maillist - email@example.com