On Friday, 31 May 2013 at 11:49:05 UTC, Manu wrote:
I find that using templates actually makes it more likely for the compiler to properly inline. But I think the totally generic expressions produce cases where the compiler is considering too many possibilities that inhibit many optimisations.
It might also be that the optimisations get a lot more complex when the code fragments span across a complex call tree with optimisation dependencies on non-deterministic inlining.
One of the most important jobs for the optimiser is code re-ordering. Generic code is often written in such a way that makes it hard/impossible for the optimiser to reorder the flattened code properly.
Hand written code can have branches and memory accesses carefully placed at the appropriate locations. Generic code will usually package those sorts of operations behind little templates that often flatten out in a different order.
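
To make the ordering point concrete, here is a rough C++ sketch (C++ because the post talks about __restrict and __forceinline). All names in it (clampedLoad, sumClampedGeneric, sumClampedManual) are made up for illustration. Both functions compute the same sum; the only difference is where the loads and the compares sit in the source. Whether a given compiler produces different code for the two is not something this sketch proves.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: a small generic wrapper that hides one memory
// access and one branch behind a function call.
template <typename T>
inline T clampedLoad(const T* p, T limit) {
    T v = *p;                      // memory access hidden in the helper
    return v > limit ? limit : v;  // branch hidden in the helper
}

// Generic style: the order of loads and branches is whatever the
// helpers flatten out to after inlining (load, compare, load, compare...).
template <typename T>
T sumClampedGeneric(const T* data, std::size_t n, T limit) {
    assert(data != nullptr || n == 0);
    T total = T();
    for (std::size_t i = 0; i < n; ++i)
        total += clampedLoad(data + i, limit);
    return total;
}

// Hand-written style: both loads of a pair are issued first, and the
// compares come afterwards, so the intended order is explicit.
int sumClampedManual(const int* data, std::size_t n, int limit) {
    assert(data != nullptr || n == 0);
    int total = 0;
    std::size_t i = 0;
    for (; i + 1 < n; i += 2) {
        int a = data[i];           // loads grouped up front
        int b = data[i + 1];
        a = a > limit ? limit : a; // compares grouped afterwards
        b = b > limit ? limit : b;
        total += a + b;
    }
    if (i < n) {                   // leftover element when n is odd
        int a = data[i];
        total += a > limit ? limit : a;
    }
    return total;
}
```

The two versions are interchangeable from the caller's side, so comparing the generated assembly for both is an easy way to see what the optimiser actually did with each.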
The optimiser is rarely able to reorder code across if statements or pointer accesses. __restrict is very important in generic code to allow the optimiser to reorder across any indirection; otherwise compilers typically have to be conservative, presume that something somewhere may have changed the destination of a pointer, and leave the order as the template expanded. Sadly, D doesn't even support __restrict, and nobody ever uses it in C++ anyway.
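
For anyone who has not used it, a minimal C++ sketch of the aliasing issue follows. __restrict is a compiler extension (MSVC, GCC and Clang accept this spelling), not standard C++, and the function names scaleMayAlias and scaleNoAlias are invented for the example. In the first function the compiler generally has to assume that a store to dst might change *scale; in the second, the caller promises the pointers do not overlap, so the load of *scale can be hoisted out of the loop.

```cpp
#include <cassert>
#include <cstddef>

// Without __restrict: dst, src and scale may point into the same
// memory, so *scale typically has to be reloaded after every store.
void scaleMayAlias(float* dst, const float* src,
                   const float* scale, std::size_t n) {
    assert(scale != nullptr);
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = src[i] * *scale;
}

// With __restrict (non-standard extension): the caller promises that
// the three pointers do not alias, which frees the optimiser to load
// *scale once and to reorder or vectorise the loop body.
void scaleNoAlias(float* __restrict dst, const float* __restrict src,
                  const float* __restrict scale, std::size_t n) {
    assert(scale != nullptr);
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = src[i] * *scale;
}
```

Passing overlapping pointers to scaleNoAlias breaks the promise and is undefined behaviour, so the qualifier only belongs on functions where the caller can guarantee the buffers are distinct.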
I've always had better results with writing precisely what I intend the compiler to do, and using __forceinline where it needs a little extra encouragement.
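
As a rough illustration of the __forceinline approach: __forceinline itself is MSVC-specific, so the sketch below wraps it in a macro that falls back to the GCC/Clang always_inline attribute. The macro name FORCE_INLINE and the functions blend and blendRow are hypothetical, chosen only for this example.

```cpp
#include <cassert>

// FORCE_INLINE is a made-up macro name. It maps to the strongest
// inlining request each compiler family offers.
#if defined(_MSC_VER)
#  define FORCE_INLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#  define FORCE_INLINE inline __attribute__((always_inline))
#else
#  define FORCE_INLINE inline   // plain hint on unknown compilers
#endif

// Small hot helper that should be inlined into the caller's loop.
FORCE_INLINE float blend(float a, float b, float t) {
    return a + (b - a) * t;
}

// Caller: sums the blended values of two arrays of length n.
float blendRow(const float* a, const float* b, float t, int n) {
    assert(n >= 0);
    float sum = 0.0f;
    for (int i = 0; i < n; ++i)
        sum += blend(a[i], b[i], t);
    return sum;
}
```

Even with these annotations the compiler can still decline to inline in some situations (recursion, for instance), so checking the generated code remains the only reliable confirmation.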
Thanks for the valuable input. I have never had the pleasure of actually trying templates in performance-critical code, and this is good stuff to remember. I have added it to my notes.