On Friday, 14 March 2014 at 11:04:34 UTC, Manu wrote:
On 14 March 2014 18:03, John Colvin <[email protected]> wrote:
As much as I like the idea:

Something always tells me this is the compiler's job... What clever reasoning are you applying that the compiler's inliner can't? It seems like a different situation to, say, SIMD code, where correctly structuring loops can require a lot of gymnastics that the compiler can't or won't (floating point conformance) do. The inlining decision seems easily automatable in comparison.

I understand that unoptimised builds for debugging are a problem, but a sensible compiler lets you hand-pick your optimisation passes.

In short: why are compilers not good enough at this that the programmer needs to be involved?
The compiler applies generalised heuristics, which are certainly tuned for the 'common' case, whatever that happens to be. The compiler simply doesn't know what you're doing, so it's very hard for it to do anything really intelligent. Inlining heuristics are fickle, and they also don't know what you're actually trying to do.

Is a function 'long'? How long is 'long'? Is the function 'hot'? Do we prefer code size or execution speed? Is the function called only from this location, or is it used in many locations? Etc.

Inlining is one of the fuzziest pieces of logic in the compiler, and it relies on a lot of information that is impossible for the compiler to deduce, so it applies heuristics to try and do a decent job, but it's certainly not perfect.

I argue that nothing so fickle can exist in the language without a manual override. Especially not in a native language.
In my current case, the functions I need to inline are not exactly trivial. They're really pushing the boundaries of the compiler's inliner heuristics, and then I'm calling a series of such functions that operate on parallel data.

If they don't inline, the performance equals the sum of the functions plus some overhead. If they all inline, the performance equals only that of the longest one, with no overhead (the others fill in pipeline gaps).
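To make the shape of that concrete, here is a minimal hypothetical sketch (illustrative functions, not the actual code in question):

// Several non-trivial, mutually independent transforms over the
// same stream of data.
float fA(float x) { return x * 1.5f + 2.0f; }
float fB(float x) { return x * x - 0.5f; }
float fC(float x) { return (x + 1.0f) * 0.25f; }

void process(const float[] xs, float[] a, float[] b, float[] c)
{
    foreach (i; 0 .. xs.length)
    {
        // The three results are independent of each other. If fA, fB
        // and fC all inline, the CPU can overlap their instructions,
        // so the loop body costs roughly as much as the longest of
        // the three, rather than the sum of all three plus three
        // calls' worth of overhead.
        a[i] = fA(xs[i]);
        b[i] = fB(xs[i]);
        c[i] = fC(xs[i]);
    }
}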
Further, some of these functions embed some shared work... if they don't inline, this work is repeated. If they do inline, the redundant repeated work is eliminated.
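For instance (again a hypothetical sketch): two functions that each recompute the same sub-expression internally; once both are inlined at the call site, common subexpression elimination can do that work once.

float fA(float x, float scale)
{
    float n = 1.0f / scale;  // shared work
    return x * n + 1.0f;
}

float fB(float x, float scale)
{
    float n = 1.0f / scale;  // the same work, repeated
    return x * n - 1.0f;
}

float combine(float x, float scale)
{
    // Out of line, 1.0f / scale is computed twice. With fA and fB
    // inlined here, the optimiser computes it once and reuses it.
    return fA(x, scale) + fB(x, scale);
}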
My experiments with std.algorithm were a failure. I realised quickly that I couldn't rely on the inliner to do a satisfactory job, and the optimiser was unable to do its job properly.
std.algorithm could really benefit from the mixin suggestion, since things like predicate functions are always trivial, usually supplied as little lambdas, and inlining isn't reliable. Especially in the debug builds. Something like algorithm loop sugar shouldn't run heaps worse than an explicit loop just because it happens to be implemented by a generic function.
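The kind of call in question looks something like this (a sketch; how much slower the range version actually runs depends on the compiler and flags):

import std.algorithm : filter, map;
import std.array : array;

void main()
{
    auto data = [1, 2, 3, 4, 5];

    // The lambdas are trivial, but each element passes through
    // template-instantiated range code; unless the inliner fires
    // (unreliable, especially in debug builds), this pays call
    // overhead per element...
    auto viaRanges = data.filter!(x => x % 2 != 0)
                         .map!(x => x * x)
                         .array;

    // ...where the equivalent explicit loop pays none.
    int[] viaLoop;
    foreach (x; data)
        if (x % 2 != 0)
            viaLoop ~= x * x;

    assert(viaRanges == viaLoop);
}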
Thanks for the explanations.
Another use case is to aid propagation of compile-time information for optimisation. A function might look like a poor candidate for inlining for other reasons, but if there's a statically known (to the caller) integer parameter coming in that will be used to decide a loop length, inlining allows that info to be propagated to the callee. Static loop lengths => well optimised loops, with opportunities for optimal unrolling. Even with quite a large function this can be a good choice to inline.
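Roughly this shape (a hypothetical sketch):

// Looks like a poor inlining candidate: non-trivial body, runtime
// loop bound.
float dot(const float[] a, const float[] b, size_t n)
{
    float sum = 0;
    foreach (i; 0 .. n)
        sum += a[i] * b[i];
    return sum;
}

float caller(const float[] a, const float[] b)
{
    // n is statically known here. If dot is inlined, the constant 4
    // propagates into the loop bound, and the optimiser can fully
    // unroll (and potentially vectorise) the loop.
    return dot(a, b, 4);
}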
I don't know how good compilers are at taking this sort of thing into account already.