On Tuesday, 26 June 2018 at 02:10:17 UTC, Manu wrote:
Some code:
---------------------------------
import std.algorithm : map, sum;

struct Entity
{
    enum NumSystems = 4;
    struct SystemData
    {
        uint start, length;
    }
    SystemData[NumSystems] systemData;

    @property uint systemBits() const
    {
        return systemData[].map!(e => e.length).sum;
    }
}

void main()
{
    Entity e;
    e.systemBits(); // <- call the function, notice the codegen
}
---------------------------------
This property sums 4 ints... that should be insanely fast. It should also be something like 5-8 lines of asm.
Turns out, that call to sum() is eating 2.5% of my total perf (significant among a substantial workload), and the call tree is quite deep.
Basically, the inliner tried but failed to seal the deal, and leaves a call stack 7 levels deep.
Pipeline programming is hip and also *recommended* D usage. The optimiser must do a good job. This is such a trivial work loop, with a constant length (4).
I expect it to unroll and inline down to 3 integer adds. A call tree 7 levels deep is quite a ways from the mark.
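For comparison, a hand-unrolled version of the same sum can be written with `static foreach`. This is a sketch, not code from the post: `systemBitsUnrolled` is a hypothetical name, and the struct layout is copied from the snippet at the top. It only illustrates the shape one would hope the `map`/`sum` pipeline reduces to (a straight-line run of adds with no nested calls); the exact instructions emitted still depend on the compiler and flags.

```d
import std.algorithm : map, sum;

struct Entity
{
    enum NumSystems = 4;
    struct SystemData
    {
        uint start, length;
    }
    SystemData[NumSystems] systemData;

    // Pipeline version, as in the post.
    @property uint systemBits() const
    {
        return systemData[].map!(e => e.length).sum;
    }

    // Hypothetical hand-unrolled equivalent: static foreach expands the
    // loop at compile time, so the body is one add per element with no
    // range or lambda calls left for the inliner to deal with.
    @property uint systemBitsUnrolled() const
    {
        uint total = 0;
        static foreach (i; 0 .. NumSystems)
        {
            total += systemData[i].length;
        }
        return total;
    }
}

void main()
{
    Entity e;
    foreach (i, ref s; e.systemData)
        s.length = cast(uint)(i + 1); // lengths 1, 2, 3, 4

    // Both versions compute the same sum.
    assert(e.systemBits == 10);
    assert(e.systemBitsUnrolled == 10);
}
```

Comparing the asm of the two functions under the same flags is one way to see how much of the gap comes from the inliner rather than from the arithmetic itself.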
Maybe this is another instance of Walter's "phobos begat madness" observation?
The unoptimised call stack is mental. Compiling with -O trims most of the noise in the call tree, but it fails to inline the remaining work, which ends up 7 levels down a redundant call tree.
Then use LDC! ;)
But seriously, DMD's inliner is a) in the wrong spot in the compilation pipeline (at the AST level) and b) timid, to say the least.