On Friday, 12 September 2014 at 03:23:55 UTC, Vlad Levenfeld
wrote:
I've got a library I've been building up over a few projects,
and I've only ever run it under "debug", "unittest" and
"release" (with dub "buildOptions").
Lately I've needed to control the performance more carefully,
but unfortunately trying to compile with dub --profile gives me
some strange errors:
1) A few lines in one of my modules are reported as
"unreachable" by dmd. The data they operate on are defined
entirely in code (i.e. not read as external input) so maybe
they're getting CTFE'd into oblivion?
All I know is they're apparently reachable in non-profiled code
(and very essential to the business logic... but they're just
math functions, nothing crazy, one of the unreachable lines
computes the areas of some polygons, another sums the areas up).
2) The linker complains about undefined references to
std.exception.enforce being called from std.stdio.rawRead.
3) If I try to compile with "buildOptions":["profile"] instead
of dub --profile, then it compiles and links but then I
segfault on launch at gc_malloc.
I also recall (but can't seem to find) something about
profiling not working with multithreaded code? Because almost
every encapsulated service in this library runs on its own
thread.
And the code base (>15k LOC) isn't easily reduced, as any
remotely interesting main method I write pretty much pulls from
the entire library. I don't want to have to turn this whole
thing inside out. It's like 95% templates, and inlining wreaks
havoc on the logic as well, but that's another problem for
another day...
Does anyone else have these kinds of issues? Are there any
alternative methods of coarse-grained profiling (i.e., not
manually peppering timer calls into my code)? What's with the
unreachable statements? Any hints on what I can try next to get
closer to a performance profile of my code?
'Conventional' instrumenting profilers such as DMD's built-in
profiler or gprof are pretty useless for getting reliable data,
as the instrumentation itself distorts the results. I recommend
using a sampling profiler.
With sampling profilers you usually get profiling results down to
source line or even instruction level, and you don't need to
recompile your binary (although debug symbols are needed to get
source lines). They also tend to be able to measure more than
just time (e.g. cache misses for individual caches, branches
_and_ branch mispredictions, FPU usage, etc.).
If you're on Linux, 'perf' is good. On Ubuntu/Mint (and possibly
other distros), just type 'perf' into the console and it will
tell you what package to install; usually it's
'linux-tools-common'.
https://perf.wiki.kernel.org/index.php/Tutorial
It also has the awesome 'perf top' utility that allows you to
profile in real-time, like 'top' but with functions instead of
processes.
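For reference, a typical perf session looks roughly like the
sketch below. The function name 'profile_app' and the binary
'./myapp' are made up for illustration, and the dub command in
the comment assumes dub's "release-debug" build type; adjust both
to your project.

```shell
# Sketch only: profile_app and ./myapp are hypothetical names.
# Build with optimisations *and* debug symbols first, e.g.
# (assuming dub's "release-debug" build type):
#   dub build --build=release-debug
profile_app() {
    if [ "$#" -lt 1 ]; then
        echo "usage: profile_app BINARY [ARGS...]" >&2
        return 2
    fi
    app="$1"
    shift
    if ! command -v perf >/dev/null 2>&1; then
        echo "perf not found; install your distro's perf package" >&2
        return 1
    fi
    if [ ! -x "$app" ]; then
        echo "binary not found or not executable: $app" >&2
        return 1
    fi
    # Sample the whole run, recording call graphs (-g) into perf.data.
    perf record -g -o perf.data -- "$app" "$@" || return 1
    # Print the start of the report (hottest symbols first) as plain text.
    perf report -i perf.data --stdio | head -n 40
}
```

Call it as 'profile_app ./myapp'. For hardware counter totals
instead of a per-function breakdown, something like
'perf stat -e cache-misses,branch-misses ./myapp' works as well.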
OProfile is good *if you can get it to run*; its usage is very
similar to perf, but I almost always run into some issue with it.
AMD CodeXL is also decent and runs on both Linux and Windows,
although on non-AMD CPUs it can only measure execution time
(still very useful, down to instruction level).
RotateRight Zoom and Intel VTune should also be good, but both
are commercial.
If you're writing a game or any other real-time interactive
application and need to profile occasional lags, you might need a
different approach. In that case you can't avoid manual
instrumentation, although it's rather easy to use:
http://defenestrate.eu/2014/09/05/frame_based_game_profiling.html
https://github.com/kiith-sa/tharsis.prof