Is there a good article written for this? Preferably for D
specifically...
As I work on my challenge to make/update the symbol/id
compressor, I've noticed that the GC may be getting in the way
and skewing the results. That means a number of the benchmark
values I've posted may be wildly off. So forgive my ignorance.
So first, compiler flags. What should be used? So far I'm using
-inline -noboundscheck -O -release
As for flags for a C program (if it comes into play), the only
ones I can see being applicable are -o and -c (then link the
object file into your program).
How do I work around/with the GC? What code should I use for
benchmarks?
Currently I'm trying to use the TickDuration via Benchmark, but
it isn't exactly an arbitrary unit of time. If benchmark is a bad
choice, what's a good one?
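To show the kind of alternative I have in mind, here is a minimal sketch that times a loop with the monotonic clock from core.time and reports a Duration in microseconds rather than ticks. The helper name timeIt and the trivial workload are hypothetical, and I'm assuming a druntime recent enough to provide MonoTime.

```d
import core.time : Duration, MonoTime;
import std.stdio : writefln;

// Hypothetical helper: run fn `rounds` times and return the elapsed time.
Duration timeIt(void delegate() fn, int rounds)
{
    auto start = MonoTime.currTime;
    foreach (i; 0 .. rounds)
        fn();
    return MonoTime.currTime - start;
}

void main()
{
    int sink; // touched by the workload so the calls have an effect
    auto elapsed = timeIt({ ++sink; }, 100_000);
    writefln("%s rounds took %s usecs", sink, elapsed.total!"usecs");
}
```

Whether this is more trustworthy than benchmark is part of what I'm asking; it only changes the unit the result is reported in.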
As for the GC, since it might be running or pause threads in
order to run, how do I ensure it's stopped before I do my
benchmarks?
Here's what I have so far...
import core.thread : thread_joinAll;
import core.memory;
import std.datetime : benchmark;
//test functions
//actual functions slower than lambdas??
auto f1 = (){};
auto f2 = (){};
int rounds = 100_000;
GC.collect();
//GC.reserve(1024*1024*32); //no guarantee of reserves. So would this help?
thread_joinAll(); //guarantees the GC is done?
GC.disable(); //turned off
auto test1 = benchmark!(f1)(rounds);
GC.collect(); //collect between tests
thread_joinAll(); //make sure GC is done?
auto test2 = benchmark!(f2)(rounds);
//collect, joinall
...
//optional cleanup after the fact? Or leave the program to do
//it after exiting?
//GC.enable();
//GC.collect();
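For anyone who wants to try it, here is the same idea as a self-contained sketch. The bodies of f1 and f2 are hypothetical placeholders, and I'm assuming a Phobos recent enough to ship std.datetime.stopwatch, whose benchmark returns Duration values. I've left thread_joinAll out of this version because I don't know whether it is needed, and used scope(exit) so the GC is re-enabled even if a test throws.

```d
import core.memory : GC;
import std.datetime.stopwatch : benchmark;
import std.stdio : writefln;

// Hypothetical test functions standing in for f1 and f2.
void f1() { auto a = new int[](16); a[0] = 1; }
void f2() { auto a = new int[](32); a[0] = 1; }

void main()
{
    enum rounds = 100_000;

    GC.collect();            // start from a freshly collected heap
    GC.disable();            // suppress automatic collections while timing
    scope(exit) GC.enable(); // restore normal behaviour on the way out

    auto test1 = benchmark!(f1)(rounds);
    GC.collect();            // explicit collection between tests
    auto test2 = benchmark!(f2)(rounds);

    writefln("f1: %s usecs, f2: %s usecs",
             test1[0].total!"usecs", test2[0].total!"usecs");
}
```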
Is it better to have a bunch of free memory and ignore leaks? Or
to free memory as it's going through for cases that require it?
//compress is C code compiled with DMC; it returns malloc'd memory.
extern(C) char* compress(char* ptr, int size);
auto f3 = (){
compress(cast(char*) haystack.ptr, haystack.length);
//this with leaks?
GC.free(compress(cast(char*) haystack.ptr, haystack.length));
//or this?
};
Is memory allocated by DMC freed properly by GC.free if I end up
using it this way? (For all I know GC.free ignores the pointer).
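The alternative I'm considering is to release the buffer with C's own free from core.stdc.stdlib, on the theory that memory should go back to the allocator that handed it out. As far as I can tell from the core.memory docs, GC.free takes no action on memory the GC did not allocate. In this sketch fakeCompress is a hypothetical stand-in for the real compress (it just returns a malloc'd copy), and I'm assuming the C code and the D program link against the same C runtime.

```d
import core.stdc.stdlib : free, malloc;
import core.stdc.string : memcpy;

// Hypothetical stand-in for the C compress(): returns a malloc'd copy.
extern(C) char* fakeCompress(const(char)* ptr, int size)
{
    auto buf = cast(char*) malloc(size);
    if (buf !is null)
        memcpy(buf, ptr, size);
    return buf;
}

void main()
{
    string haystack = "some input to compress";
    auto result = fakeCompress(haystack.ptr, cast(int) haystack.length);
    scope(exit) free(result); // released by the allocator that created it

    assert(result !is null);
    assert(result[0 .. haystack.length] == haystack);
}
```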
If I do separate allocations to match what the functions and
calls did, can I subtract their time to get a cleaner set of
statistics? Or is that line of thinking wrong?
auto f3_mm = (){
void *ptr = GC.malloc(1024);
GC.free(ptr);
};
auto test2 = benchmark!(f3, f3_mm)(rounds); //f3-f3_mm = delta?
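Spelled out as a complete program, the subtraction idea would look roughly like this. The work function is a hypothetical workload (allocate, touch, free), baseline repeats only the allocation and free, and I'm again assuming std.datetime.stopwatch is available so the results are Duration values that can be subtracted directly.

```d
import core.memory : GC;
import std.datetime.stopwatch : benchmark;
import std.stdio : writefln;

// Hypothetical workload: allocate a block, touch every byte, free it.
void work()
{
    auto p = cast(ubyte*) GC.malloc(1024);
    foreach (i; 0 .. 1024)
        p[i] = cast(ubyte) i;
    GC.free(p);
}

// Baseline: the same allocation and free with no work in between.
void baseline()
{
    auto p = GC.malloc(1024);
    GC.free(p);
}

void main()
{
    enum rounds = 100_000;
    auto r = benchmark!(work, baseline)(rounds);
    auto delta = r[0] - r[1]; // may be noisy, or even negative
    writefln("work: %s usecs, baseline: %s usecs, delta: %s usecs",
             r[0].total!"usecs", r[1].total!"usecs", delta.total!"usecs");
}
```

I don't know whether the delta is meaningful, since the two runs may see different cache and heap states; that is the question.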
For the functions/lambdas passed to benchmark, is it better to
provide all the information in the function and not have data
stored elsewhere? Or store it all as a pure function? Does the
overhead of the extra stack pointer make any difference?
Is it better to collect all the tests and output the results all
at once? Or is it okay, or even better, to output the statistics
as each test finishes (between benchmarks and before the
GC.collect/thread_joinAll calls)?
What other things should I do/consider when writing basic
benchmark code?