By self-optimizing, I mean:
1. You have N locations in your code where two different implementations are
possible (it could be more than two, but I'm trying to keep this simple).
2. The global performance of your code/module depends on the _combination_ of
those N locations (and could be platform specific too).
3. N is much bigger than 1, say 10, which gives 2^10 = 1024 combinations to
try; too many to do manually.
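To make the "combination" idea concrete, here is a minimal sketch in Python (used only for illustration, not the language the flags would live in). It treats one integer as a bit-vector with one bit per code location; the names N_FLAGS and decode_flags are hypothetical.

```python
N_FLAGS = 10  # assumed number of code locations that have two variants


def decode_flags(bits, n=N_FLAGS):
    """Bit i of `bits` says whether location i uses the alternative variant."""
    return [bool((bits >> i) & 1) for i in range(n)]


# Two choices at each of N_FLAGS locations.
total_combinations = 2 ** N_FLAGS

if __name__ == "__main__":
    print(total_combinations)      # 1024
    print(decode_flags(0b101, 4))  # [True, False, True, False]
```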
So what you could do is:
1. Create a benchmark that evaluates the performance of your code.
2. Create a batch file that will re-compile your code and run the benchmark
in a loop, until some stop condition is detected ("stop file" present?)
3. Create a file that will contain the set of flag values, together with
benchmark results (initially empty).
4. Read the file at compile time using staticRead() and put the result in
"const"s, so that it can be used in "when" (I _assume_ this is possible)
5. When the file is empty, assume all flags are false.
6. When the file is not empty, read the last line and "binary increment" the
bit-vector (presumably represented as a uint16, since you have 10 flags)
7. At the end of the benchmark, add a line with the current flags and result.
8. When the last combination has been tried, create the stop condition ("stop
file")
9. Start the batch file, and go take a long break (or let it run on someone
else's computer)
10. Check the result file to find out the best combination of flags.
11. Either hard-code this combination, or put it in a separate file per
compilation target...
12. Done.
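The bookkeeping in steps 3 to 10 could be sketched roughly as below. This is a Python illustration under several assumptions, not a finished tool: the results file holds one line per run in the form "<bits> <seconds>", lower is better, and the increment is done by the driver script rather than at compile time as step 6 suggests. All function names, and the benchmark callable (which would recompile with the given flags and time the run), are hypothetical.

```python
from pathlib import Path

N_FLAGS = 10  # assumed number of flags


def next_flags(results):
    """Return the next bit-vector to try, or None once all were tried."""
    text = results.read_text() if results.exists() else ""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return 0  # empty file: all flags false (step 5)
    last = int(lines[-1].split()[0], 2)
    nxt = last + 1  # the "binary increment" (step 6)
    return nxt if nxt < 2 ** N_FLAGS else None


def record(results, flags, seconds):
    """Append the current flags and benchmark result (step 7)."""
    with results.open("a") as f:
        f.write(f"{flags:0{N_FLAGS}b} {seconds}\n")


def best(results):
    """Find the fastest combination in the results file (step 10)."""
    rows = [ln.split() for ln in results.read_text().splitlines() if ln.strip()]
    bits, secs = min(rows, key=lambda r: float(r[1]))
    return int(bits, 2), float(secs)


def run_all(results, stop, benchmark):
    """Loop until every combination was tried, then create the stop file."""
    while (flags := next_flags(results)) is not None:
        record(results, flags, benchmark(flags))
    stop.touch()  # step 8
```

Because the results file doubles as the loop state, an interrupted run can simply be restarted and will continue from the last recorded combination.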
I suspect this general idea might have come up here before. Maybe someone has
already coded something like this?