Thanks for your suggestion, Tim. I will try some benchmarks over the weekend and report back.
Micro benchmarks might be misleading, though: in real code you could thrash the instruction cache if you load a fat unrolled monster function into it.
