Well, I created a wrapper around a std.array.uninitializedArray
call, to manage the interface I need (queue behavior: pushing at
the end, popping at the beginning). When it hits the end of the
current array, it either reuses the current buffer or creates a new
one, depending on the remaining capacity.
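
To make the idea concrete, here is a minimal sketch of what such a
wrapper could look like. It is not my actual code: the name
BufferQueue, the "reuse if at most half the buffer is live" rule and
the doubling growth factor are all assumptions chosen for
illustration, and it is restricted to types without elaborate
assignment, since the buffer starts out uninitialized.

```d
import std.array : uninitializedArray;
import std.traits : hasElaborateAssign;

/// Hypothetical queue over an uninitialized buffer (illustrative names).
struct BufferQueue(T) if (!hasElaborateAssign!T)
{
    private T[] buf;      // backing storage, contents undefined at first
    private size_t head;  // index of the first live element
    private size_t tail;  // one past the last live element

    this(size_t capacity)
    {
        buf = uninitializedArray!(T[])(capacity);
    }

    @property bool empty() const { return head == tail; }
    @property size_t length() const { return tail - head; }

    @property ref inout(T) front() inout
    {
        assert(!empty);
        return buf[head];
    }

    /// Pop at the beginning: just advance the head index.
    void popFront()
    {
        assert(!empty);
        ++head;
    }

    /// Push at the end, making room first if the buffer is exhausted.
    void put(T value)
    {
        if (tail == buf.length)
            makeRoom();
        buf[tail++] = value;
    }

    private void makeRoom()
    {
        immutable live = length;
        if (head > 0 && live <= buf.length / 2)
        {
            // Assumed policy: enough free space at the front, so reuse
            // the buffer by sliding the live elements down to index 0.
            foreach (i; 0 .. live)
                buf[i] = buf[head + i];
        }
        else
        {
            // Otherwise allocate a new, larger uninitialized buffer
            // (doubling is an assumption) and copy the live elements.
            auto bigger = uninitializedArray!(T[])(buf.length ? buf.length * 2 : 4);
            bigger[0 .. live] = buf[head .. tail];
            buf = bigger;
        }
        head = 0;
        tail = live;
    }
}
```

Usage would be along the lines of `auto q = BufferQueue!int(16);
q.put(1); auto x = q.front; q.popFront();`. Since it exposes empty,
front and popFront, it can also be consumed as an input range.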
On the 'synthetic' benchmarks, it performs quite reasonably: half
the time of Array or Appender (twice as fast), 5x faster than a
standard array, and 3-4x slower than a bare uninitializedArray.
And... it does not change the timings in my code; it even makes
things slower when pre-allocating too much. Only by pre-allocating
just a few elements do I get back the original timings.
So, I guess I'm suffering from a bad case of premature
optimization :)
I thought that, with lots of concatenation in my code, that would
be a bottleneck. But it appears that pre-allocation does not give
me any speed-up.
Well, at least it got me thinking, testing LDC a bit more and
learning things about Array and Appender ;)
Thanks for your help, guys. It's now time for the -profile switch
again...