On 9/24/13 11:02 AM, Peter Alexander wrote:
On Tuesday, 24 September 2013 at 17:02:18 UTC, Andrei Alexandrescu wrote:
On 9/24/13 9:58 AM, Peter Alexander wrote:
On Tuesday, 24 September 2013 at 15:25:11 UTC, Andrei Alexandrescu wrote:
What are they paying exactly? An extra arg to allocate that can
probably be defaulted?
void[] allocate(size_t bytes, size_t align = this.alignment) shared;
For allocating relatively small objects (say up to 32K), we're looking
at tens of cycles, no more. An extra argument needs to be passed
around and more importantly looked at and acted upon. At this level
it's a serious dent in the time budget.
The cost of a few cycles really doesn't matter for memory allocation...
If you are really allocating memory so frequently that those few extra
cycles matter then you are probably going to be memory bound anyway.
It does. I'm not even going to argue this.
Sorry but I find this insulting.
Apologies, you're right. I was bummed straight after having sent that
all-too-glib message.
Myself and Manu, both professional and
senior game developers with a lot of experience in performance are both
arguing against you. I'm not saying this makes us automatically right,
but I think it's rude to dismiss our concerns as not even worthy of
discussion.
This is not an argument "against me" - I'm looking at ways to address
alignment concerns.
There's a larger issue at work: certain special allocator APIs are
sensible but are unneeded for composition. The focus here is to provide
enough primitives that allow composing larger allocators out of smaller
components. A top-level specialized allocator may implement some or all
of the discussed API, plus a bunch of other specialized functions.
I think this is a situation where you need to justify yourself with
something concrete. Can you provide an example of some code whose
performance is significantly impacted by the addition of an alignment
parameter? It has to be "real code" that does something useful, not just
a loop that continually calls allocate.
Strings.
Strings what? Just allocating lots of small strings?
Ok, I've put together a benchmark of the simplest allocator I can think
of (pointer bump) doing *nothing* but allocating 12 bytes at a time and
copying a pre-defined string into the allocated memory:
http://dpaste.dzfl.pl/59636d82
On my machine, the difference between the version with alignment and the
version without is 1%. I tried changing the allocator to a class so that
the allocation was virtual and not inlined, and the difference was still
only ~2% (Yes, I verified in the generated code that nothing was being
omitted).
In a real scenario, much more will be going on outside the allocator,
making the overhead much less than 1%.
Please let me know if you take issue with the benchmark. I wrote this
quickly so hopefully I have not made any mistakes.
There is really no need for a benchmark, for at least two reasons.
First, people _will_ do cycle counting, which _will_ influence the
bottom line. I work with a dozen such people. I understand it doesn't make a
difference for you, Manu, a lot of game developers, and a bunch of
others, but I know for a fact it _does_ make a difference for an
important category of program(mer)?s.
Second, there's no need for a defaulted argument; the aligned allocation
can be an optional overload of the one-argument function. I'm looking
into ways to compose with that overload.
Andrei
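
For readers following the thread, the second point above (aligned allocation as an optional overload rather than a defaulted argument) might look roughly like the sketch below. This is only an illustration under stated assumptions, not the design that was eventually adopted: BumpAllocator, roundUp, hasAlignedAllocate and the default alignment of 16 are all hypothetical names and values chosen for the example. The one-argument allocate never looks at an alignment argument; only callers of the two-argument overload pay for the extra work, and composing code can detect at compile time whether the overload exists.

```d
import std.algorithm : min;

// Hypothetical pointer-bump allocator, for illustration only.
struct BumpAllocator
{
    enum size_t alignment = 16; // assumed default alignment

    private ubyte[] buffer;
    private size_t offset;      // kept a multiple of `alignment` (or == buffer.length)

    this(ubyte[] buffer)
    {
        // Skip leading bytes so the usable region starts on an aligned address.
        immutable base = cast(size_t) buffer.ptr;
        immutable skip = roundUp(base, alignment) - base;
        this.buffer = skip <= buffer.length ? buffer[skip .. $] : null;
    }

    // One-argument primitive: no alignment argument is passed or inspected.
    void[] allocate(size_t bytes)
    {
        if (bytes > buffer.length - offset)
            return null;
        auto result = buffer[offset .. offset + bytes];
        // Round the bump up so the next block is aligned again.
        offset = min(offset + roundUp(bytes, alignment), buffer.length);
        return result;
    }

    // Optional overload: arbitrary power-of-two alignment, paid for only here.
    void[] allocate(size_t bytes, size_t a)
    {
        assert(a != 0 && (a & (a - 1)) == 0, "alignment must be a power of two");
        immutable base = cast(size_t) buffer.ptr;
        immutable start = roundUp(base + offset, a) - base;
        if (start > buffer.length || bytes > buffer.length - start)
            return null;
        auto result = buffer[start .. start + bytes];
        offset = min(roundUp(start + bytes, alignment), buffer.length);
        return result;
    }

    private static size_t roundUp(size_t n, size_t a) pure nothrow @nogc
    {
        return (n + a - 1) & ~(a - 1);
    }
}

// Hypothetical trait: true if A offers the two-argument overload.
enum bool hasAlignedAllocate(A) =
    is(typeof(A.init.allocate(size_t.init, size_t.init)) : void[]);
```

A composing allocator could then branch with `static if (hasAlignedAllocate!Parent)` and forward aligned requests only when the parent supports them, leaving the plain `allocate(bytes)` path untouched.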