--- Comment #17 from Rob Jacques <> 2012-01-02 13:16:19 PST ---
> > That said, I did write a faster
> > chained appender for your benchmarks; however I did this by initializing the
> > appender with a page of memory, which definitely should not be the default
> > behavior. That said, one viable strategy for appender would be to keep 1 
> > page
> > of memory around as a scratch pad. 
> Can you elaborate on this (or share your code)?

For part 1 (fastest possible 'chained' appender): simply construct the appender
specifying a large number of elements, e.g.
auto app = Appender!string(4096);
FastestAppender7 seems to do something similar (enum MIN_SIZE = 4096;).

As for part 2 (fastest practical 'chained' appender), I haven't written it yet.
In essence you'd have two TLS variables: a scratch node holding a page of
memory, and an in-use flag.

private static void* __Appender_scratch_node;
private static bool  __Appender_scratch_in_use;
struct Appender(T) { ... }

Then, when an appender is constructed, instead of creating a new node and a
little bit of RAM each time, a single node and one page of memory would be
reused. The boolean flag would prevent a second appender from reusing the same
scratch pad (I figure 2+ live appenders would be relatively rare). Then, when
data is called, a single copy of the correct length would be made (appending
after data should also be relatively rare).
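To make the scheme concrete, here is a minimal sketch of the idea. Everything here is illustrative: the names (ScratchAppender, PAGE, create) are invented for this example and are not the actual Appender implementation.

```d
// Sketch of the scratch-pad scheme: one thread-local page is recycled
// between appender instances; a flag guards against two live users.
enum PAGE = 4096;

char[] scratchPage;      // module-level variables are thread-local in D2
bool   scratchInUse;

struct ScratchAppender
{
    private char[] buf;
    private size_t len;
    private bool   usingScratch;

    static ScratchAppender create()
    {
        ScratchAppender a;
        if (!scratchInUse)
        {
            if (scratchPage is null)
                scratchPage = new char[PAGE];
            scratchInUse   = true;       // claim the pad
            a.buf          = scratchPage;
            a.usingScratch = true;
        }
        else
            a.buf = new char[PAGE];      // 2+ live appenders: allocate normally
        return a;
    }

    void put(const(char)[] s)
    {
        if (len + s.length > buf.length)
        {
            // Outgrew the pad: move to a private buffer and release the pad.
            auto nbuf = new char[(len + s.length) * 2];
            nbuf[0 .. len] = buf[0 .. len];
            buf = nbuf;
            if (usingScratch) { scratchInUse = false; usingScratch = false; }
        }
        buf[len .. len + s.length] = s[];
        len += s.length;
    }

    // data makes a single copy of the correct length and releases the pad,
    // so the next appender constructed on this thread can reuse it.
    string data()
    {
        auto result = buf[0 .. len].idup;
        if (usingScratch) { scratchInUse = false; usingScratch = false; }
        return result;
    }
}
```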

> > > 5) It's better to store an "end" pointer than a "capacity" integer
> > I'll look into this, but this you can not make this judgment call based on 
> > the
> > performance of a char appender. The issue is that anything with a power of 2
> > T.sizeof performs integer multiplication/division using shift operations
> > instead of the actual underlying instructions, both of which are very slow.
> I'm not following your explanation. I don't see how element size plays against
> my conclusion - in fact, as far as I can see, calculating a
> "position-after-append" pointer and comparing it to an "end" pointer is going
> to be faster, because you will need to update the position anyway.

I'm referring to the fact that ptr + i => cast(T*)(cast(size_t)ptr +
i*T.sizeof) and ptrA - ptrB => (cast(size_t)ptrA - cast(size_t)ptrB) /
T.sizeof, which can lead to subtle performance issues when T.sizeof is neither
1 nor a power of 2. But my first code review doesn't reveal any worrying usages
in the primary code path; most cases of ptrA - ptrB seem to be behind rarely
used if statements.
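To illustrate the trade-off: computing an element count from two pointers requires a division by T.sizeof, while an end-pointer comparison needs only a multiply-and-add. The two node layouts below are hypothetical, not Appender's actual fields.

```d
// Illustration only: two hypothetical node layouts for a capacity check.
struct NodeWithCapacity(T)
{
    T*     ptr;      // next write position
    T*     start;
    size_t capacity; // in elements

    // (ptr - start) compiles to a subtraction plus a division by T.sizeof
    bool canAppend(size_t n) { return (ptr - start) + n <= capacity; }
}

struct NodeWithEnd(T)
{
    T* ptr; // next write position
    T* end; // one past the last valid slot

    // ptr + n is a multiply-and-add; no division anywhere
    bool canAppend(size_t n) { return ptr + n <= end; }
}
```

When T.sizeof is a power of 2 the division becomes a cheap shift, which is why a char appender (T.sizeof == 1) cannot reveal the difference.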

Regarding your previous assertion:
> 4) You can write a nextCapacity function with no branches
I'm having trouble finding this in the code. All I can find is:

auto newCapacity = newSize < MIN_SIZE ? MIN_SIZE : newSize * 2;

which contains a branch. (i.e. the ?: operator).
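For what it's worth, a version that is branch-free at the source level is possible with masking, though whether the ?: above actually emits a branch already depends on the backend. This is only a sketch of what such a function could look like, not code from the patch under review:

```d
// Branch-free equivalent of: newSize < MIN_SIZE ? MIN_SIZE : newSize * 2
size_t nextCapacityBranchless(size_t newSize)
{
    enum size_t MIN_SIZE = 4096;
    // mask is all ones when newSize < MIN_SIZE, otherwise zero
    size_t mask = -(cast(size_t)(newSize < MIN_SIZE));
    return (MIN_SIZE & mask) | ((newSize * 2) & ~mask);
}
```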

Also, I now understand better what you meant by
> 1) Extra indirection levels are performance killers
Unfortunately, your attempt to reduce the number of indirections has broken one
of the invariants of Appender: specifically, that it is a reference type.
Putting cursors, etc. into the Appender struct will result in data stomping.
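The stomping hazard is easy to demonstrate with any appender that keeps its cursor directly in the struct. ValueAppender below is a hypothetical reduced example, not the code under review:

```d
// Hypothetical value-style appender: the cursor lives in the struct itself,
// so copying the struct duplicates the cursor while sharing the buffer.
struct ValueAppender
{
    char[] buf;
    size_t len; // cursor held by value: copies of the struct diverge

    void put(char c) { buf[len++] = c; }
    const(char)[] data() const { return buf[0 .. len]; }
}
```

After `auto b = a;`, both copies carry the same cursor value but write into the same buffer, so an append through one copy silently overwrites an append made through the other. Keeping the cursor behind an indirection (as Appender's reference semantics require) avoids this.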
