Re: [PATCH] Re: 2.0 performance Re: Breaking something? Now is the time?

Brian Pane 29 Jun 2002 20:07:05 -0000

Cliff Woolley wrote:

> On Fri, 28 Jun 2002, Brian Pane wrote:
>> I remembered why memcpy won't help here: we don't know the
>> length in advance.  But I managed to speed up apr_brigade_puts()
>> by about 30% in my tests by optimizing its main loop.  Does this
>> patch reduce the apr_brigade_puts() overhead in your test
>> environment?
>
> Why won't the compiler unroll this loop for you?
> gcc -O3 -funroll-loops


I tried this, and it didn't unroll the loop.  That's probably
because some of the information needed to unroll the loop
effectively is unknown to the compiler.  The condition for
continuing this
loop is: 1) not at the end of the input string, and 2) not at
the end of the target bucket.  We have a "lookahead" capability
on the second condition, but not on the first one.  I.e., we know
how many more bytes remain in the target bucket, and thus we can
unroll the loop into blocks of 'n' character-copy operations with
a check for 'n' available bytes of writable buffer space only
once per iteration.  (We also know that, for small values of 'n',
there are almost always more than 'n' bytes left in the bucket,
so that we can actually take advantage of this optimization in
the real world.)  In contrast, the check for end-of-string
can't be unrolled very effectively: there's no way to avoid
having to put a conditional branch in front of every "*buf++=*str++"
operation.  Thus the patch unrolls the loop in a way that reduces
the number of end-of-bucket checks, even though it's impossible to
reduce the number of end-of-string checks.
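
As a rough illustration of the idea described above (a minimal
sketch, not the actual patch and not the apr_brigade_puts() code):
the function name copy_unrolled, its parameters, and the unroll
factor of 4 are all assumptions made up for this example.

```c
#include <stddef.h>

/* Hypothetical helper, not part of APR: copy bytes from the
 * NUL-terminated string str into buf, which has room for avail
 * bytes.  Returns the number of bytes copied and sets *rest to the
 * first uncopied character (it points at '\0' if everything fit). */
static size_t copy_unrolled(char *buf, size_t avail,
                            const char *str, const char **rest)
{
    char *start = buf;

    /* Unrolled part: the end-of-bucket check (avail >= 4) runs once
     * per block of 4 copies, but the end-of-string check still has
     * to sit in front of every single copy. */
    while (avail >= 4) {
        if (!*str) goto done;
        *buf++ = *str++;
        if (!*str) goto done;
        *buf++ = *str++;
        if (!*str) goto done;
        *buf++ = *str++;
        if (!*str) goto done;
        *buf++ = *str++;
        avail -= 4;
    }

    /* Tail: fewer than 4 bytes of space left, so fall back to
     * checking both conditions before each copy. */
    while (avail > 0 && *str) {
        *buf++ = *str++;
        avail--;
    }

done:
    *rest = str;
    return (size_t)(buf - start);
}
```

A caller would compare **rest against '\0' to tell whether the
whole string fit or whether the remainder has to go into a new
buffer.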

--Brian
