Cliff Woolley wrote:

On Fri, 28 Jun 2002, Brian Pane wrote:



I remembered why memcpy won't help here: we don't know the
length in advance. But I managed to speed up apr_brigade_puts()
by about 30% in my tests by optimizing its main loop. Does this
patch reduce the apr_brigade_puts() overhead in your test environment?



Why won't the compiler unroll this loop for you?

gcc -O3 -funroll-loops


I tried this, and it didn't unroll the loop. That's probably because some of the information needed to unroll the loop effectively is unknown to the compiler. The condition for continuing this loop is: 1) not at the end of the input string, and 2) not at the end of the target bucket.

We have a "lookahead" capability on the second condition, but not on the first one. I.e., we know how many more bytes remain in the target bucket, so we can unroll the loop into blocks of 'n' character-copy operations and check for 'n' available bytes of writable buffer space only once per iteration. (We also know that, for small values of 'n', there are almost always more than 'n' bytes left in the bucket, so we can actually take advantage of this optimization in the real world.)

In contrast, the check for end-of-string can't be unrolled very effectively: there's no way to avoid putting a conditional branch in front of every "*buf++=*str++" operation. Thus the patch unrolls the loop in a way that reduces the number of end-of-bucket checks, even though it's impossible to reduce the number of end-of-string checks.
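
For illustration, here is a minimal sketch of that kind of unrolling, assuming n = 4. It is not the actual patch: copy_bounded() and its parameters are hypothetical names, and a plain char buffer stands in for the bucket. The space check happens once per block of four copies, while the end-of-string check still precedes every copy.

```c
#include <stddef.h>

/* Hypothetical helper, not the real apr_brigade_puts() code.
 * Copies bytes from the NUL-terminated string str into buf, which has
 * room for avail bytes. Returns the number of bytes copied and sets
 * *rest to the first byte of str that was not copied. */
static size_t copy_bounded(char *buf, size_t avail,
                           const char *str, const char **rest)
{
    char *start = buf;

    /* Unrolled part: one end-of-buffer check per four bytes,
     * but still one end-of-string check per byte. */
    while (avail >= 4) {
        if (!*str) goto done;
        *buf++ = *str++;
        if (!*str) goto done;
        *buf++ = *str++;
        if (!*str) goto done;
        *buf++ = *str++;
        if (!*str) goto done;
        *buf++ = *str++;
        avail -= 4;
    }

    /* Tail: fewer than four bytes of space left, so check both
     * conditions before every copy. */
    while (avail > 0 && *str) {
        *buf++ = *str++;
        avail--;
    }

done:
    *rest = str;
    return (size_t)(buf - start);
}
```

If *rest does not point at the terminating NUL on return, the buffer filled up first and the caller would need to continue with the remainder in a new buffer.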

--Brian



