The current logic for concatenating small buckets in core_output_filter() will have performance problems in two cases that I can think of:
* If you have a brigade consisting of several dozen small buckets, each one can get copied several times (because we only concatenate MAX_IOVEC_TO_WRITE of them at a time).
* If the brigade consists of, say, MAX_IOVEC_TO_WRITE+1 buckets of size 1MB each, the code will do a huge memory copy to (needlessly) concatenate the first MAX_IOVEC_TO_WRITE of them.

My proposed solution is to change the logic as follows:
* Skip the concatenation if there's >= 8KB of data already referenced in the iovec.
* Rather than creating a temporary brigade for concatenation, create a heap bucket. Make it big enough to hold 8KB. Pop the small buckets from the brigade, concatenate their contents into the heap bucket, and push the heap bucket onto the brigade.
* If we end up in the concatenation again during the foreach loop through the brigade, add the small buckets to the end of the previously allocated heap bucket. If the heap bucket size is about to exceed 8KB, stop.

Comments?

Thanks,
--Brian
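
For illustration only, here is a rough standalone sketch of the proposed concatenation rule. It does not use the real APR bucket or brigade API: small_bucket, heap_bucket, COALESCE_LIMIT and coalesce_small_buckets() are hypothetical names invented for this example, and the "heap bucket" is modeled as a fixed 8KB buffer. It shows only the three rules above: skip when the iovec already references >= 8KB, copy small buckets into one heap bucket, and on later passes append to that same bucket until it would exceed 8KB.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The 8KB threshold from the proposal (name is hypothetical). */
#define COALESCE_LIMIT 8192

/* Hypothetical stand-in for a small bucket: a pointer and a length. */
typedef struct {
    const char *data;
    size_t len;
} small_bucket;

/* Hypothetical stand-in for the one heap bucket that gets reused
 * across passes through the concatenation code. */
typedef struct {
    char buf[COALESCE_LIMIT];
    size_t used;
} heap_bucket;

/*
 * Append buckets[0..n) to hb, stopping before the first bucket that
 * would push hb past COALESCE_LIMIT. If the iovec already references
 * at least COALESCE_LIMIT bytes, do nothing.
 * Returns the number of buckets consumed.
 */
static size_t coalesce_small_buckets(heap_bucket *hb,
                                     const small_bucket *buckets,
                                     size_t n,
                                     size_t bytes_in_iovec)
{
    size_t i;

    /* Rule 1: enough data is already queued, so copying is wasted work. */
    if (bytes_in_iovec >= COALESCE_LIMIT) {
        return 0;
    }

    for (i = 0; i < n; i++) {
        /* Rule 3: stop if this bucket would overflow the heap bucket. */
        if (buckets[i].len > COALESCE_LIMIT - hb->used) {
            break;
        }
        /* Rule 2: each small bucket is copied exactly once. */
        memcpy(hb->buf + hb->used, buckets[i].data, buckets[i].len);
        hb->used += buckets[i].len;
    }

    assert(hb->used <= COALESCE_LIMIT);
    return i;
}
```

A caller would keep one heap_bucket for the duration of the loop over the brigade and call coalesce_small_buckets() each time it reaches the concatenation path, so later calls append to the data copied by earlier ones rather than re-copying it.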