On Mon, May 11, 2009 at 3:57 PM, Adam Kocoloski <[email protected]> wrote:
> I'd like to see some more concrete numbers on the performance difference
> between the two versions. I wasn't able to reproduce Chris' 10%+ speedup
> using hovercraft:lightning; in fact, the two versions seem to be
> comparable within measurement variance.
>
Hmm. My first question is whether everyone is using the same parameters for
inserting documents. I'm pretty sure this patch isn't going to make much of
a difference unless the size of a _bulk_save is big enough to make the
term_to_binary calls noticeable. Other than that, the only things I could
point to for differences would be the Erlang VM version or the hardware.

> I tried messing around with fprof for a while today, and if anything it
> indicates that the original version might actually be faster (though I find
> that hard to believe). Anyway, I think we should get in the habit of having
> some quantitative, reproducible way of evaluating performance-related
> patches.
>

We definitely need some standard benchmarks to make sure we're not
introducing performance regressions elsewhere.

> +1 for Bob's suggestion of stripping out Bt from the arguments, though.
>

Sounds good.

> Adam
>
> On May 11, 2009, at 3:28 PM, Damien Katz wrote:
>
>> +1 for committing.
>>
>> -Damien
>>
>>
>> On May 10, 2009, at 9:49 PM, Paul Davis wrote:
>>
>>> Chris reminded me that I had an optimization patch lying around for
>>> couch_btree:chunkify, and his tests show that it gets a bit of a speed
>>> increase when running some tests with hovercraft. The basic outline of
>>> what I did was to swap a call like term_to_binary([ListOfTuples]) for a
>>> sequence of SizeList = lists:map(fun(X) -> size(term_to_binary(X)) end,
>>> ListOfTuples) and TotalSize = lists:sum(SizeList), and then when we go
>>> through the list of tuples to split them into chunks I use the
>>> precalculated sizes.
>>>
>>> Anyway, I just wanted to run it across the list before I commit it in
>>> case anyone sees anything subtle I might be missing.
>>>
>>> chunkify(_Bt, []) ->
>>>     [];
>>> chunkify(Bt, InList) ->
>>>     ToSize = fun(X) -> size(term_to_binary(X)) end,
>>>     SizeList = lists:map(ToSize, InList),
>>>     TotalSize = lists:sum(SizeList),
>>>     case TotalSize of
>>>     Size when Size > ?CHUNK_THRESHOLD ->
>>>         NumberOfChunksLikely = ((Size div ?CHUNK_THRESHOLD) + 1),
>>>         ChunkThreshold = Size div NumberOfChunksLikely,
>>>         chunkify(Bt, InList, SizeList, ChunkThreshold, [], 0, []);
>>>     _Else ->
>>>         [InList]
>>>     end.
>>>
>>> chunkify(_Bt, [], [], _Threshold, [], 0, Chunks) ->
>>>     lists:reverse(Chunks);
>>> chunkify(_Bt, [], [], _Threshold, OutAcc, _OutAccSize, Chunks) ->
>>>     lists:reverse([lists:reverse(OutAcc) | Chunks]);
>>> chunkify(Bt, [InElement | RestInList], [InSize | RestSizes], Threshold,
>>>         OutAcc, OutAccSize, Chunks) ->
>>>     case InSize of
>>>     InSize when (InSize + OutAccSize) > Threshold andalso OutAcc /= [] ->
>>>         chunkify(Bt, RestInList, RestSizes, Threshold, [], 0,
>>>             [lists:reverse([InElement | OutAcc]) | Chunks]);
>>>     InSize ->
>>>         chunkify(Bt, RestInList, RestSizes, Threshold,
>>>             [InElement | OutAcc], OutAccSize + InSize, Chunks)
>>>     end.
>>
>
>
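[Editor's note] For readers following the algorithm rather than the Erlang, here is a minimal Python sketch of the same chunking strategy: serialize each element once, cache the sizes, derive an even per-chunk threshold from the total, and fold the list into chunks. This is an illustration only, not CouchDB code; `pickle` stands in for term_to_binary and CHUNK_THRESHOLD is an arbitrary value chosen for the example.

```python
import pickle

CHUNK_THRESHOLD = 1000  # arbitrary for illustration; not CouchDB's value


def chunkify(in_list):
    # Serialize each element exactly once and remember its size, instead
    # of re-serializing elements on every pass through the list.
    sizes = [len(pickle.dumps(x)) for x in in_list]
    total = sum(sizes)
    if total <= CHUNK_THRESHOLD:
        return [in_list]
    # Aim for evenly sized chunks instead of N full chunks plus one runt.
    likely_chunks = total // CHUNK_THRESHOLD + 1
    threshold = total // likely_chunks
    chunks, acc, acc_size = [], [], 0
    for elem, size in zip(in_list, sizes):
        acc.append(elem)
        if acc_size + size > threshold and len(acc) > 1:
            # The overflowing element closes the current chunk, mirroring
            # lists:reverse([InElement | OutAcc]) in the Erlang above.
            chunks.append(acc)
            acc, acc_size = [], 0
        else:
            acc_size += size
    if acc:
        chunks.append(acc)
    return chunks
```

The point of the patch is visible in the first two lines of the function: the sizes are computed in one pass up front, so the splitting loop never has to call the serializer again.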
