Re: [PATCH] Avoid loading commits twice in log with diffs

Thomas Rast Mon, 04 Mar 2013 01:59:06 -0800

Junio C Hamano <[email protected]> writes:

> Thomas Rast <[email protected]> writes:
>
>> Test                      with patch        before
>> --------------------------------------------------------------------
>> 4000.2: log --raw -3000   0.50(0.43+0.06)   0.54(0.46+0.06) +7.0%***
>> 4000.3: log -p -3000      2.34(2.20+0.13)   2.37(2.22+0.13) +1.2%
>> --------------------------------------------------------------------
>> Significance hints:  '.' 0.1  '*' 0.05  '**' 0.01  '***' 0.001
>
> It may be a silly question but what is a significance hint?


That's from my still-not-rerolled perf improvements series from, ahem,
ages ago:

  http://thread.gmane.org/gmane.comp.version-control.git/192884

I stole the idea from R (and in fact use R to compute it).  The ***
tells you that the probability of the 7% difference is attributable to
chance only with 0.1% probability, or in other words, it's highly likely
that the difference is *not* down to chance.  On the other hand, note
that the p4000.3 measurements do not show a significant difference
(significance hint is absent).

The statistical background: Assume that the two series of measurements
are drawn from two normal distributions (possibly with different
mean/variance).  Welch's t-test estimates the probability of the null
hypothesis that the two distributions in fact had the same mean.  If you
can reject the null hypothesis, you have essentially proven that there's
*some* difference in the timing runs.  (Hopefully for the better, and
hopefully _not_ caused just by CPU scaling or such.)

By the way, in the above test Jakub pointed me at the Dumbbench perl
module.  I've had a look at the ideas within, and I'll try to put some
sample rejection along their lines into perf-lib.  However, several
things make the module itself rather deserve the name.  Most
prominently, I can only get it to print timings to stdout in a fixed
format designed for human consumption.

>> --- a/log-tree.c
>> +++ b/log-tree.c
>> @@ -715,7 +715,7 @@ static int log_tree_diff(struct rev_info *opt, struct 
>> commit *commit, struct log
>>  {
>>      int showed_log;
>>      struct commit_list *parents;
>> -    unsigned const char *sha1 = commit->object.sha1;
>> +    unsigned const char *sha1 = commit->tree->object.sha1;
>
> Overall I think this goes in the right direction and I can see why
> the changes in later two hunks are correct, but I am not sure if we
> can safely assume that the caller has parsed the incoming commit and
> we have a good commit->tree here already.
>
> Right now, this static function has a single call-site in a public
> function log_tree_commit(), whose existing callers may all pass an
> already parsed commit, but I feel somewhat uneasy to do the above
> without some mechanism in place (either parse it here or in the
> caller if unparsed, or document that log_tree_commit() must be
> called with a parsed commit and perhaps add an assert there) to
> ensure that the invariant is not broken in the future.

In that case I'll reroll with the parsing -- it will have about the same
cost as the assertion, since it'll just check ->object.parsed and
return.

-- 
Thomas Rast
trast@{inf,student}.ethz.ch
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Avoid loading commits twice in log with diffs

Reply via email to