On 8/23/2018 2:53 PM, Jeff King wrote:
> On Thu, Aug 23, 2018 at 06:26:58AM -0400, Derrick Stolee wrote:
>
> I think you can safely ignore the rest of it if you are otherwise
> occupied. Even if v2.19 ships without some mitigation, I don't know
> that it's all that big a deal, given the numbers I generated (which
> for some reason are less dramatic than Stolee's).

My numbers may be more dramatic because my Linux environment is a
virtual machine.

> If you have a chance, can you run p0001 on my patch (compared to
> 2.19-rc0, or to both v2.18 and v2.19-rc0)? It would be nice to
> double check that it really is fixing the problem you saw.

Sure. Note: I had to create a new Linux VM on a different machine between Tuesday and today, so the absolute numbers are different.

Using git/git:

Test      v2.18.0           v2.19.0-rc0               HEAD
---------------------------------------------------------------------------
0001.2:   3.10(3.02+0.08)   3.27(3.17+0.09) +5.5%     3.14(3.02+0.11) +1.3%
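(The percentages are relative to the first column, v2.18.0: for
instance, (3.27 - 3.10) / 3.10 is roughly +5.5%.)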


Using torvalds/linux:

Test      v2.18.0             v2.19.0-rc0               HEAD
---------------------------------------------------------------------------
0001.2:   56.08(45.91+1.50)   56.60(46.62+1.50) +0.9%   54.61(45.47+1.46) -2.6%


Now here is where I get on my soapbox (and create a TODO for myself later). I ran the above with GIT_PERF_REPEAT_COUNT=10, which intuitively suggests the results should be _more_ accurate than with the default of 3. However, I then remembered that we only report the *minimum* time across all runs, and the minimum is an extreme-value statistic: more repetitions just give it more chances to land on an outlier from the distribution. To test this, I ran a few tests manually and found that the variation between runs was larger than 3%.
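As a quick illustration of why (a toy simulation with an invented
Gaussian noise model, not real perf data), the reported minimum keeps
drifting lower as the repeat count grows, while the mean stays put:

    import random
    import statistics

    # Toy simulation: each "run" is a benchmark time drawn from an
    # invented noise model (3.2s +/- 0.1s Gaussian). Taking prefixes of
    # one pool mimics raising GIT_PERF_REPEAT_COUNT on the same machine.
    random.seed(1)
    pool = [random.gauss(3.2, 0.1) for _ in range(100)]

    for n in (3, 10, 30, 100):
        runs = pool[:n]
        # The minimum can only decrease as n grows; the mean stays near 3.2s.
        print(f"n={n:3}  min={min(runs):.3f}  mean={statistics.mean(runs):.3f}")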

When I choose my own metrics for performance tests, I like to run at least 10 times, remove the largest AND smallest samples, and then take the average of the rest. I did this manually for 'git rev-list --all --objects' on git/git and got the following results:

v2.18.0    v2.19.0-rc0   HEAD
--------------------------------
3.126 s    3.308 s       3.170 s

For full disclosure, here is a full table including all samples:

|      | v2.18.0 | v2.19.0-rc0 | HEAD    |
|------|---------|-------------|---------|
|      | 4.58    | 3.302       | 3.239   |
|      | 3.13    | 3.337       | 3.133   |
|      | 3.213   | 3.291       | 3.159   |
|      | 3.219   | 3.318       | 3.131   |
|      | 3.077   | 3.302       | 3.163   |
|      | 3.074   | 3.328       | 3.119   |
|      | 3.022   | 3.277       | 3.125   |
|      | 3.083   | 3.259       | 3.203   |
|      | 3.057   | 3.311       | 3.223   |
|      | 3.155   | 3.413       | 3.225   |
| Max  | 4.58    | 3.413       | 3.239   |
| Min  | 3.022   | 3.259       | 3.119   |
| Avg* | 3.126   | 3.30825     | 3.17025 |

(Note that the largest sample was the very first run, on v2.18.0, which was likely due to a cold disk cache.)
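For reference, here is roughly how the Avg* row is computed (the helper
name is mine); running it on the samples above reproduces the table:

    # Sketch of the aggregation described above: sort the samples, drop
    # the single largest and smallest, and average the rest. The data is
    # copied from the table above.
    def trimmed_mean(samples):
        assert len(samples) >= 3, "need at least 3 samples to trim both ends"
        trimmed = sorted(samples)[1:-1]  # drop min and max
        return sum(trimmed) / len(trimmed)

    runs = {
        "v2.18.0":     [4.58, 3.13, 3.213, 3.219, 3.077,
                        3.074, 3.022, 3.083, 3.057, 3.155],
        "v2.19.0-rc0": [3.302, 3.337, 3.291, 3.318, 3.302,
                        3.328, 3.277, 3.259, 3.311, 3.413],
        "HEAD":        [3.239, 3.133, 3.159, 3.131, 3.163,
                        3.119, 3.125, 3.203, 3.223, 3.225],
    }

    for name, samples in runs.items():
        print(f"{name:12}  avg* = {trimmed_mean(samples):.5f}")
    # v2.18.0       avg* = 3.12600
    # v2.19.0-rc0   avg* = 3.30825
    # HEAD          avg* = 3.17025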

I just kicked off a script that will run this test on the Linux repo while I drive home. I'll be able to report a similar table of data easily.

My TODO is to consider aggregating the data this way (or with a median) instead of reporting the minimum.
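A minimal sketch of what that could look like, using Python's built-in
statistics module on the v2.19.0-rc0 samples from above:

    import statistics

    # v2.19.0-rc0 samples from the table above.
    samples = [3.302, 3.337, 3.291, 3.318, 3.302,
               3.328, 3.277, 3.259, 3.311, 3.413]

    trimmed = sorted(samples)[1:-1]          # drop min and max
    avg_star = sum(trimmed) / len(trimmed)   # the Avg* from the table: 3.30825
    median = statistics.median(samples)      # robust alternative: 3.30650
    print(f"min={min(samples):.5f}  median={median:.5f}  avg*={avg_star:.5f}")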

Thanks,

-Stolee
