I find that I have little clue about how to convert the following brief
test script into some test to place in t/perf:
rm -rf /tmp/git-test
mkdir /tmp/git-test
cd /tmp/git-test
git init
yes a|head -$LIMIT >data
yes b|head -$LIMIT >data2
git add data data2
git commit -m "split"
git rm data2
yes 'a
b' | head -$(($LIMIT*2)) >data
git add data
git commit -m "combined"
time git blame data >/dev/null
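
For what it is worth, my current guess at the shape of such a test, going
by how the existing scripts in t/perf look, would be roughly the following
(the test_perf_fresh_repo setup helper and the hard-coded sizes are guesses
on my part):

#!/bin/sh

test_description='blame performance on a file made of many small fragments'
. ./perf-lib.sh

test_perf_fresh_repo

test_expect_success 'setup split and combined history' '
	yes a | head -10000 >data &&
	yes b | head -10000 >data2 &&
	git add data data2 &&
	git commit -m "split" &&
	git rm data2 &&
	yes "a
b" | head -20000 >data &&
	git add data &&
	git commit -m "combined"
'

test_perf 'git blame data' '
	git blame data >/dev/null
'

test_done

but I have no idea what sizes to hard-code so that the run time stays
reasonable on whatever machines actually run t/perf.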
The variable LIMIT is the deciding factor for the run time which, with
the code in current master, grows rather measurably as O(LIMIT^2). I
think that the current test takes about 15 minutes to complete on my
machine.
Obviously, that's sort of excessive: there is little point in choosing
sizes that show off more than two orders of magnitude in improvement.
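
One way to convince oneself of the quadratic growth is to run the script
above for a few doubling values of LIMIT and check that the blame time
roughly quadruples each time, e.g. (assuming the script is saved as
/tmp/blame-test.sh, a name I just made up, and reads LIMIT from the
environment):

for LIMIT in 2500 5000 10000; do
	echo "LIMIT=$LIMIT"
	LIMIT=$LIMIT sh /tmp/blame-test.sh
done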
Now the pathological cases are lots of small but attributable fragments
in the blamed source files. One real-world case that is hit rather hard
is a project with a large, alphabetically sorted list of words that
tends to get insertions/deletions of a few scattered lines at a time.
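
If it helps the discussion, that kind of history is not hard to synthesize
either; a rough sketch (the dictionary file, the commit count and the file
names are just placeholders, and it assumes GNU shuf is available):

rm -rf /tmp/wordlist-test
mkdir /tmp/wordlist-test
cd /tmp/wordlist-test
git init
sort /usr/share/dict/words >words
git add words
git commit -q -m "initial word list"
n=0
while test $n -lt 200; do
	n=$((n+1))
	# drop a few random lines and insert a few mangled copies of other
	# random words, so both deletions and insertions land scattered
	# throughout the sorted file
	shuf -n 3 words >drop
	shuf -n 3 words | sed 's/$/x/' >insert
	grep -v -x -F -f drop words >words.tmp
	sort words.tmp insert >words
	rm -f drop insert words.tmp
	git commit -q -am "scattered edit $n"
done
time git blame words >/dev/null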
Should one aim for an actually pathological case like the one in this
script? Or should one try benchmarking with one of the stock
repositories instead, even though those don't really demonstrate just
how bad the behavior can become, or which code passages dominate the
quadratic behavior?