[PATCH] Documentation/diff-options: explain different diff algorithms

Stefan Beller Mon, 06 Aug 2018 15:26:48 -0700

As a user I wondered what the diff algorithms are about. Offer at least
a basic explanation on the differences of the diff algorithms.


Signed-off-by: Stefan Beller <[email protected]>
---
 Documentation/diff-options.txt | 10 +++++++---
 Documentation/git-diff.txt     | 34 ++++++++++++++++++++++++++++++++++
 2 files changed, 41 insertions(+), 3 deletions(-)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index f394608b42c..eae033a21ea 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -91,14 +91,18 @@ appearing as a deletion or addition in the output. It uses 
the "patience
 diff" algorithm internally.
 
 --diff-algorithm={patience|minimal|histogram|myers}::
-       Choose a diff algorithm. The variants are as follows:
+       Choose a diff algorithm. See the discussion of DIFF ALGORITHMS
+ifndef::git-diff[]
+       in linkgit:git-diff[1]
+endif::git-diff[]
+       . The variants are as follows:
 +
 --
 `default`, `myers`;;
        The basic greedy diff algorithm. Currently, this is the default.
 `minimal`;;
-       Spend extra time to make sure the smallest possible diff is
-       produced.
+       The same algorithm as `myers`, but spend extra time to make
+       sure the smallest possible diff is produced.
 `patience`;;
        Use "patience diff" algorithm when generating patches.
 `histogram`;;
diff --git a/Documentation/git-diff.txt b/Documentation/git-diff.txt
index b180f1fa5bf..b182389aaae 100644
--- a/Documentation/git-diff.txt
+++ b/Documentation/git-diff.txt
@@ -119,6 +119,40 @@ include::diff-options.txt[]
 
 include::diff-format.txt[]
 
+DIFF ALGORITHMS
+---------------
+`Myers`
+
+A diff as produced by the basic greedy algorithm described in
+link:http://www.xmailserver.org/diff2.pdf[An O(ND) Difference Algorithm and 
its Variations].
+with a run time of O(M + N + D^2). It employs a heuristic to allow for
+a faster diff at the small cost of diff size.
+The `minimal` algorithm has that heuristic turned off.
+
+`Patience`
+
+This algorithm by Bram Cohen matches the longest common subsequence
+of unique lines on both sides, recursively. It obtained its name by
+the way the longest subsequence is found, as that is a byproduct of
+the patience sorting algorithm. If there are no unique lines left
+it falls back to `myers`. Empirically this algorithm produces
+a more readable output for code, but it does not garantuee
+the shortest output.
+
+`Histogram`
+
+This algorithm finds the longest common substring and recursively
+diffs the content before and after the longest common substring.
+If there are no common substrings left, fallback to `myers`.
+This is often the fastest, but in corner cases (when there are
+many common substrings of the same length) it produces bad
+results as seen in:
+
+       seq 1 100 >one
+       echo 99 > two
+       seq 1 2 98 >>two
+       git diff --no-index --histogram one two
+
 EXAMPLES
 --------
 
-- 
2.18.0.597.ga71716f1ad-goog

[PATCH] Documentation/diff-options: explain different diff algorithms

Reply via email to