On Fri, Jun 19, 2015 at 03:34:55AM -0400, Jeff King wrote:
> And here's some more bad news. If you look at the diff for this
> patch itself, it's terribly unreadable (the regular diff already is
> pretty bad, but the highlights make it much worse). There are big chunks
> where we take away 5 or 10 lines from the old code, and replace them
> with totally unrelated lines. We end up highlighting almost the entire
> thing, except for spaces and punctuation.
>
> We might be able to solve this with a percentage heuristic similar to
> the one Patrick proposed. It's not really interesting to highlight
> unless we're doing it on probably 20% or less of the diff (where 20% is
> a number I just made up).

That turned out to be pretty easy; patch is below (on top of what I sent
earlier). I set the percentage at 50% based on eyeballing "git log -p"
in git.git, and it seems to give good results.

So I think the big remaining issue is improved tokenizing. Maybe Patrick
will want to take a stab at it.

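For anyone who wants to play with the threshold outside of diff-highlight,
the check described above can be sketched as a standalone script. This is
only an illustration, not the patch itself: too_noisy() is a hypothetical
helper name, and the [position, marker] annotation layout and the sample
strings are assumptions made for the example.

```perl
use strict;
use warnings;

# Sum the widths of the highlighted spans. The arguments are assumed to be
# a flat list of [position, marker] pairs that alternate between the start
# and the end of a span.
sub count_highlight {
	my $total = 0;
	while (@_) {
		my $from = shift;
		my $to = shift;
		$total += $to->[0] - $from->[0];
	}
	return $total;
}

# Hypothetical helper: return 1 if the highlights cover more than
# $threshold of the two lines combined, 0 otherwise.
sub too_noisy {
	my ($old, $new, $hl_old, $hl_new, $threshold) = @_;
	my $total = length($old) + length($new);
	return 0 unless $total; # nothing to highlight on empty input
	my $highlighted = count_highlight(@$hl_old) + count_highlight(@$hl_new);
	return $highlighted / $total > $threshold ? 1 : 0;
}

my ($on, $off) = ("\x1b[7m", "\x1b[27m");

# One changed character out of nine on each side: 2/18, keep the highlights.
print too_noisy("foo(bar);", "foo(baz);",
		[[6, $on], [7, $off]], [[6, $on], [7, $off]], 0.5)
	? "suppress\n" : "highlight\n";

# Whole lines differ: 18/18, drop the highlights.
print too_noisy("return 0;", "die 'no';",
		[[0, $on], [9, $off]], [[0, $on], [9, $off]], 0.5)
	? "suppress\n" : "highlight\n";
```

The first call prints "highlight" and the second prints "suppress". The
empty-input guard is an addition for the standalone sketch; the patch does
not have one.
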
---
diff --git a/contrib/diff-highlight/diff-highlight b/contrib/diff-highlight/diff-highlight
index 1525ccc..9454446 100755
--- a/contrib/diff-highlight/diff-highlight
+++ b/contrib/diff-highlight/diff-highlight
@@ -114,12 +114,32 @@ sub show_hunk {
 			if $bits & 2;
 	}
 
+	my $highlighted = count_highlight(@highlight_a) +
+			  count_highlight(@highlight_b);
+	my $total = length($a) + length($b);
+	my $pct = $highlighted / $total;
+
+	if ($pct > 0.5) {
+		@highlight_a = ();
+		@highlight_b = ();
+	}
+
 	# And now show the output both with the original stripped annotations,
 	# as well as our new highlights.
 	show_image($a, [merge_annotations(\@stripped_a, \@highlight_a)]);
 	show_image($b, [merge_annotations(\@stripped_b, \@highlight_b)]);
 }
 
+sub count_highlight {
+	my $total = 0;
+	while (@_) {
+		my $from = shift;
+		my $to = shift;
+		$total += $to->[0] - $from->[0];
+	}
+	return $total;
+}
+
 # Strip out any diff syntax (i.e., leading +/-), along with any ANSI color
 # codes from the pre- or post-image of a hunk. The result is a string of text
 # suitable for diffing against the other side of the hunk.