On Fri, Jun 19, 2015 at 03:34:55AM -0400, Jeff King wrote:

> And here's some more bad news. If you look at the diff for this
> patch itself, it's terribly unreadable (the regular diff already is
> pretty bad, but the highlights make it much worse). There are big chunks
> where we take away 5 or 10 lines from the old code, and replace them
> with totally unrelated lines. We end up highlighting almost the entire
> thing, except for spaces and punctuation.
> 
> We might be able to solve this with a percentage heuristic similar to
> the one Patrick proposed. It's not really interesting to highlight
> unless we're doing it on probably 20% or less of the diff (where 20% is
> a number I just made up).

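That turned out to be pretty easy; the patch is below (on top of what I
sent earlier). If more than 50% of the combined pre- and post-image text
would be highlighted, we drop the highlights for that hunk entirely. I
picked 50% by eyeballing "git log -p" in git.git, and it seems to give
good results.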
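
For anyone who wants to play with the threshold outside the script, here
is a rough standalone sketch of the same check. It assumes annotations
are a flat list of [offset, marker] pairs (a start entry followed by its
end entry), which is what count_highlight in the patch below relies on.
The worth_highlighting() helper, its arguments, and the zero-length guard
are made up for illustration and are not part of the patch.

```perl
use strict;
use warnings;

# Sum the widths of the highlighted regions. Assumes a flat list of
# [offset, marker] array refs: each start entry is followed by its end
# entry, so the width of a region is end offset minus start offset.
sub count_highlight {
    my $total = 0;
    while (@_) {
        my $from = shift;
        my $to   = shift;
        $total += $to->[0] - $from->[0];
    }
    return $total;
}

# Hypothetical helper (not in the patch): returns 1 if the highlights
# cover at most $max_pct of the combined text, 0 otherwise.
sub worth_highlighting {
    my ($len_a, $len_b, $hl_a, $hl_b, $max_pct) = @_;
    my $total = $len_a + $len_b;
    return 0 unless $total > 0;    # guard added for this sketch only
    my $pct = (count_highlight(@$hl_a) + count_highlight(@$hl_b)) / $total;
    return $pct <= $max_pct ? 1 : 0;
}

# Two 20-character sides with 3 characters highlighted on each:
# 6 / 40 = 15%, so the highlights are kept.
my @small_a = ([5, 'on'], [8, 'off']);
my @small_b = ([5, 'on'], [8, 'off']);

# The same sides with 18 characters highlighted on each:
# 36 / 40 = 90%, so the highlights are dropped.
my @big_a = ([1, 'on'], [19, 'off']);
my @big_b = ([1, 'on'], [19, 'off']);

print worth_highlighting(20, 20, \@small_a, \@small_b, 0.5), "\n";
print worth_highlighting(20, 20, \@big_a,   \@big_b,   0.5), "\n";
```

Passing the cutoff as an argument makes it easy to compare 0.5 against
other values on the same input.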

So I think the big remaining issue is improved tokenizing. Maybe Patrick
will want to take a stab at it.

---
diff --git a/contrib/diff-highlight/diff-highlight b/contrib/diff-highlight/diff-highlight
index 1525ccc..9454446 100755
--- a/contrib/diff-highlight/diff-highlight
+++ b/contrib/diff-highlight/diff-highlight
@@ -114,12 +114,32 @@ sub show_hunk {
                        if $bits & 2;
        }
 
+       my $highlighted = count_highlight(@highlight_a) +
+                         count_highlight(@highlight_b);
+       my $total = length($a) + length($b);
+       my $pct = $highlighted / $total;
+
+       if ($pct > 0.5) {
+               @highlight_a = ();
+               @highlight_b = ();
+       }
+
        # And now show the output both with the original stripped annotations,
        # as well as our new highlights.
        show_image($a, [merge_annotations(\@stripped_a, \@highlight_a)]);
        show_image($b, [merge_annotations(\@stripped_b, \@highlight_b)]);
 }
 
+sub count_highlight {
+       my $total = 0;
+       while (@_) {
+               my $from = shift;
+               my $to = shift;
+               $total += $to->[0] - $from->[0];
+       }
+       return $total;
+}
+
 # Strip out any diff syntax (i.e., leading +/-), along with any ANSI color
 # codes from the pre- or post-image of a hunk. The result is a string of text
 # suitable for diffing against the other side of the hunk.