Hello

On Sun, Jul 11, 2010 at 12:10 AM, Kai <[email protected]> wrote:

>
> Ages ago when you first submitted the Myers diff implementation, you
> said that it was slower than the difflib implementation; do you have
> any idea how much slower? I'd be interested in finding out whether
> switching to Myers wholesale would be a reasonable thing to do.
>


> Also, there is an implementation of Myers diff in Review Board (in
> reviewboard/diffviewer/myersdiff.py). It may be worthwhile looking at
> that code (and other related bits) to see whether there is anything we
> can share/appropriate for use in Meld.
>

Ages ago it was indeed and things have changed :) I am attaching a small
patch that boosts the performance of Myers matcher significantly. Once
applied, MyersSequenceMatcher will become comparable with other existing
sequence matchers.
I performed a couple of tests to check the impact of this path and the
results (time taken to complete the test) are attached below.
I also included Myers diff from Review Board, but it looks like their
implementation cannot offer anything when it comes to performance (the
result of rewriting GNU diff in Python with no Python-specific
optimizations). There is one post-processing trick (shift inserts/deletions
of identical lines) that may be worth adding to meld though. However, more
testing is definitely needed.
I will be off for 2 weeks, but when I am back I will try to test a couple of
scenarios related to inline changes highlighting as they are slightly
different due to very low number of unique lines (each line = 1 character).

Regards,
Piotr

Test: calculate diff 100 times in a loop (I used historical revisions of
filediffer.py and meldapp.py)

1. Almost identical files (1452 lines in common, 4 different lines):
IncrementalSequenceMatcher: 0:00:00.542536
PatienceSequenceMatcher: 0:00:01.019957
MyersSequenceMatcher: 0:00:00.034457
MyersDiffer (Review Board): 0:00:01.436764

2 .Common: 859 diferent: 82 (8.5%)
IncrementalSequenceMatcher: 0:00:00.310261
PatienceSequenceMatcher: 0:00:00.597474
MyersSequenceMatcher: 0:00:00.280938
MyersDiffer (Review Board): 0:00:01.021952

3. Common: 1259 diferent: 252 (17%)
IncrementalSequenceMatcher: 0:00:01.087525
PatienceSequenceMatcher: 0:00:00.920743
MyersSequenceMatcher: 0:00:00.536368
MyersDiffer (Review Board): 0:00:02.267483

4. Common: 472 diferent: 678 (59%)
IncrementalSequenceMatcher: 0:00:00.780251
PatienceSequenceMatcher: 0:00:00.490673
MyersSequenceMatcher: 0:00:00.368780
MyersDiffer (Review Board): 0:00:01.844979

5. Completely different files (common: 125, different: 1668):
IncrementalSequenceMatcher: 0:00:00.737094
PatienceSequenceMatcher: 0:00:00.405088
MyersSequenceMatcher: 0:00:00.636358
MyersDiffer (Review Board): 0:00:02.308354
From 0cb38b90d4c984bc6a4e21a1145375e11b94fef2 Mon Sep 17 00:00:00 2001
From: Piotr Piastucki <[email protected]>
Date: Mon, 12 Jul 2010 00:05:37 +0200
Subject: [PATCH] Myers matcher performance improvement

---
 meld/matchers.py |   10 ++++++++--
 1 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/meld/matchers.py b/meld/matchers.py
index fb81c24..f9ed34d 100644
--- a/meld/matchers.py
+++ b/meld/matchers.py
@@ -101,17 +101,23 @@ class MyersSequenceMatcher(difflib.SequenceMatcher):
 
         # discard lines that do not match any line from the other file
         if n > 0 and m > 0:
+            aset = set()
+            bset = set()
+            for newline in b:
+                bset.add(newline)
+            for newline in a:
+                aset.add(newline)
             a2 = []
             b2 = []
             j = 0
             for i, newline in enumerate(b):
-                if newline in a:
+                if newline in aset:
                     b2.append(newline)
                     bindex[j] = i
                     j += 1
             k = 0
             for i, origline in enumerate(a):
-                if origline in b:
+                if origline in bset:
                     a2.append(a[i])
                     aindex[k] = i
                     k += 1
-- 
1.7.0.4

_______________________________________________
meld-list mailing list
[email protected]
http://mail.gnome.org/mailman/listinfo/meld-list

Reply via email to