On 7-5-2010 12:01, Lionello Lunesu wrote:
> On 7-5-2010 9:10, Brad Roberts wrote:
>> On Fri, 7 May 2010, Lionello Lunesu wrote:
>>
>>> On 6-5-2010 22:37, Michel Fortin wrote:
>>>> On 2010-05-05 23:45:50 -0400, Walter Bright <newshou...@digitalmars.com>
>>>> said:
>>>>
>>>>> Walter Bright wrote:
>>>>>> Alex Makhotin wrote:
>>>>>>> It takes ~40 seconds 50% load on the dual core processor(CentOS 5.3
>>>>>>> kernel 2.6.32.4), to get the actual error messages about the
>>>>>>> undefined identifier.
>>>>>>
>>>>>> Definitely there's a problem.
>>>>>
>>>>> The problem is the spell checker is O(n*n) on the number of characters
>>>>> in the undefined identifier.
>>>>
>>>> That's an algorithm that can't scale then.
>>>>
>>>> Checking the Levenshtein distance for each known identifier within a
>>>> small difference in length would be a better idea. (Clang is said to use
>>>> the Levenshtein distance, it probably does something of the sort.)
>>>>
>>>> http://en.wikipedia.org/wiki/Levenshtein_distance
>>>>
>>> and especially this line:
>>>
>>> # If we are only interested in the distance if it is smaller than a
>>> threshold k, then it suffices to compute a diagonal stripe of width 2k+1
>>> in the matrix. In this way, the algorithm can be run in O(kl) time,
>>> where l is the length of the shortest string.
>>
>> The source for this is pretty isolated.. anyone want to volunteer take a 
>> shot at improving this part of dmd?
>>
>> Later,
>> Brad
> I see, speller.c, I'll have a look..

I'm in the middle of moving from one city to another so don't wait for
me. I have attached the D version of the code in the wikipedia article
(including the patch for transpositions.)

It's not straightforward to drop this in (apart from it being in D),
since speller.c creates all variations on a string (=inefficient) and
uses a callback function to check if a variation is a valid symbol.

I'll have more time to look at this next week.

L.
T minimum(T)(T[] all ...)
in {
        assert(all.length > 0);
}
out(result) {
        foreach (t; all)
                assert(result <= t);
}
body {
        T ret = all[0];
        foreach (t; all)
                if (t < ret)
                        ret = t;
        return ret;
}

struct array2d(T) {
        private int w;
        private T[] data;
        public this(int w, int h) {
                this.data = new T[w*h];
                this.w = w;
        }
        public T get(int x, int y) {
                return data[x+y*w];
        }
        public void set(int x, int y, T t) {
                data[x+y*w] = t;
        }
}

int LevenshteinDistance(string s, string t)
{
        const int m = s.length;
        const int n = t.length;
        // d is a table with m+1 rows and n+1 columns
        array2d!(int) d = array2d!(int)(m+1,n+1);

        foreach (i; 0..m+1)
                d.set(i, 0, i); // deletion
        foreach (j; 0..n+1)
                d.set(0, j, j); // insertion

        int k = 0;
        foreach (j; 1..n+1)
        {
                foreach (i; 1..m+1)
                {
                        int cost = 1;
                        if (s[i-1] == t[j-1]) 
                                cost = 0;
                        k = minimum(
                                d.get(i-1, j) + 1,      // deletion
                                d.get(i, j-1) + 1,      // insertion
                                d.get(i-1, j-1) + cost  // substitution
                                );
                        // transposition  ab -> ba
                        if (i > 1 && j > 1 && s[i-1] == t[j-2] && s[i-2] == 
t[j-1])
                                k = minimum(
                        k,
                        d.get(i-2, j-2) + cost  // transposition
                        );
                        d.set(i, j, k);
                }
        }
        return k;//d.get(m,n);
}

unittest
{
        assert(LevenshteinDistance("hello", "hello") == 0);//nop
        assert(LevenshteinDistance("hello", "helo") == 1);//del
        assert(LevenshteinDistance("hello", "heello") == 1);//ins
        assert(LevenshteinDistance("hello", "hallo") == 1);//change
        assert(LevenshteinDistance("hello", "hlelo") == 1);//transp
        assert(LevenshteinDistance("hello", "hlelq") == 2);
}

import std.stdio;
void main(string[] args) {
  writefln("%s", LevenshteinDistance(args[1], args[2]));
}

Reply via email to