On Mar 2, 2007, at 10:42 AM, Mario Minati wrote:
I would like to find out if there are already companies in my
addressbook (DB) which are similar to a given name to avoid double
entries.
How to measure similarity:
I'am thinking of the hammingdistance. That means the difference
between Linux and Linus is 1 as there is one letter different. The
distance between Linux and Lisa is 3 as there is one letter more
and two are different.
Does anyone have an idea how to realize that?
Can one realize this with code running on the database (PL/SQL or
something) or is there a way doing that with DBIx::Class (drawback:
all data had to read before processing).
In addition to my other response, you might find this interesting,
although I would be a bit afraid of how long this might take in a
large database...
From http://www.merriampark.com/ldplsql.htm
Levenshtein Distance Algorithm: Oracle PL/SQL Implementation
by Barbara Boehmer ([EMAIL PROTECTED])
CREATE OR REPLACE FUNCTION ld -- Levenshtein distance
(p_source_string IN VARCHAR2,
p_target_string IN VARCHAR2)
RETURN NUMBER
DETERMINISTIC
AS
v_length_of_source NUMBER := NVL (LENGTH (p_source_string), 0);
v_length_of_target NUMBER := NVL (LENGTH (p_target_string), 0);
TYPE mytabtype IS TABLE OF NUMBER INDEX BY BINARY_INTEGER;
column_to_left mytabtype;
current_column mytabtype;
v_cost NUMBER := 0;
BEGIN
IF v_length_of_source = 0 THEN
RETURN v_length_of_target;
ELSIF v_length_of_target = 0 THEN
RETURN v_length_of_source;
ELSE
FOR j IN 0 .. v_length_of_target LOOP
column_to_left(j) := j;
END LOOP;
FOR i IN 1.. v_length_of_source LOOP
current_column(0) := i;
FOR j IN 1 .. v_length_of_target LOOP
IF SUBSTR (p_source_string, i, 1) =
SUBSTR (p_target_string, j, 1)
THEN v_cost := 0;
ELSE v_cost := 1;
END IF;
current_column(j) := LEAST (current_column(j-1) + 1,
column_to_left(j) + 1,
column_to_left(j-1) + v_cost);
END LOOP;
FOR j IN 0 .. v_length_of_target LOOP
column_to_left(j) := current_column(j);
END LOOP;
END LOOP;
END IF;
RETURN current_column(v_length_of_target);
END ld;
--
Jason Kohles
[EMAIL PROTECTED]
http://www.jasonkohles.com/
"A witty saying proves nothing." -- Voltaire
_______________________________________________
List: http://lists.rawmode.org/cgi-bin/mailman/listinfo/dbix-class
Wiki: http://dbix-class.shadowcatsystems.co.uk/
IRC: irc.perl.org#dbix-class
SVN: http://dev.catalyst.perl.org/repos/bast/trunk/DBIx-Class/
Searchable Archive: http://www.mail-archive.com/[email protected]/