On 15 Jun 2005, at 16:32, Tony Collen wrote:

Ugo,

I think what you're looking for is the Levenshtein Distance Algorithm.

http://www.google.com/search?hl=en&q=java+Levenshtein+implementation&btnG=Google+Search

Nice! I also found an implementation nearby:

http://jakarta.apache.org/commons/lang/api/org/apache/commons/lang/StringUtils.html#getLevenshteinDistance(java.lang.String,%20java.lang.String)

;)

However, this algorithm measures single-character differences, whereas I am more interested in word differences. In other words, the LD between "test" and "tent" is 1 and the LD between "test" and "barf" is 4, but for my purposes it should be 1 in both cases, since each pair differs by exactly one word. Likewise, the LD between "test case" and "tent base" is smaller than the one between "test case" and "case under test", but I need it to be the reverse.

Actually, what I am trying to come up with is an algorithm for determining whether two texts are (more or less) about similar subjects.
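
To make the requirement concrete, here is a minimal sketch of one measure that behaves this way on the examples above: the Jaccard distance between the sets of words in each text, which ignores both spelling similarity and word order. This is only an illustration and not something taken from Commons Lang; the class name WordOverlap and its methods are hypothetical, and the tokenization (lower-casing, then splitting on anything that is not a letter or digit) is an assumption.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: compares two texts by the words they share,
// ignoring word order and how similar the spellings are.
final class WordOverlap {
    private WordOverlap() {}

    // Assumed tokenization: lower-case, split on runs of non-letter/non-digit characters.
    static Set<String> words(String text) {
        Set<String> result = new HashSet<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!w.isEmpty()) {
                result.add(w);
            }
        }
        return result;
    }

    // Jaccard distance: 1 - |A ∩ B| / |A ∪ B|.
    // 0.0 means the same set of words, 1.0 means no word in common.
    static double distance(String a, String b) {
        Set<String> wa = words(a);
        Set<String> wb = words(b);
        if (wa.isEmpty() && wb.isEmpty()) {
            return 0.0; // two empty texts are treated as identical
        }
        Set<String> common = new HashSet<>(wa);
        common.retainAll(wb);
        int union = wa.size() + wb.size() - common.size();
        return 1.0 - (double) common.size() / union;
    }
}
```

With this sketch, WordOverlap.distance("test", "tent") and WordOverlap.distance("test", "barf") are both 1.0, WordOverlap.distance("test case", "tent base") is 1.0, and WordOverlap.distance("test case", "case under test") is about 0.33, so the ordering comes out the way I need. It says nothing about synonyms or very common words, so it is at best a starting point for comparing subjects.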

        Ugo

--
Ugo Cei
Tech Blog: http://agylen.com/
Open Source Zone: http://oszone.org/
Wine & Food Blog: http://www.divinocibo.it/
