On 15 Jun 2005, at 16:32, Tony Collen wrote:
Ugo,
I think what you're looking for is the Levenshtein Distance Algorithm.
http://www.google.com/search?hl=en&q=java+Levenshtein+implementation&btnG=Google+Search
Nice! I also found an implementation nearby:
http://jakarta.apache.org/commons/lang/api/org/apache/commons/lang/StringUtils.html#getLevenshteinDistance(java.lang.String,%20java.lang.String)
;)
However, this algorithm is useful for finding single-character
differences, whereas I am more interested in word differences. IOW, the
LD between "test" and "tent" is 1 and the LD between "test" and "barf"
is 4, but for my purpose it should be 1 in both cases. And the LD
between "test case" and "tent base" is smaller than the one between
"test case" and "case under test", but I need it to be the reverse.
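
To make the word-versus-character point concrete, here is a minimal sketch of the same edit-distance recurrence applied to whitespace-separated words instead of characters. The class name WordLevenshtein and the naive tokenize helper are hypothetical, written for illustration only; this is not the Commons Lang method linked above.

```java
public class WordLevenshtein {

    // Hypothetical, deliberately naive tokenizer: splits on runs of whitespace.
    static String[] tokenize(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    // Levenshtein distance where the unit of edit is a whole word.
    static int distance(String s, String t) {
        String[] a = tokenize(s);
        String[] b = tokenize(t);
        int[] prev = new int[b.length + 1];
        int[] curr = new int[b.length + 1];
        for (int j = 0; j <= b.length; j++) {
            prev[j] = j; // turning an empty sequence into j words takes j insertions
        }
        for (int i = 1; i <= a.length; i++) {
            curr[0] = i; // turning i words into an empty sequence takes i deletions
            for (int j = 1; j <= b.length; j++) {
                int cost = a[i - 1].equals(b[j - 1]) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length];
    }

    public static void main(String[] args) {
        System.out.println(distance("test", "tent"));                 // 1
        System.out.println(distance("test", "barf"));                 // 1
        System.out.println(distance("test case", "tent base"));       // 2
        System.out.println(distance("test case", "case under test")); // 3
    }
}
```

This gives 1 for both "test"/"tent" and "test"/"barf", as wanted. It still ranks "tent base" as closer to "test case" than "case under test" is, because edit distance is sensitive to word order, so on its own it does not satisfy the second requirement.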
Actually, what I am trying to come up with is an algorithm for determining
whether two texts are (more or less) about similar subjects.
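
One possible starting point, offered only as a sketch and not as a settled answer, is to ignore word order and compare the sets of words the two texts contain (Jaccard similarity: shared words divided by all distinct words). The class name WordOverlap and its helpers are hypothetical, and the tokenizer is an assumption that only handles ASCII word characters.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class WordOverlap {

    // Hypothetical tokenizer: lower-cases and splits on non-word characters.
    static Set<String> words(String text) {
        Set<String> result = new HashSet<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!w.isEmpty()) {
                result.add(w);
            }
        }
        return result;
    }

    // Jaccard similarity of the two word sets, in the range [0, 1].
    static double similarity(String s, String t) {
        Set<String> a = words(s);
        Set<String> b = words(t);
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0; // two empty texts are treated as identical
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) intersection.size() / union.size();
    }

    public static void main(String[] args) {
        System.out.println(similarity("test case", "tent base"));
        System.out.println(similarity("test case", "case under test"));
    }
}
```

With this measure "test case" and "tent base" share no words, while "test case" and "case under test" share two of three distinct words, so the ranking comes out in the order described above. It knows nothing about synonyms or common filler words, so it is at best a crude proxy for "similar subjects".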
Ugo
--
Ugo Cei
Tech Blog: http://agylen.com/
Open Source Zone: http://oszone.org/
Wine & Food Blog: http://www.divinocibo.it/