On 15 Jun 2005, at 16:32, Tony Collen wrote:


I think what you're looking for is the Levenshtein Distance Algorithm. hl=en&q=java+Levenshtein+implementation&btnG=Google+Search

Nice! I also found an implementation nearby: StringUtils.html#getLevenshteinDistance(java.lang.String,%20java.lang.String)


However, this algorithm measures differences between individual characters, whereas I am more interested in differences between words. IOW, the LD between "test" and "tent" is 1 and the LD between "test" and "barf" is 4, but for my purpose it should be 1 in both cases. And the LD between "test case" and "tent base" is smaller than the one between "test case" and "case under test", but I need it to be the reverse.
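
To make the numbers above concrete, here is a minimal sketch of the textbook dynamic-programming Levenshtein distance over characters. It is not the StringUtils implementation; the class and method names are made up for this example.

```java
// Minimal sketch of the classic dynamic-programming Levenshtein distance.
// CharLevenshtein and distance() are hypothetical names used only here.
class CharLevenshtein {

    static int distance(String a, String b) {
        // At the start of iteration i, prev[j] is the distance between
        // the first i-1 chars of a and the first j chars of b.
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // building b[0..j) from "" takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // reducing a[0..i) to "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1), // insertion, deletion
                        prev[j - 1] + cost);                    // substitution or match
            }
            int[] tmp = prev; // swap rows so the next iteration reads this one
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("test", "tent"));                 // 1
        System.out.println(distance("test", "barf"));                 // 4
        System.out.println(distance("test case", "tent base"));       // 2
        System.out.println(distance("test case", "case under test"));
    }
}
```

The last pair differs in length by six characters, so its distance is at least 6, well above the 2 for "tent base".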

Actually, what I am trying to come up with is an algorithm for determining whether two texts are (more or less) about similar subjects.
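
One possible direction, offered only as a sketch and not as a settled answer, is to ignore word order altogether and compare the sets of words with the Jaccard index (shared words divided by total distinct words). Running Levenshtein over whole words instead of characters would not be enough: "test case" vs. "tent base" takes 2 word edits, while "test case" vs. "case under test" takes 3. The class and method names below are made up for this example, and splitting on whitespace is a simplifying assumption (no stemming, no stop words).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Minimal sketch of a bag-of-words comparison using the Jaccard index.
// WordOverlap, words() and jaccard() are hypothetical names used only here.
class WordOverlap {

    // Lower-case the text and split it on whitespace into a set of words.
    static Set<String> words(String text) {
        Set<String> result =
                new HashSet<>(Arrays.asList(text.toLowerCase().trim().split("\\s+")));
        result.remove(""); // an empty input would otherwise yield one empty token
        return result;
    }

    // Returns a value in [0, 1]: 0 means no shared words, 1 means identical word sets.
    static double jaccard(String a, String b) {
        Set<String> wa = words(a);
        Set<String> wb = words(b);
        if (wa.isEmpty() && wb.isEmpty()) {
            return 1.0; // two empty texts are treated as identical
        }
        Set<String> common = new HashSet<>(wa);
        common.retainAll(wb); // intersection
        Set<String> union = new HashSet<>(wa);
        union.addAll(wb);     // union
        return (double) common.size() / union.size();
    }

    public static void main(String[] args) {
        System.out.println(jaccard("test case", "tent base"));       // no shared words
        System.out.println(jaccard("test case", "case under test")); // 2 of 3 words shared
    }
}
```

With this measure "test case" scores 0 against "tent base" and 2/3 against "case under test", which is the ordering asked for above. Note that it is a similarity (higher means closer), not a distance.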


Ugo Cei
