Could you set up the Lucene spell checker and use that? It has
pluggable distance measures, one being edit distance. You might have
to implement your own variation that skips transposition.
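As a rough illustration of what such a variation might look like, here
is a minimal, self-contained sketch of edit distance restricted to
insertion, deletion, and substitution, plus a brute-force scan that
keeps every corpus string within "d" operations of the pattern. The
class and method names (EditDistanceSketch, distance, within) are made
up for this example and are not part of Lucene. Wiring it into the
spell checker's pluggable distance is left out here; check the Lucene
javadoc for the exact interface.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not a Lucene API.
final class EditDistanceSketch {

    // Edit distance counting only insertion, deletion, and substitution.
    // A swap of two adjacent characters therefore costs 2, not 1.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Row 0: turning the empty prefix of a into b[0..j) takes j insertions.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // i deletions to reach the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insert = curr[j - 1] + 1;
                int delete = prev[j] + 1;
                int substitute = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insert, delete), substitute);
            }
            // Reuse the two rows instead of allocating a full matrix.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Brute-force scan: returns every corpus string within d operations.
    static List<String> within(List<String> corpus, String pattern, int d) {
        List<String> hits = new ArrayList<>();
        for (String s : corpus) {
            // A length gap larger than d can never be closed in d operations.
            if (Math.abs(s.length() - pattern.length()) > d) {
                continue;
            }
            if (distance(s, pattern) <= d) {
                hits.add(s);
            }
        }
        return hits;
    }
}
```

This scans the whole corpus, so it is only a baseline; a trie or an
index-backed candidate filter would be needed to avoid comparing the
pattern against every string.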
On Jun 25, 2009, at 12:19 AM, prasenjit mukherjee wrote:
Gents,
Please accept my apologies if this is not the correct forum. I am
trying to find a solution for approximate string matching, where I
need to find all strings in a corpus that differ from a given pattern
by at most "d" operations. The allowed operations are insertion,
deletion, and substitution. I am not interested in transposition, as
it could be very expensive.
I looked into LingPipe; it has a trie-based solution in some class
called Aproximate*Chunker*. Does anybody have a better approach?
-Thanks,
Prasenjit
--------------------------
Grant Ingersoll
http://www.lucidimagination.com/
Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
using Solr/Lucene:
http://www.lucidimagination.com/search