Could you set up the Lucene spell checker and use that? It has pluggable distance measures, one of which is edit distance. You might have to implement your own variation that does not count transposition.
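
As a rough illustration of what such a variation could look like, here is a minimal sketch of a plain Levenshtein distance (insertion, deletion, substitution only, no transposition) with an early cutoff at "d". The class and method names (BoundedEditDistance, distance, matches) are made up for this example and are not part of Lucene; if I recall correctly, the spell checker's plug-in point is the StringDistance interface, so a measure like this would need to be wrapped and normalized to fit it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not a Lucene class.
class BoundedEditDistance {

    // Edit distance using insertion, deletion and substitution only.
    // Returns maxDist + 1 whenever the true distance exceeds maxDist.
    static int distance(String a, String b, int maxDist) {
        // The length difference is a lower bound on the distance.
        if (Math.abs(a.length() - b.length()) > maxDist) {
            return maxDist + 1;
        }
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            int rowMin = curr[0];
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
                rowMin = Math.min(rowMin, curr[j]);
            }
            // Row minima never decrease, so we can stop early.
            if (rowMin > maxDist) {
                return maxDist + 1;
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return Math.min(prev[b.length()], maxDist + 1);
    }

    // Brute-force scan: keep every corpus string within d operations of pattern.
    static List<String> matches(String pattern, List<String> corpus, int d) {
        List<String> out = new ArrayList<>();
        for (String s : corpus) {
            if (distance(pattern, s, d) <= d) {
                out.add(s);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> corpus = Arrays.asList("lucene", "lucent", "luce", "solr", "lucine");
        System.out.println(matches("lucene", corpus, 1));
    }
}
```

Note that a swap of two adjacent characters costs 2 here (one deletion plus one insertion, or two substitutions), which is the behaviour you asked for. The scan is linear in the corpus size, so it is only a baseline.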

On Jun 25, 2009, at 12:19 AM, prasenjit mukherjee wrote:

Gents,
   Please accept my apologies if you think this may not be the correct
forum. I am trying to find a solution for approximate string matching, where I need to find all strings in a corpus that differ from a given pattern by at most "d" operations. The allowed operations are insertion, deletion, and substitution. I am not interested in transposition,
as it could be very expensive.

I looked into LingPipe; they have a trie-based solution in some class called
Aproximate*Chunker*. Does anybody have a better approach?
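
For reference, the general trie idea can be sketched as below. This is only an illustration of the technique under my own assumptions, not LingPipe's implementation: the corpus is stored in a trie, one row of the edit-distance table is computed per trie node, and a whole subtree is skipped once every entry in its row exceeds "d". The class name TrieFuzzySearch and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not a LingPipe or Lucene class.
class TrieFuzzySearch {

    private static final class Node {
        // TreeMap keeps children sorted so results come out in a stable order.
        final Map<Character, Node> children = new TreeMap<>();
        String word; // non-null when a corpus string ends at this node
    }

    private final Node root = new Node();

    void add(String s) {
        Node n = root;
        for (char c : s.toCharArray()) {
            n = n.children.computeIfAbsent(c, k -> new Node());
        }
        n.word = s;
    }

    // Returns all stored strings within d insertions, deletions or
    // substitutions of pattern (no transposition).
    List<String> search(String pattern, int d) {
        int[] row = new int[pattern.length() + 1];
        for (int j = 0; j < row.length; j++) {
            row[j] = j; // distance from the empty prefix to pattern[0..j)
        }
        List<String> out = new ArrayList<>();
        if (root.word != null && row[pattern.length()] <= d) {
            out.add(root.word);
        }
        for (Map.Entry<Character, Node> e : root.children.entrySet()) {
            walk(e.getValue(), e.getKey(), pattern, row, d, out);
        }
        return out;
    }

    private void walk(Node node, char c, String pattern, int[] prev, int d, List<String> out) {
        int n = pattern.length();
        int[] curr = new int[n + 1];
        curr[0] = prev[0] + 1;
        int rowMin = curr[0];
        for (int j = 1; j <= n; j++) {
            int cost = pattern.charAt(j - 1) == c ? 0 : 1;
            curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            rowMin = Math.min(rowMin, curr[j]);
        }
        if (node.word != null && curr[n] <= d) {
            out.add(node.word);
        }
        // Prune: if every entry exceeds d, no descendant can match.
        if (rowMin <= d) {
            for (Map.Entry<Character, Node> e : node.children.entrySet()) {
                walk(e.getValue(), e.getKey(), pattern, curr, d, out);
            }
        }
    }
}
```

Strings that share a prefix share the table rows for that prefix, which is where the saving over a string-by-string scan would come from.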

-Thanks,
Prasenjit

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:
http://www.lucidimagination.com/search
