Here is a fairly easy way to search a body of text for a string when 
there might be single-character errors, like searching for "butterfly" 
when the text has "budterfly".  The paper, which I found some years ago, 
dates back to the early 1990s, I think.  It has a basic method with a 
series of potential refinements.  The basic method is well suited for 
programming in Python.  I'm not sure the refinements would work well, 
because they involve tinkering with the hash table design and keeping 
the hash tables small enough to stay in cache memory, which is not what 
interpreted languages are especially good at.

Actually, the paper covers searching for multiple patterns at one time, 
i.e., in a single pass.  But you can easily do it for just one pattern if 
that's what you want.
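
To give a feel for how little code this kind of search needs, here is a 
minimal sketch in Python.  It is my own illustration and not necessarily 
the paper's exact method: it indexes every pattern under each of its 
"one character deleted" variants, then does the same for each window of 
text and looks the variants up in a dict.  The function names are made up 
for this example, and I'm assuming that all patterns have the same length 
and that only substitution errors matter (no inserted or dropped 
characters).

```python
from collections import defaultdict


def deletion_variants(word):
    """Yield (position, word with the character at that position removed)."""
    for i in range(len(word)):
        yield i, word[:i] + word[i + 1:]


def build_index(patterns):
    """Map each (position, variant) key to the patterns that produce it."""
    index = defaultdict(set)
    for pattern in patterns:
        for key in deletion_variants(pattern):
            index[key].add(pattern)
    return index


def search(text, patterns):
    """Find patterns in text, allowing at most one substituted character.

    Assumption for this sketch: all patterns have the same length.
    Returns a sorted list of (offset, pattern, matched_text) tuples.
    """
    m = len(patterns[0])
    if any(len(p) != m for p in patterns):
        raise ValueError("this sketch needs patterns of equal length")
    index = build_index(patterns)
    hits = set()
    # One pass over the text, checking every window of length m.
    for start in range(len(text) - m + 1):
        window = text[start:start + m]
        # If the window and a pattern agree once the same position is
        # removed from both, they differ in at most that one character.
        for key in deletion_variants(window):
            for pattern in index.get(key, ()):
                hits.add((start, pattern, window))
    return sorted(hits)


if __name__ == "__main__":
    for hit in search("a budterfly and a butterfly", ["butterfly"]):
        print(hit)
```

Passing several patterns in the list searches for all of them in the same 
pass over the text; passing a one-element list handles the single-pattern 
case.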

I thought this might be of interest to some people on the list.  Here's a 
link to the paper:

APPROXIMATE MULTIPLE STRING SEARCH (Muth and Manber) 
<http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=F5D9579C47892924DAF93B36A9445424?doi=10.1.1.21.3317&rep=rep1&type=pdf>
