One alternative is the bitap algorithm, you give it two strings and the amount 
of errors you accept. It creates a bit pattern which it then uses to scan 
through the text pretty fast. Here is a sample implementation adapted from 
[here](https://www.programmingalgorithms.com/algorithm/fuzzy-bitap-algorithm/c/):
    
    
    import sequtils
    
    proc searchString(text: string, pattern: string, k: int): int =
      result = -1
      if pattern.len == 0: return 0
      if pattern.len > 63: return -1 # The pattern is too long?
      
      var
        R = newSeqWith[uint64](k + 1, not 1.uint64)
        patternMask: array[char.high.int + 1, uint64]
      
      for i in 0..char.high.int:
        patternMask[i] = not 0.uint64
      
      for i in 0..<pattern.len:
        patternMask[pattern[i].int] = patternMask[pattern[i].int] and not 
(1.uint64 shl i)
      
      for i in 0..text.high:
        var oldRd1 = R[0]
        
        R[0] = R[0] or patternMask[text[i].int]
        R[0] = R[0] shl 1
        
        for d in 1..k:
          let tmp = R[d]
          R[d] = (oldRd1 and (R[d] or patternMask[text[i].int])) shl 1
          oldRd1 = tmp
        
        if 0 == (R[k] and (1.uint64 shl pattern.len)):
          result = (i - pattern.len) + 1
          break
    
    echo searchString("The quick brown foax jumps over the lazy dog", "fox", 1)
    echo searchString("world hello", "hllo", 1)
    
    
    Run

Pros: It uses bitwise operations so it should be pretty fast

Cons: Doesn't index anything about the document, so it still needs to search 
through the text each time Requires you to set the "k" factor, or how many 
errors you accept. This means that if you want to start strict and ease up the 
k factor as you go along you might need to run many iterations. This can be 
somewhat alleviated by doing multiple scans at the same time or similar tricks 
to the algorithm.

Reply via email to