Edi Weitz <[EMAIL PROTECTED]> writes:

> I was aware of Boyer-Moore although I'm not really familiar with
> it. I'll try to implement it this weekend and let the list know
> whether there's a significant difference. (I guess you first
> have to check the size of the strings in order to decide if it
> is worth building the translation tables.)

Below is my implementation of Boyer-Moore-Horspool algorithm. It
should be better than Boyer-Moore in practise.

If you really need speed, there is algorithm called
Boyer-Moore-Horspool-Hume-Sunday. It is 20% better than
Boyer-Moore-Horspool in practice.

;;;
;;; Boyer-Moore-Horspool
;;;
;;; m - pattern size
;;; n - text size
;;;
;;; Worst case time: O(mn)
;;; Average time: O(n/m)
;;; In practice better than Boyer-Moore and Knuth-Morris-Pratt
;;;  (when number of characters & m are not wery small)
;;;

(defun bmh-search (pattern text)
  (declare (optimize (speed 3) (safety 0) (debug 0)
                     (compilation-speed 0) (space 0))
           (type simple-base-string pattern text))
  (check-type pattern simple-base-string)
  (check-type text simple-base-string)
  (let* ((n (length text))
         (m (length pattern))
         (skip (make-array char-code-limit
                          :element-type 'fixnum
                          :initial-element m)))
    (declare (fixnum n m))
    (when (= 0 m)
        (return-from bmh-search 0))
    (loop for k fixnum below m do
          (setf (aref skip (char-code (aref pattern k))) (- m k 1)))
    (loop for k fixnum = (1- m) then
            (setf k (+ k (max 1 (aref skip (char-code (aref text k))))))
          while (< k n)
          do
          (loop for j fixnum downfrom (1- m)
                for i fixnum downfrom k
                while (and (<= 0 j)
                           (char= (aref text i))
                                  (aref pattern j))
                finally (if (= j -1)
                             (return-from bmh-search (1+ i)))))))

;;;; eof

 Kimmo
-- 
((lambda (integer) ;; http://www.iki.fi/kt/
   (coerce (loop for i upfrom 0 by 8 below (integer-length integer)
                 collect (code-char (ldb (byte 8 i) integer))) 'string))
 100291759904362517251920937783274743691485481194069255743433035)


Reply via email to