Oskar Sandberg <oskar at freenetproject.org> wrote:
> On Fri, Jun 22, 2001 at 03:52:06AM +0100, Theodore Hong wrote:
> > Rolling hash value?  I don't get what that has to do with splitting.
> 
> I was told that the correct way to parse a text for a stopper line (like
> you do when splitting a mine multipart) is to run a hash value over the
> text that works so that you pop in every new value, and then pop out the
> value at stopper.length back. When The rolling value matches the hash of
> the stopper with the same function, you have found the stop code.
> 
> I don't remember who told me this, I sort of thought it was you.

Nope, wasn't me.  Although now that I think about it, the way I would do it
is to use the Boyer-Moore string searching algorithm.  (Not that I actually
did it this way last night, but now I think I ought to.)   You precompute a
table of the locations of characters in your stopper line.  Then (this is
the clever part) instead of comparing forward byte-by-byte, you look ahead
stopper.length bytes and compare backward.  If the first character you read
is not in stopper at all, you know right away that stopper cannot begin in
the next stopper.length bytes, so you can jump right over that whole range
without doing any more compares.  If it's in the middle of stopper
somewhere, you use your precomputed table of locations to determine the set
of places where stopper could begin, and jump ahead to the first one.  If
it's at the end of stopper, in the worst case you compare backwards
stopper.length bytes to confirm that you found it, which is the same number
of compares as searching forwards.

theo

-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 174 bytes
Desc: not available
URL: 
<https://emu.freenetproject.org/pipermail/devl/attachments/20010622/4e46ce7b/attachment.pgp>

Reply via email to