Ole, thank you for telling me about DFA machines. It's much more than I
knew. I think it might be fun to implement with the model I'm using.
The hardest thing for me was implementing quantifiers. What I'm doing
with search-text.r is "compiling" an expression like
[bound "t" 3 7 alpha "ing" bound]
to a form like:
[bound "t" [4 6 3 7] make bitset! #{...} [3] "ing" bound]
in which the blocks are nodes which tell:
one or two elements to jump to
the minimum number of times the first jump must be made
the maximum number of times the first jump can be made
So we have to go through the cycle of elements 3/4/5
-------------------------------
| |
V |
"t"(2) -> node(3) -> bitset(4) -> node(5) -
|
-------------------------------------> "ing"(6)
three times before we even think about jumping to the "ing" element.
What I'm doing is saving the number of times node(3) has been passed
in another block in a position tied to the "ing" element. When there
is a failure to match somewhere in the cycle, or when the cycle has
been done the maximum number of times, it's important to erase the
information on how many times node(3) has been passed, in case one loop
is nested inside another loop and there's a chance of coming back to
it.
Now with the NFA engine, saving a state means you only have to push
onto a stack the index of a jump you could have taken, plus the current
index of the text string. But with a DFA engine, you wouldn't need the
text string index (because once you deal with a character in the text
stream it's done once and for all) - is that right? On the other hand,
every time you come to a place where two jumps are possible you have
to start another thread so that both possible pattern indexes can be
compared against the next character in the text stream.
What seems especially complicated here is that you have to save the
information on which node has been passed how many times with each
thread. Complicated yes, but once a thread dies, it should be easier
to delete all the information associated with it.
Well, that's the idea anyway. If you would take a look at the code and
tell me any ideas you have, or if you could suggest a reference on
DFA machines, I'd really appreciate it.
Talk to you later,
Eric
Thanks,
Eric