OK. Given the previously stated instruction set, it's straightforward to
take a conventional RE expressed as a string and convert it (bottom up)
into an opcode sequence. The opcode sequence can then be interpreted
directly. If we have more than one RE - or more precisely, REs that return
distinct token IDs - we need to do DFA conversion to ensure that there is
no collision in which final states might return different IDs at the same
priority.
So if this is the opcode set:
SPLIT *pc1, pc2 *Initiates non-deterministic parallel execution.
If present, this is an NFA.
JMP *pc* Redirects execution to some other instruction
BYTE *b* *pc* If the input pointer points to the byte value *b*,
advance to the given *pc*, else go the the next
instruction (this replaces CHAR).
TOKEN *prio*, *id* Indicates that we currently have a match for a
token *id* with priority *prio.* The current input
position corresponds to the end of the match.
If a higher priority token result already exists,
this opcode has no effect
STOP Halts the current thread.
STOPALL Halts ALL threads
For DFA conversion, we model every instruction as an NFA state and build
the DFA more or less according to the classic algorithm. For this purpose:
epsilon(*pc = SPLIT *pc1 pc2*) = { *pc1, pc2* }
epsilon(*pc = JMP *pcdest*) = { *pcdest* }
epsilon(*pc = BYTE *b* *pcmatch*) = { *pc + 1* }
epsilon(*pc = TOKEN *prio id*) = { pc + 1 }
epsilon(*pc = STOP) = { }
epsilon(*pc = STOPALL) = { }
So where the classic implementation has each DFA state as a set of NFA
states, this one has each DFA state as a set of PCs.
The definition of *epsilon* for the BYTE opcode leads me to notice a
problem: it is possible to have an NFA containing two matches for the same
character. This is not desirable, and it leads me to wonder if I shouldn't
revert back to ONMATCH and institute a rule that sequential occurrences of
BYTE are treated as a set and must be disjoint.
shap
_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev