On Thu, 24 Aug 2017 07:40:21 BST roger peppe <[email protected]> wrote:
> On 24 August 2017 at 06:39, Bakul Shah <[email protected]> wrote:
> >
> > Finally, for better performance it may make sense to store the
> > FSM as a vector of vectors or vector of maps so that a slice
> > of inputs may be processed in one function call. Probably best
> > done with a FSM generator.
> 
> That's interesting. What might that look like?

Something like this:

type state struct {
        next    byte
        action  byte
}

type Token struct {
        text    []byte
        kind    byte
}

type Scanner struct {
        st      state
        parser  *Parser
        token   Token
        ...
}

const (
        err byte = iota
        skip
        flush
        unget
        emit
        // ...
)

var fsm [][]state
var class []byte

func init() {
        // initialize class, e.g. letter for [A-Za-z]
        // create the FSM from a more compact spec
}

func (sc *Scanner) Scan(str []byte) {
        st := sc.st
        p := sc.parser
        i := 0
        start := 0
        for i < len(str) {
                b := str[i]     // no unicode, just 8 bit chars!
                c := class[b]   // map char to a much smaller char-class
                st = fsm[st.next][c]    // assign; := would shadow st
                i++
                switch st.action & 7 {  // low 3 bits: what to do next
                case err:
                        // handle errors...
                case skip:
                        continue
                case flush:
                        start = i // e.g. at the end of a comment
                case unget:
                        i--     // rescan the byte that ended the token
                        fallthrough
                case emit:
                        // high 5 bits: kind of the recognized token
                        sc.token = Token{str[start:i], st.action >> 3}
                        start = i
                        p.Parse(sc.token)
                }
        }
        sc.st = st
}

Scan is called every time a line is read. When a full token is
recognized, Parse is called. st.action combines the recognized
terminal, if any (upper five bits), with the next action (lower
three bits).
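
To make the packing concrete, here is a minimal sketch of how one
action byte could be built and taken apart. The act* and kind* names
and the pack/unpack helpers are made up for illustration (the prefix
avoids shadowing the usual err variable); only the 3-bit/5-bit split
comes from the Scan code above.

```go
package main

import "fmt"

// Low three bits: what the scanner should do next.
const (
	actErr byte = iota
	actSkip
	actFlush
	actUnget
	actEmit
)

// Hypothetical token kinds, stored in the upper five bits.
const (
	kindNone byte = iota
	kindIdent
	kindNumber
)

// pack combines a token kind and a scanner action into one byte.
func pack(kind, action byte) byte { return kind<<3 | action&7 }

// unpack splits an action byte back into its two parts.
func unpack(b byte) (kind, action byte) { return b >> 3, b & 7 }

func main() {
	b := pack(kindNumber, actUnget)
	kind, action := unpack(b)
	fmt.Printf("%08b kind=%d action=%d\n", b, kind, action)
	// prints: 00010011 kind=2 action=3
}
```

With this layout there is room for 8 actions and 32 token kinds per
table entry.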

If an extra char had to be read to recognize the token, the
action is unget: the char is pushed back and then Parse is
called. The parser in turn may trigger things downstream when
some semantic action has to be taken.
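
Here is a small self-contained sketch of the whole thing, to show
unget at work. It is only an illustration under my own assumptions:
the classes, states, token kinds and the hand-written table are
invented (a generator would normally produce the table), and a
callback stands in for the parser.

```go
package main

import "fmt"

type state struct{ next, action byte }

type Token struct {
	text []byte
	kind byte
}

// Actions, kept in the low 3 bits of state.action.
const (
	actErr byte = iota
	actSkip
	actFlush
	actUnget
	actEmit
)

// Hypothetical character classes, states and token kinds.
const (
	clsOther, clsLetter, clsDigit, clsSpace, numClasses = 0, 1, 2, 3, 4
	sStart, sIdent, sNumber, numStates                  = 0, 1, 2, 3
	kindIdent, kindNumber                               = 1, 2
)

var class [256]byte
var fsm [numStates][numClasses]state

func init() {
	for b := 'a'; b <= 'z'; b++ {
		class[b] = clsLetter
		class[b-'a'+'A'] = clsLetter
	}
	for b := '0'; b <= '9'; b++ {
		class[b] = clsDigit
	}
	for _, b := range " \t\r\n" {
		class[b] = clsSpace
	}
	// A token ends on the first byte that cannot extend it: unget + emit.
	identEnd := state{sStart, kindIdent<<3 | actUnget}
	numEnd := state{sStart, kindNumber<<3 | actUnget}
	fsm[sStart] = [numClasses]state{
		clsOther:  {sStart, actErr},
		clsLetter: {sIdent, actSkip},
		clsDigit:  {sNumber, actSkip},
		clsSpace:  {sStart, actFlush},
	}
	fsm[sIdent] = [numClasses]state{
		clsOther:  identEnd,
		clsLetter: {sIdent, actSkip},
		clsDigit:  {sIdent, actSkip},
		clsSpace:  identEnd,
	}
	fsm[sNumber] = [numClasses]state{
		clsOther:  numEnd,
		clsLetter: numEnd,
		clsDigit:  {sNumber, actSkip},
		clsSpace:  numEnd,
	}
}

// scan runs the table over str and calls emit for each finished token.
func scan(str []byte, emit func(Token)) {
	var st state
	start := 0
	for i := 0; i < len(str); {
		st = fsm[st.next][class[str[i]]]
		i++
		switch st.action & 7 {
		case actErr, actFlush:
			start = i // drop everything scanned so far
		case actSkip:
			// keep accumulating the current token
		case actUnget:
			i-- // the byte that ended the token is scanned again
			fallthrough
		case actEmit:
			emit(Token{str[start:i], st.action >> 3})
			start = i
		}
	}
}

func main() {
	scan([]byte("foo 42 bar\n"), func(t Token) {
		fmt.Printf("kind=%d text=%q\n", t.kind, t.text)
	})
}
```

The space after "foo" is what tells the scanner the identifier is
complete, so it is pushed back, the token is emitted, and the space
is then seen again in the start state, where it is flushed. Note
that the final token needs a terminating byte (here the newline)
before it is emitted.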

Scan can be called with the contents of a whole file or even a
single-byte string. With this structure the main loop can
poll/select on a number of input connections.
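
A rough sketch of such a main loop, using channels and a Go select
in place of poll/select on file descriptors. Everything here is
assumed for illustration: the Scanner is a trivial stand-in that
only splits on spaces and newlines, and it buffers the bytes of an
unfinished token so that a token may span two chunks (the Scan
above would need something similar for that case).

```go
package main

import "fmt"

// Scanner is a hypothetical stand-in for the table-driven scanner.
// It keeps its state between Scan calls, one Scanner per input.
type Scanner struct {
	name    string
	pending []byte // bytes of a token that is not finished yet
	out     func(name, tok string)
}

// Scan consumes one chunk of input, however small.
func (sc *Scanner) Scan(chunk []byte) {
	for _, b := range chunk {
		if b == ' ' || b == '\n' {
			if len(sc.pending) > 0 {
				sc.out(sc.name, string(sc.pending))
				sc.pending = sc.pending[:0]
			}
			continue
		}
		sc.pending = append(sc.pending, b)
	}
}

// run feeds whichever input has data to that input's scanner
// until both inputs are closed.
func run(a, b <-chan []byte, out func(name, tok string)) {
	sa := &Scanner{name: "a", out: out}
	sb := &Scanner{name: "b", out: out}
	for a != nil || b != nil {
		select {
		case chunk, ok := <-a:
			if !ok {
				a = nil // a nil channel is never selected again
				continue
			}
			sa.Scan(chunk)
		case chunk, ok := <-b:
			if !ok {
				b = nil
				continue
			}
			sb.Scan(chunk)
		}
	}
}

func main() {
	a := make(chan []byte)
	b := make(chan []byte)
	go func() {
		for _, s := range []string{"hel", "lo wor", "ld\n"} {
			a <- []byte(s)
		}
		close(a)
	}()
	go func() {
		for _, s := range []string{"1 2", "3\n"} {
			b <- []byte(s)
		}
		close(b)
	}()
	run(a, b, func(name, tok string) { fmt.Println(name, tok) })
}
```

Input a yields "hello" and "world", input b yields "1" and "23",
each in order; how the two streams interleave varies from run to
run.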
