Michel Fortin wrote:
As to how to parse it with an incremental parser, I assume you could do something like this:

    text: this
    mark: **
    text: is
    mark: `
    (switch tokenizer into "raw" mode until it sees a backtick)
    text: raw** text
    mark: `
    (take last text token, remove backtick marks, and make a code span)
    (switch tokenizer back into "span" mode)
    end reached in span

The hard part comes when no matching backtick is found (assuming non-paired backticks do not constitute code). Here's what I suggest for the same case with no ending backtick:

    text: this
    mark: **
    text: is
    mark: `
    (switch tokenizer into "raw" mode until it sees a backtick)
    text: raw** text
    end reached in raw
      (reparse last text token in "span" mode)
        text: raw
        mark: **
        (take tokens between the two ** marks and put them in emphasis; the two marks are removed)
        text: text
        end

Note that in this case backtracking is limited to the last token, which is itself limited in length by the current block (paragraph, list item, ...). I have no idea how that could fit into any formal grammar language, however.
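
For readers who want to try the two traces above, here is a minimal Python sketch of the same idea. It is not taken from any Markdown implementation: the function names (tokenize_span, pair_emphasis), the token shapes, and the choice to keep an unmatched backtick as literal text are all assumptions made for illustration. It handles only `**` and backtick marks within a single block.

```python
import re

# Span-mode tokens: a "**" mark, a backtick mark, a run of ordinary text,
# or a lone "*" (treated as ordinary text in this sketch).
SPAN_TOKEN = re.compile(r"\*\*|`|[^*`]+|\*")


def tokenize_span(text):
    """Tokenize one block of inline text into (kind, value) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = SPAN_TOKEN.match(text, pos)
        tok = match.group()
        pos = match.end()
        if tok == "**":
            tokens.append(("mark", "**"))
        elif tok == "`":
            # "Raw" mode: everything up to the next backtick is taken verbatim.
            close = text.find("`", pos)
            if close != -1:
                tokens.append(("code", text[pos:close]))
                pos = close + 1
            else:
                # End reached in raw mode. Assumption: the lone backtick is
                # literal text, and the raw token is reparsed in span mode.
                tokens.append(("text", "`"))
                tokens.extend(tokenize_span(text[pos:]))
                pos = len(text)
        else:
            tokens.append(("text", tok))
    return tokens


def pair_emphasis(tokens):
    """Wrap the tokens between two "**" marks in a ("strong", [...]) node."""
    out = []
    open_idx = None  # index in `out` of a pending opening "**" mark
    for tok in tokens:
        if tok == ("mark", "**"):
            if open_idx is None:
                open_idx = len(out)
                out.append(tok)
            else:
                inner = out[open_idx + 1:]
                del out[open_idx:]  # drop the opening mark and its contents
                out.append(("strong", inner))
                open_idx = None
        else:
            out.append(tok)
    return out


closed = pair_emphasis(tokenize_span("this **is `raw** text`"))
unclosed = pair_emphasis(tokenize_span("this **is `raw** text"))
```

With the closing backtick present, the `**` inside the code span stays raw and the opening `**` is left unmatched. Without it, the reparse finds the second `**` and the emphasis closes, as in the second trace. The reparse never looks further back than the text following the unmatched backtick.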

Well - has anyone else looked into ANTLR 3.0 at all? Its LL(*) grammar language (an EBNF dialect) supports full backtracking and arbitrary lookahead, as far as necessary. As I understand it, it's fairly well optimized, borrowing some packrat-parsing ideas (memoizing partial results) to avoid parsing the same stretch of text repeatedly...
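
To make the memoization point concrete, here is a small hand-written Python sketch of backtracking with packrat-style caching. It is not ANTLR output and does not use ANTLR's API; the rule names (code_span, shouted, plain) and the toy grammar are hypothetical, chosen only so that two alternatives share a prefix.

```python
from functools import lru_cache


def parse_item(text):
    """Try `shouted` (code span + "!") first, then fall back to `plain`.

    Returns (end_position_or_None, number_of_real_code_span_runs).
    """
    runs = 0

    @lru_cache(maxsize=None)  # memoize results per start position
    def code_span(pos):
        nonlocal runs
        runs += 1  # counts only uncached executions
        if pos < len(text) and text[pos] == "`":
            close = text.find("`", pos + 1)
            if close != -1:
                return close + 1
        return None

    def shouted(pos):
        # Alternative 1: a code span immediately followed by "!".
        end = code_span(pos)
        if end is not None and text.startswith("!", end):
            return end + 1
        return None

    def plain(pos):
        # Alternative 2: a bare code span.
        return code_span(pos)

    end = shouted(0)
    if end is None:
        end = plain(0)  # backtrack; code_span(0) is served from the cache
    return end, runs


result = parse_item("`x` y")
```

Here the first alternative fails after the code span has already been matched. When the parser backtracks to the second alternative, the cached result is reused, so the span is scanned only once.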

I suspect Markdown might be formally specifiable in ANTLR v3, and I'd bet that even if it's not, it's very close. If it is, getting Markdown parsers into various languages would just be a matter of helping develop new ANTLR v3 target-language backends.

- Eric Astor
_______________________________________________
Markdown-Discuss mailing list
http://six.pairlist.net/mailman/listinfo/markdown-discuss
