Michel Fortin wrote:
As to how to parse it with an incremental parser, I assume you could do that:

    text: this
    mark: **
    text: is
    mark: ` (switch tokenizer into "raw" mode until it sees a backtick)
    text: raw** text
    mark: ` (take last text token, remove backtick marks, and make a code span)
            (switch back tokenizer into "span" mode)
    end reached in span

The hard part comes when no matching backtick is found (assuming non-paired backticks do not constitute code). Here's what I suggest for the same case with no ending backtick:

    text: this
    mark: **
    text: is
    mark: ` (switch tokenizer into "raw" mode until it sees a backtick)
    text: raw** text
    end reached in raw
            (reparse last text token in "span" mode)
    text: raw
    mark: ** (take tokens between the two ** marks and put them in emphasis, the two marks are removed)
    text: text
    end

Note that in this case backtracking is limited to the last token, which is itself limited in length by the current block (paragraph, list item, ...). I have no idea how that could fit any formal grammar language however.
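For anyone who wants to play with the scheme Michel describes, here is a rough Python sketch of it. It is only an illustration under a few assumptions of mine: the names `tokenize`, `render` and `SPAN_TOKEN` are made up, an unmatched backtick is kept as literal text, `**` is rendered as `<strong>`, and there is no HTML escaping or nesting. It is not taken from Markdown.pl or any other implementation.

```python
import re

# One span-mode token: a ** mark, a backtick mark, a run of plain text, or a lone *.
SPAN_TOKEN = re.compile(r"\*\*|`|[^*`]+|\*")

def tokenize(block):
    """Tokenize one block into ('text', s), ('mark', '**') and ('code', s) tokens."""
    tokens = []
    pos = 0
    while pos < len(block):
        tok = SPAN_TOKEN.match(block, pos).group()
        pos += len(tok)
        if tok == "**":
            tokens.append(("mark", "**"))
        elif tok == "`":
            # "Raw" mode: read up to the next backtick or the end of the block.
            end = block.find("`", pos)
            if end != -1:
                # Closing backtick found: the raw text becomes a code span.
                tokens.append(("code", block[pos:end]))
                pos = end + 1
            else:
                # End reached in raw mode: keep the backtick as literal text
                # (an assumption) and reparse only the last raw token in span
                # mode. It contains no backtick, so this recurses once at most.
                raw = block[pos:]
                tokens.append(("text", "`"))
                tokens.extend(tokenize(raw))
                pos = len(block)
        else:
            tokens.append(("text", tok))
    return tokens

def render(tokens):
    """Pair up ** marks and emit HTML; an unpaired ** stays literal."""
    out = []
    open_idx = None  # index in `out` of a ** mark still waiting for its partner
    for kind, value in tokens:
        if kind == "code":
            out.append("<code>" + value + "</code>")
        elif kind == "mark":
            if open_idx is None:
                open_idx = len(out)
                out.append("**")  # replaced by <strong> if a closing mark shows up
            else:
                out[open_idx] = "<strong>"
                out.append("</strong>")
                open_idx = None
        else:
            out.append(value)
    return "".join(out)
```

With the closing backtick present, `render(tokenize("this **is `raw** text`"))` gives `this **is <code>raw** text</code>`. Without it, `render(tokenize("this **is `raw** text"))` gives `this <strong>is `raw</strong> text`, and the only text scanned twice is the last raw token.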
Well - has anyone else looked into ANTLR 3.0 at all? Its LL(*) grammar language (an EBNF notation) supports full backtracking and arbitrary lookahead, as far ahead as necessary. It's fairly well optimized, as I understand it, borrowing the memoization idea from packrat parsing so that the same section of text doesn't have to be parsed repeatedly...
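To make the memoization idea concrete, here is a toy Python sketch of a backtracking matcher that caches one rule's result per position. This is not ANTLR's implementation or its API; the rule names (`code_span`, `strong`, `stars_then_code`) and the two-alternative mini-grammar are invented purely for illustration.

```python
from functools import lru_cache

def parse(text):
    """Try two alternatives at position 0, sharing memoized sub-results."""
    calls = []  # positions at which code_span really ran (cache misses)

    @lru_cache(maxsize=None)
    def code_span(pos):
        # code_span := "`" ... "`" ; returns the end position, or None.
        calls.append(pos)
        if text.startswith("`", pos):
            end = text.find("`", pos + 1)
            if end != -1:
                return end + 1
        return None

    def strong(pos):
        # strong := "**" code_span "**"
        if not text.startswith("**", pos):
            return None
        end = code_span(pos + 2)
        if end is not None and text.startswith("**", end):
            return end + 2
        return None

    def stars_then_code(pos):
        # fallback := "**" code_span (the ** stays literal)
        if not text.startswith("**", pos):
            return None
        return code_span(pos + 2)

    result = strong(0)
    if result is None:
        # Backtrack to position 0; code_span(2) now comes from the cache.
        result = stars_then_code(0)
    return result, calls
```

For `parse("**`x` tail")` the first alternative fails for lack of a closing `**`, the second succeeds, and `calls` ends up as `[2]`: the code span was scanned only once even though both alternatives asked for it.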
I suspect Markdown might be formally specifiable in ANTLR v3, and I'd bet that even if it's not, it's very close. If it is - getting Markdown parsers into various languages would just be a matter of helping develop new ANTLR v3 language-translation backends.
- Eric Astor
