On 2007-08-12 at 23:23, Allan Odgaard wrote:

> I would have expected it to see first two back-ticks, then scan forward until another two back-ticks are seen (since the open-token defines the close-token) and thus give this output:
>
>     <p>Backtick: <code>\</code>`</p>
>
> I know most Markdown parsers do not follow conventional parser wisdom, but IMO this is also the interpretation that suits an incremental tokenizer/parser best compared to your interpretation, which requires a look-ahead to potentially the end of the document, each time one or more back-ticks are seen.

The look-ahead extends only to the end of the paragraph, not the end of the document; at least that is the case in PHP Markdown and Markdown.pl (I haven't tested others), which first break the content into paragraphs, then apply span-level rules inside them.
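
To illustrate the paragraph-bounded look-ahead, here is a minimal Python sketch. It is not the code of PHP Markdown or Markdown.pl; the function names `split_paragraphs` and `code_spans` are hypothetical, and the rule that a run of N back-ticks is closed only by another run of exactly N is a simplifying assumption.

```python
import re

def split_paragraphs(text):
    # Block-level pass: paragraphs are separated by one or more blank lines.
    return [p for p in re.split(r"\n\s*\n", text.strip()) if p]

def code_spans(paragraph):
    # Span-level pass: each run of N back-ticks looks ahead for a closing
    # run of exactly N back-ticks, but never past the end of the paragraph.
    spans = []
    pos = 0
    opener_re = re.compile(r"`+")
    while True:
        opener = opener_re.search(paragraph, pos)
        if opener is None:
            break
        n = len(opener.group())
        closer_re = re.compile(r"(?<!`)`{%d}(?!`)" % n)
        closer = closer_re.search(paragraph, opener.end())
        if closer is None:
            # No matching closer in this paragraph: the back-ticks stay literal.
            pos = opener.end()
            continue
        spans.append(paragraph[opener.end():closer.start()].strip())
        pos = closer.end()
    return spans

text = "open `code` here and a stray ` tick\n\nnext `para` graph"
for paragraph in split_paragraphs(text):
    print(code_spans(paragraph))
```

The stray back-tick in the first paragraph is never paired with a back-tick in the second one, because the scan for a closer stops at the paragraph boundary.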

There are a lot of look-aheads in Markdown: emphasis won't be applied if asterisks or underscores can't be matched in pairs; links won't be links if there's no suitable parenthesis after the closing bracket; Setext-style headers need the line of hyphens or equal signs following their content; the parsing mode for list items depends on whether or not they contain a blank line; etc.
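
The Setext case is the simplest of these to sketch: a line cannot be classified until the line after it has been inspected. The Python below is only an illustration under simplified rules (no nesting, no multi-line paragraphs); `classify_lines` and the tuple output are hypothetical names, not part of any Markdown implementation.

```python
import re

# A Setext underline is a line made only of "=" (level 1) or "-" (level 2).
UNDERLINE = re.compile(r"^(=+|-+)[ \t]*$")

def classify_lines(lines):
    out = []
    i = 0
    while i < len(lines):
        line = lines[i]
        # One-line look-ahead: peek at the following line, if any.
        nxt = lines[i + 1] if i + 1 < len(lines) else ""
        underline = UNDERLINE.match(nxt)
        if line.strip() and underline:
            level = 1 if underline.group(1)[0] == "=" else 2
            out.append(("h%d" % level, line.strip()))
            i += 2  # consume the text line and its underline together
        else:
            if line.strip():
                out.append(("text", line.strip()))
            i += 1
    return out

print(classify_lines(["Title", "=====", "", "Just text", "Sub", "---"]))
```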

There's no way to do truly incremental parsing of Markdown... well, you could in a way, but you'd have to mutate many parts of the output document while parsing (like HTML parsers do in browsers), or delay the output of ambiguous parts until the end of the document; all this surely defeats the purpose of an incremental parser. The worst "look-ahead" (or most complex "mutations") would be for reference-style links, which can have their definitions absolutely anywhere in the document. Interestingly, that's probably one of the most appreciated features of Markdown.
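
One way to picture why reference-style links resist incremental parsing is a two-pass sketch: collect every definition first, then resolve the references. The Python below is a rough illustration under simplified syntax (no titles, no implicit `[label]` shortcuts, no escaping); `render_reference_links` and both regular expressions are assumptions made for this example, not the approach of any particular implementation.

```python
import re

# Simplified definition line:  [id]: http://example.com/
DEFINITION = re.compile(r"^ {0,3}\[([^\]]+)\]:\s*(\S+)\s*$", re.MULTILINE)
# Simplified reference:  [label][id]  or  [label][]
REFERENCE = re.compile(r"\[([^\]]+)\]\[([^\]]*)\]")

def render_reference_links(text):
    # Pass 1: collect every definition, wherever it sits in the document.
    definitions = {m.group(1).lower(): m.group(2)
                   for m in DEFINITION.finditer(text)}
    body = DEFINITION.sub("", text)

    # Pass 2: resolve references; unmatched ones stay as literal text.
    def resolve(match):
        label = match.group(1)
        ref_id = match.group(2) or label
        url = definitions.get(ref_id.lower())
        if url is None:
            return match.group(0)
        return '<a href="%s">%s</a>' % (url, label)

    return REFERENCE.sub(resolve, body).strip()

doc = "See [the site][home] and [nothing][missing].\n\n[home]: http://example.com/\n"
print(render_reference_links(doc))
```

Nothing can be emitted for `[the site][home]` until the whole input has been scanned, since the definition may come after the reference.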


Michel Fortin
[EMAIL PROTECTED]
http://www.michelf.com/


_______________________________________________
Markdown-Discuss mailing list
[email protected]
http://six.pairlist.net/mailman/listinfo/markdown-discuss