On 2007-08-13 at 21:27, Allan Odgaard wrote:

On Aug 13, 2007, at 10:20 AM, Michel Fortin wrote:

On 2007-08-12 at 23:23, Allan Odgaard wrote:

I would have expected it to first see two back-ticks, then scan forward until another two back-ticks are seen (since the open-token defines the close-token) and thus give this output:

    <p>Backtick: <code>\</code>`</p>
[...]
[snip]

Regardless of how much look-ahead most parsers currently use, do you disagree with my interpretation?

I disagree.

If so, can you provide a more formal definition of how you believe the spec should be read?

The code span starts with a certain number of consecutive backticks and ends with the same number of backticks, and no more. This means that if you need to get a one-backtick code span you can write it this way: `` ` ``; and to get a five-backtick code span you can write this: ` ````` `. A space as the first or last character of the code span gets ignored.

Note that this way you don't need 11 backticks around a code span containing a run of 10 backticks somewhere in it. Your interpretation of the syntax would require that:

    (mine)   ` `````````` `
    (yours)  ``````````` `````````` ```````````
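
To make the difference concrete, here is a minimal Python sketch that computes the shortest delimiter each reading would need for a given piece of code. The function names are hypothetical and the two rules are only my paraphrase of the two interpretations: "exact match" means the closing run must be exactly as long as the opening one, and "first match" means the span ends at the first occurrence of the opening run.

```python
import re


def backtick_runs(content):
    """Return the set of lengths of all backtick runs found in content."""
    return {len(run) for run in re.findall(r"`+", content)}


def delimiter_exact_match(content):
    """Shortest delimiter when the closer must be a run of exactly N backticks.

    Any N works as long as content holds no run of exactly N backticks.
    """
    runs = backtick_runs(content)
    n = 1
    while n in runs:
        n += 1
    return n


def delimiter_first_match(content):
    """Shortest delimiter when the span ends at the first N backticks seen.

    N has to be longer than the longest run inside content.
    """
    return max(backtick_runs(content), default=0) + 1
```

For a run of ten backticks, `delimiter_exact_match` returns 1 and `delimiter_first_match` returns 11, which matches the two lines above.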


Basically I read it as code-spans can be matched using this regexp: (`+) ?.*? ?\1

That's mostly it. For reference, this is the regex from PHP Markdown:

    {
        (?<!\\)     # Character before opening ` can't be a backslash
        (`+)        # $1 = Opening run of `
        (.+?)       # $2 = The code block
        (?<!`)
        \1          # Matching closer
        (?!`)
    }xs

It looks pretty much the same as yours, except that there is a one-character look-ahead and a one-character look-behind around the closing run of backticks to ensure the marker is indeed the same length as the opening one. Also, leading and trailing spaces are taken care of in the callback instead.
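
For anyone who wants to experiment with it outside PHP, the same expression can be sketched in Python. This is only an approximation of the behaviour described above, not the PHP Markdown source: the function name `render_code_spans` is made up, and I'm assuming the callback strips leading and trailing spaces and tabs and HTML-escapes the content.

```python
import html
import re

CODE_SPAN = re.compile(
    r"""
    (?<!\\)     # character before the opening run can't be a backslash
    (`+)        # group 1: opening run of backticks
    (.+?)       # group 2: content of the code span (at least one character)
    (?<!`)      # content must not end with a backtick
    \1          # closing run, same text as the opening run
    (?!`)       # closing run must not be followed by another backtick
    """,
    re.VERBOSE | re.DOTALL,
)


def render_code_spans(text):
    """Replace backtick-delimited code spans in text with <code> elements."""

    def replace(match):
        # Assumption: surrounding spaces and tabs are dropped in the callback.
        content = match.group(2).strip(" \t")
        return "<code>" + html.escape(content, quote=False) + "</code>"

    return CODE_SPAN.sub(replace, text)
```

With this sketch, `` ` `` wrapped in double backticks gives a code span containing a single backtick, a lone pair of backticks is left untouched, and a code span holding only a space comes out empty.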

(There's also a check for a backslash at the start, although I just realised that this needs work as it doesn't give a correct result for an escaped literal backslash like this: \\`code`.)
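
One possible way to handle that case, sketched in Python, is to count the backslashes immediately before the backtick instead of looking at a single character: an odd count means the backtick itself is escaped, an even count means the backslashes escape each other. The helper name `is_escaped` is hypothetical, and this is only an idea, not what PHP Markdown currently does.

```python
def is_escaped(text, index):
    """Return True if text[index] is preceded by an odd number of backslashes."""
    count = 0
    i = index - 1
    # Walk backwards over the run of backslashes just before the character.
    while i >= 0 and text[i] == "\\":
        count += 1
        i -= 1
    return count % 2 == 1
```

With this check, the backtick in `\\` followed by a code span is not considered escaped, while the backtick after a single backslash still is.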

Although in practice we may 1) require at least one character inside the code-span (so `` on its own is not a zero-character code span)

Indeed, there needs to be at least one character inside a code span, otherwise you wouldn't be able to differentiate the opening run of backticks from the closing one. If the sole character is a space character, it will get stripped, so you can still make empty code spans.

and 2) we may want to limit them to “markdown paragraphs” which are roughly defined as ending when there are two consecutive newlines, making the pattern: (`+) ?(.|\n(?!\n))+? ?\1

Well, that rule will work in the general case, although if your paragraph is inside a blockquote it may become trickier:

    > Paragraph `code?
    >
    > Paragraph `end of code ?

This should result in no code span, although there is technically no completely blank line between the two. In your expression, I'd try replacing the pattern matching the blank line with something that can vary depending on the context; it's the only way it can scale to nested block elements, I think.
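
Here is a rough Python sketch of what I mean by a context-dependent pattern: the expression that ends a block is passed in as a parameter, and the content of the code span is not allowed to cross it. The names `code_span_pattern`, `PARAGRAPH` and `BLOCKQUOTE` are made up for illustration, and the two block-break expressions are simplified assumptions, not a complete description of either context.

```python
import re


def code_span_pattern(block_break):
    """Build a code-span regex whose content cannot cross block_break."""
    return re.compile(
        r"(?<!\\)(`+)"                            # opening run of backticks
        r"((?:(?!(?:" + block_break + r")).)+?)"  # content, stopping at a block break
        r"(?<!`)\1(?!`)",                         # closing run of the same length
        re.DOTALL,
    )


# Assumption: at the top level a block ends at a blank line.
PARAGRAPH = code_span_pattern(r"\n[ \t]*\n")

# Assumption: inside a blockquote a line holding only ">" also ends the block.
BLOCKQUOTE = code_span_pattern(r"\n[ \t]*>[ \t]*\n")

quoted = "> Paragraph `code?\n>\n> Paragraph `end of code ?"
print(PARAGRAPH.search(quoted) is not None)   # the plain rule finds a span here
print(BLOCKQUOTE.search(quoted) is not None)  # the blockquote rule does not
```

A real parser would more likely strip the blockquote markers before looking for spans, so take this only as a demonstration that the block-ending rule has to be a parameter rather than a constant.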

Also note that code spans are allowed in headers and list items (including those with span-level content) which have different block-ending rules:

    Header `code?
    -------------
    ### Header `end of code? ###
    Paragraph.

Those not-separated-by-a-blank-line headers are not really documented, and John has said he's considering getting rid of them, which probably explains why they aren't. Perhaps they're not worth supporting (and this example is certainly ugly), but currently, using Markdown.pl and PHP Markdown, this will parse as you'd read it: two headers, one paragraph, no code span.

As for list items, I think this should constitute two list items, as current Markdown.pl and PHP Markdown do:

    *   List item `code?
    *   List item `end of code?

There's nothing explicit in the code about that, but I still think it makes sense. The logic being that while glancing at the document it looks like two list items, so it should really be two list items.

Going a little further, this one is trickier:

    *   List item `code?
        *   List item `end of code?

Markdown.pl gives completely bogus output while trying to create a sublist. PHP Markdown creates the sublist fine and no code span. I'm really not sure here whether creating a sublist or a code span is the best output. That said, it's certainly the very edge of an edge case. If we're to define a formal syntax, let's not start there.

Disclaimer: All this is only *my* interpretation of Markdown. If John Gruber decides otherwise, then I'll follow his interpretation instead.


Michel Fortin
[EMAIL PROTECTED]
http://www.michelf.com/


_______________________________________________
Markdown-Discuss mailing list
[email protected]
http://six.pairlist.net/mailman/listinfo/markdown-discuss
