Re: Markdown Extra Spec: Parsing Section
> If we're going this way, there's going to be a learning curve: for me, and for everyone trying to understand the syntax. I'd prefer to avoid forcing people to learn a new language only to understand the specification.

PS. Here's all you have to learn in order to write or read a PEG grammar.

    A B C       A followed by B followed by C
    A | B       A or B (ordered choice)
    A+          one or more As
    A*          zero or more As
    A?          optional A
    !A          not followed by A
    &A          followed by A (but does not consume A)
    (A B)       grouping
    .           matches any character
    'x'         matches the character 'x'
    "string"    matches the string "string"
    [a-z]       matches a character from 'a' to 'z'

English could be used to specify how a semantic value is to be constructed for each matching rule. This part would be implemented differently in different languages, but the basic PEG grammar would be the same.

John
___ Markdown-Discuss mailing list Markdown-Discuss@six.pairlist.net http://six.pairlist.net/mailman/listinfo/markdown-discuss
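To make the operator table concrete, here is a minimal Python sketch of each operator as a parser combinator. The helper names (`lit`, `cls`, `seq`, `choice`, and so on) are hypothetical and are not taken from peg-markdown or markdown-peg. Each parser takes the input and a position and returns the new position, or `None` on failure; it only recognizes input and leaves out the semantic values mentioned in the message.

```python
# Minimal PEG combinator sketch (hypothetical helper names, not from any library).
# Each parser takes (text, pos) and returns the new position, or None on failure.

def lit(s):
    """'x' or "string": match a literal string."""
    def p(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return p

def cls(lo, hi):
    """[a-z]: match one character in the range lo..hi."""
    def p(text, pos):
        return pos + 1 if pos < len(text) and lo <= text[pos] <= hi else None
    return p

def any_char(text, pos):
    """. : match any single character."""
    return pos + 1 if pos < len(text) else None

def seq(*ps):
    """A B C: match each parser in turn."""
    def p(text, pos):
        for q in ps:
            pos = q(text, pos)
            if pos is None:
                return None
        return pos
    return p

def choice(*ps):
    """A | B: ordered choice; the first alternative that succeeds wins."""
    def p(text, pos):
        for q in ps:
            r = q(text, pos)
            if r is not None:
                return r
        return None
    return p

def star(q):
    """A*: zero or more, greedy (stops if q consumes nothing)."""
    def p(text, pos):
        while True:
            r = q(text, pos)
            if r is None or r == pos:
                return pos
            pos = r
    return p

def plus(q):
    """A+: one or more."""
    return seq(q, star(q))

def opt(q):
    """A?: optional."""
    def p(text, pos):
        r = q(text, pos)
        return pos if r is None else r
    return p

def not_(q):
    """!A: succeed without consuming input only if A does not match here."""
    def p(text, pos):
        return pos if q(text, pos) is None else None
    return p

def and_(q):
    """&A: succeed without consuming input only if A matches here."""
    def p(text, pos):
        return pos if q(text, pos) is not None else None
    return p
```

The ordered-choice behavior is the main difference from BNF-style alternation: `choice(lit("a"), lit("ab"))` applied to `ab` commits to the first alternative and consumes only `a`.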
Re: Markdown Extra Spec: Parsing Section
On 2008-05-13 at 2:20, John MacFarlane wrote:

> PS. Here's all you have to learn in order to write or read a PEG grammar.
> [...]

It's certainly true that many parts could be converted to this format and be less verbose, and I find the idea appealing. I doubt the whole Markdown Extra ruleset can be expressed in this format, though. Can a PEG grammar have parametrized rules?

I've just added nested block element support to the spec. This is done by giving the block element generator (formerly the block element pass) a stack of rules to match at the start of each line. The idea comes straight from Allan Odgaard's explanation of his lost Markdown parser:

http://six.pairlist.net/pipermail/markdown-discuss/2008-March/001107.html

Michel Fortin
http://michelf.com/
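A minimal Python sketch of the "stack of rules to match at the start of each line" idea follows. It assumes that each open container contributes exactly one prefix rule; the two regexes are hypothetical stand-ins, not the spec's definitions. The generator walks the stack from the outermost container inward and reports how many containers the current line continues, along with the text left after their prefixes.

```python
import re

# Hypothetical prefix rules for two kinds of containers (not the spec's rules).
BLOCKQUOTE = re.compile(r"> ?")             # '>' plus an optional space
LIST_ITEM_BODY = re.compile(r"(?: {4}|\t)")  # four spaces or a tab

def match_prefixes(line, stack):
    """Try each open container's prefix rule, outermost first.

    Returns (depth, rest): how many containers this line continues,
    and the text remaining once their prefixes are stripped.
    """
    pos = 0
    for depth, rule in enumerate(stack):
        m = rule.match(line, pos)
        if m is None:
            return depth, line[pos:]
        pos = m.end()
    return len(stack), line[pos:]
```

With a blockquote containing a list item, `">     code"` continues both containers, while `"> plain"` continues only the blockquote. The rules are data on a stack rather than grammar rules with parameters, which is why this approach sidesteps the "parametrized rules" question.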
Re: Markdown Extra Spec: Parsing Section
> It's certainly true that many parts could be converted to this format and be less verbose, and I find the idea appealing. I doubt the whole Markdown Extra ruleset can be expressed in this format, though. Can a PEG grammar have parametrized rules?
>
> I've just added nested block element support to the spec. This is done by giving the block element generator (formerly the block element pass) a stack of rules to match at the start of each line. The idea comes straight from Allan Odgaard's explanation of his lost Markdown parser:
> http://six.pairlist.net/pipermail/markdown-discuss/2008-March/001107.html

No, PEG can't do this. But there is a different approach that works (described in my earlier email).

By the way: if I understand it correctly, your description of the Code block rule would parse the following as two code blocks, not one code block containing a blank line:

    some code

    more code

(Note: there is no tab on the middle line.) I don't think that's the desired behavior. Here's the markdown-peg version (and remember, this is runnable):

    verbatim - newRule $
        many1 (doesNotMatch blankline - indentedLine) ++
        (many (many1 (optional indent - blankline) ++
               many1 (doesNotMatch blankline - indentedLine)) ## concat) -
        many blankline ## Verbatim . concat

John
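For readers who don't read Haskell, here is a rough line-based Python transcription of the same shape: one or more indented non-blank lines, then any number of groups of (blank lines followed by more indented lines), then trailing blank lines. It assumes "indented" means a leading tab or four spaces, and all helper names are hypothetical. It shows why a blank middle line does not split the block: the blank line is only kept when more indented text follows it.

```python
def is_blank(line):
    """A blank line may still carry an indent (the 'optional indent' above)."""
    return line.strip() == ""

def indented_body(line):
    """Return the line without its tab/4-space indent, or None if not indented."""
    if line.startswith("\t"):
        return line[1:]
    if line.startswith("    "):
        return line[4:]
    return None

def parse_verbatim(lines, i=0):
    """Parse one indented code block starting at lines[i].

    Returns (code_text, next_index), or None if no code block starts at i.
    """
    def take_indented(j, out):
        # One or more indented, non-blank lines; None if there are none.
        start = j
        while j < len(lines) and not is_blank(lines[j]) \
                and indented_body(lines[j]) is not None:
            out.append(indented_body(lines[j]))
            j += 1
        return j if j > start else None

    out = []
    j = take_indented(i, out)
    if j is None:
        return None
    while True:
        # Count blank lines; keep them only if indented text follows.
        k = j
        while k < len(lines) and is_blank(lines[k]):
            k += 1
        if k == j:
            break
        group = [""] * (k - j)
        end = take_indented(k, group)
        if end is None:
            break
        out.extend(group)
        j = end
    # Trailing blank lines are consumed but are not part of the code.
    while j < len(lines) and is_blank(lines[j]):
        j += 1
    return "\n".join(out) + "\n", j
```

Applied to the example above followed by an unindented paragraph, this returns a single block containing `some code`, an empty line, and `more code`.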
Re: Markdown Extra Spec: Parsing Section
On 2008-05-12 at 21:55, John MacFarlane wrote:

> I am assuming that there will be a different type to deal with link text.

There will.

> Is there any good reason for having two different types here?

The link text can contain other span-level elements, such as emphasis, code spans, etc. This *has* to be taken into account while parsing. On the other hand, text in the reference part is just plain text.

> As far as I can see, allowing anything that can serve as link text to be a refname would not contradict anything in the official Markdown syntax specification. In addition, it is hard to imagine a realistic case where allowing brackets and newlines in refnames would break an existing document. Why make users remember extra restrictions? (I didn't even know about them until a few days ago, and I've used markdown for years.) And why expose users to the risk that their documents will break if they hard-wrap a long refname?

I'm in favor of allowing hard-wrapped reference names where the line break is not significant, so that will probably end up in the spec when I write the part about parsing the link span element. Please keep in mind that the current refname construct is for the reference name in link definitions, and may differ from the one used in the link span element.

> I think the current behavior of phpmarkdown and Markdown.pl is very confusing. This produces a link:
>
>     [[hi]][]
>
>     [[hi]]: /url
>
> But this doesn't produce a link:
>
>     [hello][[hi]]
>
>     [[hi]]: /url
>
> So either (a) not all link references begin with a refname, or (b) refnames can sometimes (but not always!) contain embedded brackets. Either option would conflict with Michel's syntax specification as it now stands.

This situation is indeed inconsistent. I'd be in favor of allowing balanced square brackets in link references, even though John Gruber seems (or seemed in 2006) to think they should be disallowed completely:

http://six.pairlist.net/pipermail/markdown-discuss/2006-September/000257.html

Michel Fortin
http://michelf.com/
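Here is a minimal Python sketch of what "balanced square brackets, hard-wrapping allowed, blank lines not" could mean for a scanner. The function name and the exact blank-line test are assumptions, not spec text. With it, both `[[hi]][]` and `[hello][[hi]]` yield the same reference name `[hi]`, which removes the inconsistency described above.

```python
def scan_bracketed(text, pos):
    """Scan a [...] run starting at text[pos] == '[', allowing nested balanced brackets.

    Returns (inner_text, end_pos), or None if the brackets never balance.
    A single line break inside is allowed; a blank line ends the attempt.
    """
    if pos >= len(text) or text[pos] != "[":
        return None
    depth = 0
    for i in range(pos, len(text)):
        c = text[i]
        if c == "[":
            depth += 1
        elif c == "]":
            depth -= 1
            if depth == 0:
                return text[pos + 1:i], i + 1
        elif c == "\n" and text[i + 1:].lstrip(" \t").startswith("\n"):
            return None  # blank line: not a reference name
    return None
```

A stricter scanner (the current spec draft) stops at the first `]`, which is what makes `[hello][[hi]]` fail.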
Re: Markdown Extra Spec: Parsing Section
On 2008-05-12 at 18:14, John MacFarlane wrote:

> The PEG representation is concise, precise, and readable.

Readable, hum... if I look at this rule from PEG Markdown:

    ListContinuationBlock = a:StartList
                            ( BlankLines
                              { if (strlen($$.contents.str) == 0)
                                    $$.contents.str = strdup("\001"); /* block separator */
                                pushelt($$, a); } )
                            ( Indent ListBlock { pushelt($$, a); } )+
                            { $$ = mk_str(concat_string_list(reverse(a.children))); }

it looks a lot like code to me, and I don't understand half of it. If we're going this way, there's going to be a learning curve: for me, and for everyone trying to understand the syntax. I'd prefer to avoid forcing people to learn a new language only to understand the specification.

Michel Fortin
http://michelf.com/
Re: Markdown Extra Spec: Parsing Section
Michel Fortin wrote:
> Anyway, with the parsing model in three passes I'm currently defining, it's pretty trivial to do correctly: the block elements pass extracts the text of the blockquote, leaving this to be parsed by the span element pass:
>
>     what about this <http://
>     google.com/> case?
>
> The span element pass would then see an autolink and just ignore any newline it finds in the URL.

Ah, okay. Somehow I misread that. Yes, that seems about right.

-Jacob
Re: Markdown Extra Spec: Parsing Section
Michel Fortin wrote:
> On 2008-05-12 at 18:14, John MacFarlane wrote:
>> The PEG representation is concise, precise, and readable.
>
> Readable, hum... if I look at this rule from PEG Markdown:
>
> [...]
>
> it looks a lot like code to me, and I don't understand half of it. If we're going this way, there's going to be a learning curve: for me, and for everyone trying to understand the syntax. I'd prefer to avoid forcing people to learn a new language only to understand the specification.

Yeah, that's worse. Mainly, I would suggest taking all those numbered lists of things and putting each list on a single line. It doesn't have to be BNF or EBNF/ABNF/whatever, but parts which *can* be expressed that way, and can be condensed to fit in a more compact space, should be. In many parts of your current work, the numbered lists + English approach just adds visual clutter. :)

-Jacob
Re: Markdown Extra Spec: Parsing Section
Michel Fortin wrote:
> I've begun writing the parsing section of the spec, and I thought I'd let you know where I'm heading with all this.

You should write it in something closer to a BNF-like format. The current version is about 10x more verbose than necessary, and it makes reading the spec considerably more difficult.
Re: Markdown Extra Spec: Parsing Section
Michel Fortin wrote:
> I've begun writing the parsing section of the spec, and I thought I'd let you know where I'm heading with all this.

Also, you're still going to have quite a few sticky edge cases with your current parsing model. What happens when we have a <>-delimited URL inside a blockquote? For instance:

    > what about this <http://
    > google.com/> case?

-Jacob
Re: Markdown Extra Spec: Parsing Section
On 2008-05-11 at 20:55, Jacob Rus wrote:

> You should write it in something closer to a BNF-like format. The current version is about 10x more verbose than necessary, and it makes reading the spec considerably more difficult.

The reason I'm doing it like this is that I doubt everything can be expressed in a BNF format. Using plain English descriptions lets me write what feels most natural and easiest to understand, without worrying about fitting things into a specific grammar. Shopping for a more formal and less verbose grammar, if we need one, will be much easier once we know what we need, that is, once we can compare existing grammars against a checklist of what is necessary to implement the given parsing algorithm.

If you remember the timetable I've given, you'll see that I've booked about half a year for polishing things. This includes rephrasing sentences, refactoring the syntax, and reformatting the spec to make it easier to understand. This *could* include switching to a new grammar format if it makes things more intuitive and readable.

> Also, you're still going to have quite a few sticky edge cases with your current parsing model. What happens when we have a <>-delimited URL inside a blockquote? For instance:
>
>     > what about this <http://
>     > google.com/> case?

Well, newlines currently aren't allowed inside automatic links in Markdown.pl, PHP Markdown and some others. Implementations that see an automatic link there treat it as a link to "http:// google.com/" (notice the space) or to "http://" (notice what's missing).

http://babelmark.bobtfish.net/?markdown=%0D%0A%3E+what+about+this+%3Chttp%3A%2F%2F%0D%0A%3E+google.com%2F%3E+case%3F&normalize=on&src=1&dest=2

Anyway, with the parsing model in three passes I'm currently defining, it's pretty trivial to do correctly: the block elements pass extracts the text of the blockquote, leaving this to be parsed by the span element pass:

    what about this <http://
    google.com/> case?

The span element pass would then see an autolink and just ignore any newline it finds in the URL.

Michel Fortin
http://michelf.com/
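Here is a minimal Python sketch of the two steps described in this message. It assumes the block pass only strips the `> ` markers, and that the span pass treats `<http://...>` as an autolink whose internal line breaks are dropped. The regexes and function names are illustrative, not taken from the spec or from PHP Markdown, and HTML escaping is omitted.

```python
import re

# Hypothetical span rule: an autolink may span lines; [^<>] also matches '\n'.
AUTOLINK = re.compile(r"<(https?://[^<>]*)>")

def blockquote_text(lines):
    """Block pass: strip the '>' marker (and one optional space) from each line."""
    return "\n".join(re.sub(r"^> ?", "", line) for line in lines)

def render_autolinks(text):
    """Span pass: turn <url> into a link, ignoring any newline inside the URL."""
    def repl(m):
        url = re.sub(r"\s*\n\s*", "", m.group(1))
        return '<a href="%s">%s</a>' % (url, url)
    return AUTOLINK.sub(repl, text)
```

Because the block pass has already removed the second `>` marker, the span pass sees `<http://` and `google.com/>` as one piece of text, and the URL comes out as `http://google.com/`.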
Re: Markdown Extra Spec: Parsing Section
Michel, I think there's a problem with:

    refname
        A run of one or more characters, excluding any newline and U+005D Closing Square Bracket.

This doesn't allow refnames with embedded brackets. But PHP Markdown allows

    [[hi]](/url)

as a valid link. Also, PHP Markdown currently allows embedded newlines, which are excluded by your definition:

    [hi
    there](/url)

Of course, embedded *blank* lines should be excluded.

John

+++ Michel Fortin [May 08 08 23:59 ]:
> Hello all,
>
> I've begun writing the parsing section of the spec, and I thought I'd let you know where I'm heading with all this.
>
> Basically, parsing is defined as three consecutive passes: parsing document elements, parsing block elements and parsing span elements. Each pass is going to contain a set of rules the parser should attempt to match while parsing the input. Rules are expressed in English, but are highly structured so that it should be pretty straightforward to convert them to a formal grammar if the grammar is powerful enough to express them.
>
> I'm not saying too much here; elaborate explanations are better in the spec than in this volatile email. If you're interested, take a look and tell me what you think:
>
> http://michelf.com/specs/markdown-extra/#parsing
>
> Michel Fortin
> http://michelf.com/
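Transcribing the quoted definition literally makes the problem easy to check. This Python sketch only encodes the character class described in the spec draft; the function name is hypothetical.

```python
import re

# The refname definition quoted above, transcribed literally:
# one or more characters, excluding newline and U+005D ']'.
REFNAME = re.compile(r"[^\n\]]+")

def is_refname(s):
    """True if the whole string s matches the draft refname definition."""
    return REFNAME.fullmatch(s) is not None
```

Under this definition, `hi` is a refname but `[hi]` and `hi\nthere` are not. It also accepts the unbalanced `[hi`, since only the closing bracket is excluded.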
Re: Markdown Extra Spec: Parsing Section
>> could be converted into a reference-style link like this:
>>
>>     [[link with embedded brackets]]
>
> I don't believe Markdown allows for links to be defined like this. A reference-style link would be defined as:
>
>     [[link with embedded brackets]][link with embedded brackets]
>
> or
>
>     [link with embedded brackets][]

Newer versions of Markdown.pl and PHP Markdown do allow the trailing '[]' to be omitted:

    % Markdown.pl --version

    This is Markdown, version 1.0.2b8.
    Copyright 2004 John Gruber
    http://daringfireball.net/projects/markdown/

    % Markdown.pl
    [hi]

    [hi]: /url
    <p><a href="/url">hi</a></p>

I agree that without this style of reference links, it would make more sense to distinguish between link text and 'refnames'. But this style of link gives a reason not to distinguish them.

John
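The point about the shortcut form can be seen in a small resolver. Here is a minimal Python sketch covering the three reference-link forms discussed here: full `[text][id]`, collapsed `[text][]`, and shortcut `[text]`. The function name is hypothetical, and case-insensitive matching is an assumption of this sketch. In the collapsed and shortcut forms, the link text itself is used as the reference name, so whatever link text may contain ends up in a refname.

```python
def resolve_reference(text, ref_id, definitions):
    """Resolve a reference-style link to its URL, or None if undefined.

    ref_id is the string inside the second pair of brackets, '' for the
    collapsed form [text][], or None for the shortcut form [text].
    Names are matched case-insensitively (an assumption of this sketch).
    """
    name = text if not ref_id else ref_id
    return definitions.get(name.lower())
```

With `definitions = {"hi": "/url"}`, the inputs `[hi]`, `[hi][]` and `[hello][hi]` all resolve to `/url`.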
Re: Markdown Extra Spec: Parsing Section
Just to clarify: most implementations allow newlines and embedded brackets in link text. But pandoc and peg-markdown seem to be the only ones that currently allow them in link reference definitions:

http://babelmark.bobtfish.net/?markdown=[hi%0D%0Athere][]%0D%0A%0D%0A[hi+there]%3A+%2Furl%0D%0A%0D%0A[hi%0D%0Aagain][]%0D%0A%0D%0A[hi%0D%0Aagain]%3A+%2Furl%0D%0A%0D%0A[[hi]](%2Furl)%0D%0A%0D%0A[[hello]]%0D%0A%0D%0A[[hello]]%3A+%2Furl%0D%0A&normalize=on&src=1&dest=2

+++ Tomas Doran [May 09 08 21:19 ]:
>> Still, why shouldn't refnames be allowed to have embedded brackets and newlines, if explicit links can?
>
> To me those seem to be two entirely different things...
>
> The _text_ of the link has to be flexible to allow almost anything. Refnames, on the other hand, are identifiers, and as such it makes sense for them to be more constrained. For instance, if you allow newlines in them, it opens a whole bunch of questions as to what whitespace counts.
>
> In Text::Markdown, I'm allowing newlines in link text:
>
> http://svn.kulp.ch/cpan/text_multimarkdown/trunk/t/Text-Markdown.mdtest/Links_multiline_bugs_1.text
> http://svn.kulp.ch/cpan/text_multimarkdown/trunk/t/Text-Markdown.mdtest/Links_multiline_bugs_2.text
>
> as it was specifically reported to me as a bug:
>
> http://bugs.debian.org/459885
>
> Cheers
> Tom
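One possible answer to "what whitespace counts" is to normalize reference names before comparing them. Here is a minimal Python sketch, assuming (this is not spec text) that names are trimmed, every run of spaces, tabs and newlines collapses to one space, and comparison ignores case. Blank lines would still be rejected earlier, by whatever scans the brackets.

```python
import re

def normalize_refname(name):
    """Hypothetical normalization: trim, collapse whitespace runs, fold case."""
    return re.sub(r"\s+", " ", name.strip()).casefold()
```

Under this rule, a hard-wrapped `[hi` / `again]` and a one-line `[hi again]` refer to the same definition, which is the behavior the babelmark test above probes.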
Re: Markdown Extra Spec: Parsing Section
I've clarified and changed a few things about some parsing rules and started defining new rules for the block elements pass. Of note is the flat code block in the block elements pass, which is going to be part of the next version of PHP Markdown Extra.

Michel Fortin
http://michelf.com/
Re: Markdown Extra Spec: Parsing Section
On Thu, May 8, 2008 at 8:59 PM, Michel Fortin wrote:
> Basically, parsing is defined as three consecutive passes: parsing document elements, parsing block elements and parsing span elements.

Looks good so far. The most delicate part is still to come (defining indentation for lists, and (X)(HT)ML fragments in the text flow).

--
Andrea Censi
PhD student, Control & Dynamical Systems, Caltech
http://www.cds.caltech.edu/~andrea/
"Life is too important to be taken seriously" (Oscar Wilde)