Re: Markdown Extra Spec: Parsing Section

2008-05-13 Thread John MacFarlane
 If we're going this way, there's going to be a learning curve: for
 me, and for everyone trying to understand the syntax. I'd prefer to
 avoid forcing people to learn a new language only to understand the
 specification.

PS. Here's all you have to learn in order to write or read a PEG grammar.

A B C      A followed by B followed by C
A | B      A or B (ordered choice)
A+         one or more As
A*         zero or more As
A?         optional A
!A         not followed by A
&A         followed by A (but does not consume A)
(A B)      grouping
.          matches any character
'x'        matches the character 'x'
"string"   matches the string "string"
[a-z]      matches a character from 'a' to 'z'
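
To make the operators above concrete, here is a minimal sketch of a few of them as Python functions. It is an illustration only, not how peg-markdown or any other tool implements PEG; every name in it (lit, seq, choice, many, not_, and_, any_char) is hypothetical. A parser here is a function from (text, position) to the new position, or None on failure.

```python
from typing import Callable, Optional

# Hypothetical representation: a parser returns the position after the
# match, or None if it does not match at position i.
Parser = Callable[[str, int], Optional[int]]

def lit(s: str) -> Parser:
    """'x' or "string": match the literal text s."""
    return lambda text, i: i + len(s) if text.startswith(s, i) else None

def seq(*parsers: Parser) -> Parser:
    """A B C: every parser must match, one after the other."""
    def run(text, i):
        for p in parsers:
            i = p(text, i)
            if i is None:
                return None
        return i
    return run

def choice(*parsers: Parser) -> Parser:
    """A | B: ordered choice, the first alternative that matches wins."""
    def run(text, i):
        for p in parsers:
            j = p(text, i)
            if j is not None:
                return j
        return None
    return run

def many(p: Parser) -> Parser:
    """A*: zero or more As, greedy; never fails."""
    def run(text, i):
        while True:
            j = p(text, i)
            if j is None or j == i:
                return i
            i = j
    return run

def not_(p: Parser) -> Parser:
    """!A: succeed without consuming anything if A does not match here."""
    return lambda text, i: i if p(text, i) is None else None

def and_(p: Parser) -> Parser:
    """&A: succeed without consuming anything if A matches here."""
    return lambda text, i: i if p(text, i) is not None else None

def any_char(text: str, i: int) -> Optional[int]:
    """.: match any single character."""
    return i + 1 if i < len(text) else None
```

Ordered choice is the main difference from a BNF alternative: choice(lit("a"), lit("ab")) applied to "ab" stops after "a" and never tries the second branch. Combining a predicate with "." gives the usual "any character except" idiom, e.g. seq(not_(lit("\n")), any_char).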

English could be used to specify how a semantic value is to be
constructed for each matching rule. This part would be implemented
differently in different languages, but the basic PEG grammar would be
the same.

John

___
Markdown-Discuss mailing list
Markdown-Discuss@six.pairlist.net
http://six.pairlist.net/mailman/listinfo/markdown-discuss


Re: Markdown Extra Spec: Parsing Section

2008-05-13 Thread Michel Fortin

On 2008-05-13 at 2:20, John MacFarlane wrote:

PS. Here's all you have to learn in order to write or read a PEG grammar.

A B C      A followed by B followed by C
A | B      A or B (ordered choice)
A+         one or more As
A*         zero or more As
A?         optional A
!A         not followed by A
&A         followed by A (but does not consume A)
(A B)      grouping
.          matches any character
'x'        matches the character 'x'
"string"   matches the string "string"
[a-z]      matches a character from 'a' to 'z'


It's certainly true that many parts could be converted to this and be
less verbose, and I find this idea appealing. I doubt the whole
Markdown Extra ruleset can be expressed in this format, though. Can a
PEG grammar have parametrized rules?


I've just added nested block element support in the spec. This is done
by having the block element generator (formerly the block element
pass) keep a stack of rules to match when starting each line. This
idea comes straight from Allan Odgaard's explanation of his lost
Markdown parser.
http://six.pairlist.net/pipermail/markdown-discuss/2008-March/001107.html 
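
As a rough illustration of the "stack of rules matched at the start of each line" idea, here is a small sketch. It is not taken from the spec or from Allan Odgaard's parser; the two continuation rules (BLOCKQUOTE, LIST_ITEM) and the function name match_stack are assumptions made up for the example, and real rules would need to handle lazy continuation, tabs, and so on.

```python
import re
from typing import List, Tuple

# Hypothetical continuation rules: each open container contributes one
# pattern that must match at the start of a line for it to stay open.
BLOCKQUOTE = re.compile(r"> ?")
LIST_ITEM = re.compile(r" {4}")

def match_stack(stack: List[re.Pattern], line: str) -> Tuple[int, str]:
    """Match the stacked rules in order, outermost first.

    Returns how many rules matched and the text left over once the
    matched prefixes have been stripped.
    """
    pos = 0
    matched = 0
    for rule in stack:
        m = rule.match(line, pos)
        if m is None:
            break  # this container and everything nested in it is closed
        pos = m.end()
        matched += 1
    return matched, line[pos:]
```

With stack = [BLOCKQUOTE, LIST_ITEM], a line such as ">     continued" matches both rules and leaves "continued" for the innermost block, while "> back in quote" matches only the first rule, which would tell the generator to close the list item and continue the blockquote.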




Michel Fortin
http://michelf.com/


Re: Markdown Extra Spec: Parsing Section

2008-05-13 Thread John MacFarlane
 It's certainly true that many parts could be converted to this and be less 
 verbose, and I find this idea appealing. I doubt the whole Markdown Extra 
 ruleset can be expressed in this format though. Can a PEG grammar have 
 parametrized rules?

 I've just added nested block element support in the spec. This is done  
 by having the block element generator (formerly the block element pass) 
 keep a stack of rules to match when starting each line. This idea comes
 straight from Allan Odgaard's explanation of his lost Markdown parser.
 http://six.pairlist.net/pipermail/markdown-discuss/2008-March/001107.html 

No, PEG can't do this. But there is a different approach that works
(described in my earlier email).

By the way: if I understand it correctly, your description of Code
block would parse the following as two code blocks, not one code block
containing a blank line:

    some code

    more code

(Note: there is no tab on the middle line.)  I don't think that's the
desired behavior.

Here's the markdown-peg version (and remember, this is runnable):

verbatim <- newRule $
   many1 (doesNotMatch blankline ->> indentedLine) <++>
   (many (many1 (optional indent ->> blankline) <++>
          many1 (doesNotMatch blankline ->> indentedLine)) ## concat) <<-
   many blankline ## Verbatim . concat
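
For readers who don't know Haskell, the same grouping can be sketched in Python. This is only an approximation of the rule above, under stated assumptions: an indent is exactly four spaces (tabs are ignored), and the helper names (is_blank, is_indented, parse_verbatim) are hypothetical.

```python
from typing import List, Tuple

INDENT = "    "  # assumption: four spaces only; tabs are not handled

def is_blank(line: str) -> bool:
    return line.strip() == ""

def is_indented(line: str) -> bool:
    """A non-blank line that starts with one indent."""
    return line.startswith(INDENT) and not is_blank(line)

def parse_verbatim(lines: List[str], start: int = 0) -> Tuple[List[str], int]:
    """Return (code lines, index of the first unconsumed line).

    Blank lines between indented lines stay inside the block; trailing
    blank lines are consumed but dropped.
    """
    i = start
    code: List[str] = []
    # one or more non-blank indented lines
    while i < len(lines) and is_indented(lines[i]):
        code.append(lines[i][len(INDENT):])
        i += 1
    if not code:
        return [], start
    # zero or more groups of: blank lines, then more indented lines
    while True:
        j = i
        while j < len(lines) and is_blank(lines[j]):
            j += 1
        if j == i or j >= len(lines) or not is_indented(lines[j]):
            break
        code.extend([""] * (j - i))
        i = j
        while i < len(lines) and is_indented(lines[i]):
            code.append(lines[i][len(INDENT):])
            i += 1
    # trailing blank lines
    while i < len(lines) and is_blank(lines[i]):
        i += 1
    return code, i
```

On the example above ("some code", an empty line with no tab, "more code"), this yields a single block whose middle line is empty, rather than two separate blocks.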

John



Re: Markdown Extra Spec: Parsing Section

2008-05-12 Thread Michel Fortin

On 2008-05-12 at 21:55, John MacFarlane wrote:


I am assuming that there will be a different type to deal with link
text.


There will.


Is there any good reason for having two different types here?


The link text can contain other span-level elements, such as emphasis,
code spans, etc. This *has* to be taken into account while parsing.
On the other hand, text in the reference part is just plain text.




As far as I can see, allowing anything that can serve as link text to
be a refname would not contradict anything in the official Markdown
syntax specification. In addition, it is hard to imagine a realistic
case where allowing brackets and newlines in refnames would break an
existing document. Why make users remember extra restrictions? (I
didn't even know about them until a few days ago, and I've used
markdown for years.) And why expose users to the risk that their
documents will break if they hard-wrap a long refname?


I'm in favor of allowing hard-wrapped reference names where the line  
break is not significant, so that will probably end up in the spec  
when I write the part about parsing the link span element.
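
One way to make the line break "not significant" is to normalize reference names before comparing them. The sketch below is an assumption about how that could work, not something stated in the spec: it collapses every run of whitespace (including a newline) to a single space, and lowercases the name because the Markdown syntax document says link definition names are not case sensitive. The function name normalize_refname is hypothetical.

```python
import re

def normalize_refname(name: str) -> str:
    """Collapse whitespace runs to one space and ignore case."""
    return re.sub(r"\s+", " ", name.strip()).lower()
```

With this, a reference written as "[hi" / "there]" across two lines and a definition named "hi there" normalize to the same key, so hard-wrapping a long refname would not break the link.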


Please keep in mind that the current refname construct is for the  
reference name in link definitions, and may be different from the one  
used in the link span element.




I think the current behavior of phpmarkdown and Markdown.pl is very
confusing. This produces a link:

   [[hi]][]

   [[hi]]: /url

But this doesn't produce a link:

   [hello][[hi]]

   [[hi]]: /url

So either (a) not all link references begin with a refname, or (b)
refnames can sometimes (but not always!) contain embedded brackets.
Either option would conflict with Michel's syntax specification
as it now stands.



This situation is indeed inconsistent. I'd be in favor of allowing
balanced square brackets in link references, even though John Gruber
seems (or seemed in 2006) to think they should be disallowed completely.
http://six.pairlist.net/pipermail/markdown-discuss/2006-September/000257.html 
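
To show what "balanced square brackets" would mean in practice, here is a minimal scanning sketch. It is an illustration under simplifying assumptions (no backslash escapes, no code spans), and the function name scan_bracketed is hypothetical; it is not how PHP Markdown or Markdown.pl currently match references.

```python
from typing import Optional

def scan_bracketed(text: str, start: int = 0) -> Optional[int]:
    """If text[start] opens a balanced [...] run, return the index just
    past its closing bracket; otherwise return None."""
    if start >= len(text) or text[start] != "[":
        return None
    depth = 0
    for i in range(start, len(text)):
        ch = text[i]
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth == 0:
                return i + 1
    return None  # ran out of text before the brackets balanced
```

Applied twice to "[hello][[hi]]", the first call covers "[hello]" and the second covers "[[hi]]", so the reference name would be "[hi]" and both of the examples quoted above would be treated the same way.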




Michel Fortin
http://michelf.com/


Re: Markdown Extra Spec: Parsing Section

2008-05-12 Thread Michel Fortin

On 2008-05-12 at 18:14, John MacFarlane wrote:


The PEG representation is concise, precise, and readable.


Readable, hum... if I look at this rule from PEG Markdown:

ListContinuationBlock = a:StartList
    ( BlankLines
      { if (strlen($$.contents.str) == 0)
            $$.contents.str = strdup("\001"); /* block separator */
        pushelt($$, a); } )
    ( Indent ListBlock { pushelt($$, a); } )+
    { $$ = mk_str(concat_string_list(reverse(a.children))); }

it looks a lot like code to me, half of it I don't understand. If  
we're going this way, there's going to be a learning curve: for me,  
and for everyone trying to understand the syntax. I'd prefer to avoid  
forcing people to learn a new language only to understand the  
specification.



Michel Fortin
http://michelf.com/


Re: Markdown Extra Spec: Parsing Section

2008-05-12 Thread Jacob Rus

Michel Fortin wrote:
Anyway, with the three-pass parsing model I'm currently defining, 
it's pretty trivial to do correctly: the block elements pass extracts 
the text of the blockquote, leaving this to be parsed by the span element pass:


what about this <http://
google.com/> case?

The span element pass would then see an autolink and just ignore any 
newline it finds in the URL.


Ah, okay.  Somehow I misread that.  Yes, that seems about right.

-Jacob



Re: Markdown Extra Spec: Parsing Section

2008-05-12 Thread Jacob Rus

Michel Fortin wrote:

On 2008-05-12 at 18:14, John MacFarlane wrote:


The PEG representation is concise, precise, and readable.


Readable, hum... if I look at this rule from PEG Markdown:

ListContinuationBlock = a:StartList
    ( BlankLines
      { if (strlen($$.contents.str) == 0)
            $$.contents.str = strdup("\001"); /* block separator */
        pushelt($$, a); } )
    ( Indent ListBlock { pushelt($$, a); } )+
    { $$ = mk_str(concat_string_list(reverse(a.children))); }

it looks a lot like code to me, half of it I don't understand. If we're 
going this way, there's going to be a learning curve: for me, and for 
everyone trying to understand the syntax. I'd prefer to avoid forcing 
people to learn a new language only to understand the specification.


Yeah, that's worse.

Mainly I just would suggest taking all those numbered lists of things, 
and putting them on a single line.  It's not that it has to be BNF or 
EBNF/ABNF/whatever, but parts which *can* be expressed in such a way, 
and can be condensed to fit in a more compact space, should be.  The 
current numbered lists + English approach, in many parts of your current 
work, just adds visual clutter. :)


-Jacob



Re: Markdown Extra Spec: Parsing Section

2008-05-11 Thread Jacob Rus

Michel Fortin wrote:
I've begun writing the parsing section of the spec, and I thought I'd let 
you know about where I'm heading with all this.


You should write it in something closer to a BNF-like format.  The 
current version is about 10x more verbose than necessary, and it makes 
reading the spec considerably more difficult.




Re: Markdown Extra Spec: Parsing Section

2008-05-11 Thread Jacob Rus

Michel Fortin wrote:
I've begun writing the parsing section of the spec, and I thought I'd 
let you know about where I'm heading with all this.


Also, you're still going to have quite a few sticky edge cases with your 
current parsing model.  What happens when we have a `<>`-delimited URL 
inside a blockquote?  For instance:


 > what about this <http://
 > google.com/> case?

-Jacob



Re: Markdown Extra Spec: Parsing Section

2008-05-11 Thread Michel Fortin

On 2008-05-11 at 20:55, Jacob Rus wrote:

You should write it in something closer to a BNF-like format.  The  
current version is about 10x more verbose than necessary, and it  
makes reading the spec considerably more difficult.


The reason I'm doing it like this is that I doubt everything will be
expressible in a BNF format. Using plain English descriptions allows
me to not bother about fitting things to a specific grammar and just
write what I feel is most natural and easiest to understand.


Shopping for a more formal and less verbose grammar, if we need one,  
will be much easier once we know what we need, once we can compare  
existing grammars against a checklist of what is necessary to  
implement the given parsing algorithm.


If you remember the timetable I've given, you'll see that I've booked
about half a year for polishing things. This includes rephrasing
sentences, refactoring the syntax, and reformatting the spec to make
it easier to understand. This *could* include switching to a new
grammar format if it makes things more intuitive and readable.



Also, you're still going to have quite a few sticky edge cases with
your current parsing model.  What happens when we have a `<>`-delimited
URL inside a blockquote?  For instance:


 > what about this <http://
 > google.com/> case?



Well, currently newlines aren't allowed inside automatic links in
Markdown.pl, PHP Markdown and some others. Implementations that see an
automatic link there treat it as a link to "http:// google.com/"
(notice the space) or "http://" (notice what's missing).


 http://babelmark.bobtfish.net/?markdown=%0D%0A%3E+what+about+this+%3Chttp%3A%2F%2F%0D%0A%3E+google.com%2F%3E+case%3F&normalize=on&src=1&dest=2



Anyway, with the three-pass parsing model I'm currently defining,
it's pretty trivial to do correctly: the block elements pass extracts
the text of the blockquote, leaving this to be parsed by the span
element pass:


what about this <http://
google.com/> case?

The span element pass would then see an autolink and just ignore any  
newline it finds in the URL.
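
The two steps just described can be sketched as follows. This is a simplified illustration, not the spec's algorithm: it handles a single level of blockquote, recognizes only http, https and ftp autolinks, does no HTML escaping, and the names strip_blockquote, AUTOLINK and expand_autolinks are hypothetical.

```python
import re

def strip_blockquote(text: str) -> str:
    """Block pass (simplified): remove one level of '>' markers."""
    return "\n".join(re.sub(r"^> ?", "", line) for line in text.split("\n"))

# Assumption: an autolink is <scheme://...> with no nested angle brackets.
# The negated character class also matches newlines.
AUTOLINK = re.compile(r"<((?:https?|ftp)://[^<>]*)>")

def expand_autolinks(text: str) -> str:
    """Span pass (simplified): turn autolinks into anchors, dropping any
    newline (and the whitespace around it) found inside the URL."""
    def repl(m):
        url = re.sub(r"\s*\n\s*", "", m.group(1))
        return '<a href="{0}">{0}</a>'.format(url)
    return AUTOLINK.sub(repl, text)
```

Because the blockquote markers are already gone when the span pass runs, the second ">" at the start of the wrapped line never reaches the autolink rule, and the URL comes out as http://google.com/ with no space in it.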



Michel Fortin
http://michelf.com/


Re: Markdown Extra Spec: Parsing Section

2008-05-09 Thread John MacFarlane
Michel,

I think there's a problem with:

 refname
 
 A run of one or more characters, excluding any newline and U+005D
 Closing Square Bracket.

This doesn't allow refnames with embedded brackets.  But PHP Markdown
allows

[[hi]](/url)

as a valid link.  Also, PHP Markdown currently allows embedded newlines,
which are excluded by your definition:

[hi
there](/url)

Of course, embedded *blank* lines should be excluded.

John

+++ Michel Fortin [May 08 08 23:59 ]:
 Hello all,

 I've begun writing the parsing section of the spec, and I thought I'd let 
 you know about where I'm heading with all this.

 Basically, parsing is defined as three consecutive passes: parsing  
 document elements, parsing block elements and parsing span elements.  
 Each pass is going to contain a set of rules the parser should attempt  
 to match while parsing the input. Rules are expressed in English, but  
 are highly structured so that it should be pretty straightforward to  
 convert to a formal grammar if the grammar is powerful enough to express 
 them.

 I'm not saying too much here; elaborate explanations are better in the  
 spec than in this volatile email. If you're interested, take a look and 
 tell me what you think:
 http://michelf.com/specs/markdown-extra/#parsing


 Michel Fortin
 http://michelf.com/


Re: Markdown Extra Spec: Parsing Section

2008-05-09 Thread John MacFarlane
   could be converted into a reference-style link like this:
 
  [[link with embedded brackets]]
 
 I don't believe Markdown allows for links to be defined like this.  A
 reference style link would be defined as:
 
 [[link with embedded brackets]][link with embedded brackets]
 
 or
 
 [link with embedded brackets][]

Newer versions of Markdown.pl and PHP Markdown do allow the trailing
'[]' to be omitted.

% Markdown.pl --version

This is Markdown, version 1.0.2b8.
Copyright 2004 John Gruber
http://daringfireball.net/projects/markdown/

% Markdown.pl 
[hi]

[hi]: /url

<p><a href="/url">hi</a></p>
 
I agree that without this style of reference links, it would make more
sense to distinguish between link text and 'refnames'. But this style of
link gives a reason not to distinguish them.

John



Re: Markdown Extra Spec: Parsing Section

2008-05-09 Thread John MacFarlane
Just to clarify: Most implementations allow newlines and embedded
brackets in link text. But pandoc and peg-markdown seem to be the only
ones that currently allow them in link reference definitions:

http://babelmark.bobtfish.net/?markdown=[hi%0D%0Athere][]%0D%0A%0D%0A[hi+there]%3A+%2Furl%0D%0A%0D%0A[hi%0D%0Aagain][]%0D%0A%0D%0A[hi%0D%0Aagain]%3A+%2Furl%0D%0A%0D%0A[[hi]](%2Furl)%0D%0A%0D%0A[[hello]]%0D%0A%0D%0A[[hello]]%3A+%2Furl%0D%0A&normalize=on&src=1&dest=2

+++ Tomas Doran [May 09 08 21:19 ]:

  Still, why shouldn't refnames be allowed to have embedded brackets
  and newlines, if explicit links can?

 To me those seem to be two entirely different things...  The _text_ of
 the link has to be flexible to allow almost anything.  refnames, on
 the other hand are identifiers and as such it makes sense for them to
 be more constrained.  For instance, if you allow new lines in them it
 opens a whole bunch of questions as to what white space counts.


 In Text::Markdown, I'm allowing new lines in link text:

 http://svn.kulp.ch/cpan/text_multimarkdown/trunk/t/Text-Markdown.mdtest/Links_multiline_bugs_1.text
 http://svn.kulp.ch/cpan/text_multimarkdown/trunk/t/Text-Markdown.mdtest/Links_multiline_bugs_2.text

 As it was specifically reported to me as a bug:

 http://bugs.debian.org/459885

 Cheers
 Tom




Re: Markdown Extra Spec: Parsing Section

2008-05-09 Thread Michel Fortin
I've clarified and changed a few things about some parsing rules and  
started defining new rules for the block elements pass.


Of note is the flat code block in the block elements pass, which
is going to be part of the next version of PHP Markdown Extra.



Michel Fortin
http://michelf.com/


Re: Markdown Extra Spec: Parsing Section

2008-05-08 Thread Andrea Censi
On Thu, May 8, 2008 at 8:59 PM, Michel Fortin wrote:
 Basically, parsing is defined as three consecutive passes: parsing document
 elements, parsing block elements and parsing span elements.

Looks good so far.

The most delicate part is still to come (defining indentation for
lists, and (X)(HT)ML fragments in the text flow).

-- 
Andrea Censi
PhD student, Control & Dynamical Systems, Caltech
http://www.cds.caltech.edu/~andrea/
 Life is too important to be taken seriously (Oscar Wilde)