On Aug 27, 2007, at 6:42 PM, Michel Fortin wrote:

I'm totally not convinced that creating a byte-by-byte parser in Perl or PHP is going to be very useful.
The key here is really having clearly defined state transitions.
I'm not sure what you mean by that in relation to what I wrote above.

That your implementation concerns are not what I am presently concerned with, and I don’t think you are right (i.e. you can still use lots of regexps for tokenization in a regular parser) -- but this is another discussion.

[...]
There are many complaints about different things here. About the syntax, you complain that it is badly defined (I agree).

You then talk about lack of simplicity in the code, which I assume applies to Markdown.pl (or PHP Markdown), not the syntax; or perhaps you mean that the syntax makes it impossible to write simple code to parse it? I'm not sure I understand what you mean here.

Yes -- my view is that the complexity of the implementation stems from not basing the implementation on a standard parser. I.e. presently everything is basically a special-case in the implementation. With a grammar, the parser is generated, and you do not do things like run a pass first that obfuscates HTML (into hashes) and run a pass that grabs the raw before it grabs the emphasis etc.
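
To make the multi-pass approach concrete, here is a minimal sketch of the "hide it behind a hash, then run the next pass" pattern. It is an illustration only, not the actual Markdown.pl code; the function names are hypothetical, and HTML escaping is left out.

```python
import hashlib
import re

def hide_code_spans(text, stash):
    """Pass 1: swap each `code` span for an opaque placeholder."""
    def repl(match):
        # Hypothetical placeholder scheme: "X" + md5 hex digest of the span.
        key = "X" + hashlib.md5(match.group(0).encode("utf-8")).hexdigest()
        stash[key] = "<code>" + match.group(1) + "</code>"
        return key
    return re.sub(r"`([^`]+)`", repl, text)

def apply_emphasis(text):
    """Pass 2: emphasis, which can no longer see the hidden spans."""
    return re.sub(r"\*([^*]+)\*", r"<em>\1</em>", text)

def restore(text, stash):
    """Pass 3: put the hidden spans back."""
    for key, html in stash.items():
        text = text.replace(key, html)
    return text

def render(text):
    stash = {}
    text = hide_code_spans(text, stash)
    text = apply_emphasis(text)
    return restore(text, stash)
```

The order of the passes decides the outcome: the code span wins over the emphasis only because it was hidden first, not because of where it sits in the document.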

Then you talk about the lack of extensibility of the language grammar (I'm not sure what you mean by that; is there a language grammar for Markdown anyway?).

With a formal grammar, extending the syntax is generally just a matter of adding or editing a rule, and we have the syntax extension. By hand-writing the parser, you tend to end up with code written for a very specific purpose that is generally not easy to extend. Tweak something in one place in the source, and you break something in another place; I think we have seen that already on a few occasions (when something is fixed/changed in Markdown.pl).

Then you go on the lack of performance (are you calling this a syntax or parser issue or both?).

I mention that because if we had a grammar and a generated parser, we would get a known good time complexity and pretty efficient code.

I.e. my point was that all these problems I raise are really all rooted in the lack of a grammar -- sure we can address them even w/o a grammar, and maybe it is not (all) the case with the PHP Markdown implementation, I was just adding some (more) arguments for why I would like to see the goal of a formal grammar be taken more seriously.

Finally you say the current implementation (I assume you're talking about Markdown.pl, perhaps PHP Markdown) does not "effectively" support nested constructs (which constructs? what does "effectively" mean here?) but "supports" them somewhat by recursively reparsing parts of the document. Very true, but how is that a problem for you?

Effectively as in, in practice the parser is a parser for a regular language [1], and only by doing multiple passes where subsets are hidden from further parses, does it achieve its result (Markdown is not a regular language, thus you need a parser “better” than one for a regular language to parse it).

This solution though is IMO anything but ideal, and it has been the cause of many bugs in the past and IMO the result is still not what I would prefer to see, e.g. the thing about token type having higher precedence than position in the document.
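
As a rough illustration of why nesting needs more than a regular-language parser, here is a small recursive-descent sketch that turns nested square brackets into a tree. The function name is hypothetical, and the input is assumed to be well-formed (a stray closing bracket at the top level simply ends the parse early).

```python
def parse_group(text, pos=0):
    """Parse text from pos up to the matching ']' (or the end of input).

    Returns (nodes, next_pos), where nodes is a list of strings and
    nested lists, one nested list per bracketed group.
    """
    nodes = []
    buf = []
    while pos < len(text):
        ch = text[pos]
        if ch == "[":
            if buf:
                nodes.append("".join(buf))
                buf = []
            # Recurse: the call stack remembers how deep we are.
            child, pos = parse_group(text, pos + 1)
            nodes.append(child)
        elif ch == "]":
            if buf:
                nodes.append("".join(buf))
            return nodes, pos + 1
        else:
            buf.append(ch)
            pos += 1
    if buf:
        nodes.append("".join(buf))
    return nodes, pos
```

For example, `parse_group("a [b [c] d] e")[0]` gives `["a ", ["b ", ["c"], " d"], " e"]`. The recursion is what tracks the depth; a single regular expression without such a stack cannot do that for arbitrary depth, which is why the current implementations fall back to hiding and reparsing.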

[1]: http://en.wikipedia.org/wiki/Regular_language

[...]
I don't really want to see the syntax changed in and out only to make it easier to implement as an incremental parser.

Yeah, that is a more interesting discussion -- how much would be okay to change? For example, if we changed the rules so that we had _emphasis_ and *strong*, we would solve the problem with ***, and IMO it would be a welcome change, since typing four asterisks for bold is tedious and noisy in the text (granted, cmd-B will do the asterisks for me, but still…).
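
A minimal sketch of what that proposed rule buys, assuming the hypothetical mapping of `_` to `em` and `*` to `strong` (this is not current Markdown behaviour). It ignores intraword underscores and unclosed delimiters.

```python
# Proposed mapping from the paragraph above; hypothetical, not current Markdown.
TAGS = {"_": "em", "*": "strong"}

def render_inline(text):
    """Single left-to-right pass; each delimiter maps to exactly one tag."""
    out = []
    stack = []  # tags that are currently open
    for ch in text:
        tag = TAGS.get(ch)
        if tag is None:
            out.append(ch)
        elif stack and stack[-1] == tag:
            # Same delimiter as the innermost open tag: close it.
            stack.pop()
            out.append("</%s>" % tag)
        else:
            stack.append(tag)
            out.append("<%s>" % tag)
    return "".join(out)
```

With one character per tag, `*_both_*` can only mean strong wrapping emphasis, so the parser never has to guess how to split a run of asterisks.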

I don't think such a parser would be usable (read fast-enough) in PHP anyway. Well, perhaps it could be, but not in the traditional sense of an incremental parser; the concept would probably need to be stretched a lot to fit with regular expressions.

I am not sure what you base these assumptions on. What exactly is it that makes PHP so extremely slow that it is unfit for a parser, yet the current (granted, regexp-based) PHP Markdown works fine?

Yes, and personally I would say whenever you do [foo][bar] you get a link, regardless of whether or not bar is a defined reference -- if bar is not a defined reference, you could default to make it reference the URL ‘#’ [...]
Hum, I disagree strongly here that creating links to nowhere (#) is the solution to undefined reference links. This is bad usability for authors, who will need to test every link in the resulting page to make sure they're linking where they should be.

On the contrary, add this to your preview style sheet:

    a[href="#"] {
        background: blue;
        border: 2px solid red;
        color: white;
    }

Now you have a very good indicator for missing links, contrary to now, where they easily blend in with the regular text, and there is no simple way to find them.
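
For illustration, the fallback rule I am suggesting could look roughly like this. It is a sketch only: `resolve_links` and its `references` dictionary are hypothetical names, and escaping and link titles are left out.

```python
import re

# Matches [text][id]; an empty id means "use the text as the id".
LINK = re.compile(r"\[([^\]]+)\]\[([^\]]*)\]")

def resolve_links(text, references):
    """Turn every [text][id] into a link; undefined ids point at '#'."""
    def repl(match):
        label = match.group(1)
        ref_id = (match.group(2) or label).lower()
        url = references.get(ref_id, "#")  # the proposed default
        return '<a href="%s">%s</a>' % (url, label)
    return LINK.sub(repl, text)
```

Every `[foo][bar]` then produces an anchor, and the undefined ones are exactly those the style sheet above highlights.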
