Re: Incremental parser (was: Backtick Hickup)

Allan Odgaard Tue, 28 Aug 2007 15:52:06 -0700

On Aug 27, 2007, at 6:42 PM, Michel Fortin wrote:

I'm totally not convinced that creating a byte-by-byte parser in Perl or PHP is going to be very useful.
The key here is really having clearly defined state transitions.
I'm not sure what you mean by that in relation to what I wrote above.

That your implementation concerns are not what I am presently concerned with, and I don’t think you are right (i.e. you can still use lots of regexps for tokenization in a regular parser) -- but this is another discussion.

[...]
There are many complaints about different things here. About the syntax, you complain that it is badly defined (I agree).
You then talk about lack of simplicity in the code, which I assume applies to Markdown.pl (or PHP Markdown), not the syntax; or perhaps you mean that the syntax makes it impossible to write simple code to parse it? I'm not sure I understand what you mean here.

Yes -- my view is that the complexity of the implementation stems from not basing the implementation on a standard parser. I.e. presently everything is basically a special case in the implementation. With a grammar, the parser is generated, and you do not do things like run a pass first that obfuscates HTML (into hashes) and run a pass that grabs the raw before it grabs the emphasis etc.

Then you talk about the lack of extensibility of the language grammar (I'm not sure what you mean by that; is there a language grammar for Markdown anyway?).

With a formal grammar, extending the syntax is generally just adding or editing a rule, and we have the syntax extension. By hand-writing the parser, you tend to end up with code written for a very specific purpose that is generally not easy to extend. Tweak something in one place in the source, and you break something in another place; I think we have seen that already on a few occasions (when something is fixed/changed in Markdown.pl).
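
To illustrate the "adding a rule" point, here is a minimal sketch in Python. It is not a generated parser and not how Markdown.pl or PHP Markdown work; the RULES table and the render function are hypothetical names made up for this example. The table stands in for grammar rules, and extending the syntax amounts to adding one entry:

```python
# Hypothetical rule table: delimiter -> HTML tag (an assumption for this sketch).
RULES = {"`": "code", "*": "em"}


def render(text, rules):
    """Scan left to right; the earliest delimiter that has a closer wins."""
    out = []
    i = 0
    # Try longer delimiters first so "~~" is not read as two "~".
    delims = sorted(rules, key=len, reverse=True)
    while i < len(text):
        for d in delims:
            if text.startswith(d, i):
                end = text.find(d, i + len(d))
                if end != -1:
                    inner = text[i + len(d):end]
                    # Code spans stay literal; other spans may contain spans.
                    body = inner if rules[d] == "code" else render(inner, rules)
                    out.append(f"<{rules[d]}>{body}</{rules[d]}>")
                    i = end + len(d)
                    break
        else:
            # No rule matched at this position: emit the character as-is.
            out.append(text[i])
            i += 1
    return "".join(out)


# Extending the syntax is one new entry; render() itself is untouched.
EXTENDED = {**RULES, "~~": "del"}
```

The scanner never changes when a rule is added, which is the property being argued for. A real grammar would of course need to cover far more than paired delimiters.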

Then you go on about the lack of performance (are you calling this a syntax or parser issue or both?).

I mention that because if we had a grammar and a generated parser, we would get a known good time complexity and pretty efficient code.

I.e. my point was that all these problems I raise are really all rooted in the lack of a grammar -- sure we can address them even w/o a grammar, and maybe it is not (all) the case with the PHP Markdown implementation, I was just adding some (more) arguments for why I would like to see the goal of a formal grammar be taken more seriously.

Finally you say the current implementation (I assume you're talking about Markdown.pl, perhaps PHP Markdown) does not "effectively" support nested constructs (which constructs? what does "effectively" mean here?) but "supports" them somewhat by recursively reparsing parts of the document. Very true, but how is that a problem for you?

Effectively as in, in practice the parser is a parser for a regular language [1], and only by doing multiple passes where subsets are hidden from further parses, does it achieve its result (Markdown is not a regular language, thus you need a parser “better” than one for a regular language to parse it).

This solution though is IMO anything but ideal, and it has been the cause of many bugs in the past and IMO the result is still not what I would prefer to see, e.g. the thing about token type having higher precedence than position in the document.
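
As a rough illustration of why nesting needs more than a regular-language matcher, here is a small Python sketch that handles nested block quotes in one pass by keeping a depth counter (unbounded state, which a finite automaton does not have). The function name blockquote_html is made up for this example, and the sketch deliberately ignores paragraphs, lazy continuation lines and everything else in the real syntax:

```python
def blockquote_html(lines):
    """One pass over the lines; `depth` tracks how many quotes are open."""
    out = []
    depth = 0
    for line in lines:
        rest = line.lstrip()
        level = 0
        # Count the leading ">" markers to get this line's nesting level.
        while rest.startswith(">"):
            level += 1
            rest = rest[1:].lstrip()
        # Open or close quotes until the running depth matches the line.
        while depth < level:
            out.append("<blockquote>")
            depth += 1
        while depth > level:
            out.append("</blockquote>")
            depth -= 1
        out.append(rest)
    # Close whatever is still open at the end of the input.
    out.extend("</blockquote>" for _ in range(depth))
    return out
```

Nothing is hidden from a later pass here; each line is read once and the nesting falls out of the counter.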


[1]: http://en.wikipedia.org/wiki/Regular_language

[...]
I don't really want to see the syntax changed in and out only to make it easier to implement as an incremental parser.

Yeah, that is a more interesting discussion -- how much would be okay to change? For example if we changed the rules so that we had _emphasis_ and *strong*, we would solve the problem with ***, and IMO it would be a welcome change since typing four asterisks for bold is tedious and noisy in the text (granted, cmd-B will do the asterisks for me, but still…)
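
A sketch of what that hypothetical rule change would buy, in Python: with one character per tag there is no need to count asterisks or look ahead, so a single left-to-right pass with a small stack is enough. The TAGS table and the inline function are invented for this example; the sketch assumes delimiters are balanced and does not handle things like underscores inside words:

```python
# Hypothetical mapping under the proposed rule change (not current Markdown).
TAGS = {"_": "em", "*": "strong"}


def inline(text):
    """Single pass; a stack remembers which delimiters are currently open."""
    out = []
    open_delims = []
    for ch in text:
        if ch not in TAGS:
            out.append(ch)
        elif open_delims and open_delims[-1] == ch:
            # Same delimiter as the innermost open one: close it.
            open_delims.pop()
            out.append(f"</{TAGS[ch]}>")
        elif ch not in open_delims:
            # Not open yet: open it.
            open_delims.append(ch)
            out.append(f"<{TAGS[ch]}>")
        else:
            # Mis-nested delimiter: keep it as literal text.
            out.append(ch)
    return "".join(out)
```

Each character is decided on the spot from the current state, which is also the property an incremental parser wants.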

I don't think such a parser would be usable (read fast-enough) in PHP anyway. Well, perhaps it could be, but not in the traditional sense of an incremental parser; the concept would probably need to be stretched a lot to fit with regular expressions.

I am not sure what you base these assumptions on. What exactly is it that makes PHP so extremely slow that it is unfit for a parser, yet the current (granted, regexp-based) PHP Markdown works fine?

Yes, and personally I would say whenever you do [foo][bar] you get a link, regardless of whether or not bar is a defined reference -- if bar is not a defined reference, you could default to make it reference the URL ‘#’ [...]
Hum, I disagree strongly here that creating links to nowhere (#) is the solution to undefined reference links. This is bad usability for authors, who will need to test every link in the resulting page to make sure they're linking where they should be.


On the contrary, add this to your preview style sheet:

    a[href="#"] {
        background: blue;
        border: 2px solid red;
        color: white;
    }

Now you have a very good indicator for missing links, contrary to now, where they easily blend in with the regular text, and there is no simple way to find them.
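
For concreteness, the fallback could look roughly like this in Python. The names REF_LINK and link_refs are made up for the example, the refs dictionary stands in for whatever table of reference definitions a real implementation collects, and the sketch does no HTML escaping. It uses a regexp only to keep the example short:

```python
import re

# Matches [label][id]; an empty id means "use the label as the id".
REF_LINK = re.compile(r"\[([^\]]+)\]\[([^\]]*)\]")


def link_refs(text, refs):
    """Turn every [label][id] into a link; undefined ids point at '#'."""
    def repl(match):
        label = match.group(1)
        ref_id = (match.group(2) or label).lower()
        # Fall back to "#" instead of leaving the brackets in the text.
        url = refs.get(ref_id, "#")
        return f'<a href="{url}">{label}</a>'

    return REF_LINK.sub(repl, text)
```

With this, an undefined reference always produces an anchor with href="#", which is exactly what the style sheet rule above selects.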


_______________________________________________
Markdown-Discuss mailing list
[email protected]
http://six.pairlist.net/mailman/listinfo/markdown-discuss
