Re: Incremental parser (was: Backtick Hickup)

Michel Fortin Sun, 02 Sep 2007 07:44:38 -0700

On 2007-08-28 at 18:51, Allan Odgaard wrote:

Then you talk about the lack of extensibility of the language grammar (which I'm not sure what you mean by that, is there a language grammar for Markdown anyway?).
With a formal grammar, extending the syntax is generally just adding or editing a rule, and we have the syntax extension. By hand-writing the parser, you tend to end up with code written for a very specific purpose, generally not easy to extend. Tweak something in one place in the source, and you break something in another place. I think we have seen that already on a few occasions (when something is fixed/changed in Markdown.pl).

A case in point would be Markdown.pl 1.0.2b1, which added a fix for this:


    <span attr='`ticks`'>like this</span>

but at the same time created a problem which did not exist previously with this case:


    `<span attr='`ticks`'>like this</span>`

In Markdown.pl, this problem is still unfixed as we speak -- you can confirm that for yourself on the Dingus. PHP Markdown has handled both cases correctly since 1.0.1d by making the HTML tokenizer aware of code spans, and in yesterday's release 1.0.1i it's handled by a small incremental parser for HTML tags, code spans, and backslash escapes; all done in one stage.
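
To illustrate the "one stage" idea, here is a minimal sketch in Python. It is not PHP Markdown's actual code: the `TOKEN` pattern, the `tokenize` function, and the simplified tag and escape rules are all assumptions made for the example. Because backslash escapes, code spans, and HTML tags are alternatives of a single scan, whichever construct starts first claims its text, so backticks inside an attribute stay with the tag and tag-like text inside backticks stays with the code span.

```python
import re

# Hypothetical token pattern: a single alternation, scanned left to right.
TOKEN = re.compile(
    r"""
      (?P<escape> \\ [\\`<>*_] )
    | (?P<code> (?P<ticks> `+ ) (?!`) (?P<body> .+? ) (?<!`) (?P=ticks) (?!`) )
    | (?P<tag> </? [A-Za-z][A-Za-z0-9]*
          (?: \s+ [\w:-]+ (?: \s* = \s* (?: "[^"]*" | '[^']*' | [^\s>]+ ) )? )*
          \s* /? > )
    """,
    re.VERBOSE | re.DOTALL,
)


def tokenize(text):
    """Split text into (kind, value) pairs in a single pass."""
    tokens, pos = [], 0
    for m in TOKEN.finditer(text):
        if m.start() > pos:
            # Plain text between two recognized constructs.
            tokens.append(("text", text[pos:m.start()]))
        if m.group("escape") is not None:
            tokens.append(("escape", m.group("escape")[1]))
        elif m.group("code") is not None:
            tokens.append(("code", m.group("body").strip()))
        else:
            tokens.append(("tag", m.group("tag")))
        pos = m.end()
    if pos < len(text):
        tokens.append(("text", text[pos:]))
    return tokens


if __name__ == "__main__":
    print(tokenize("<span attr='`ticks`'>like this</span>"))
    print(tokenize("`<span attr='`ticks`'>like this</span>`"))
```

In the first input the backticks are consumed as part of the opening tag; in the second, the scan meets a backtick before any `<`, so the tag-like text ends up inside code spans and is never treated as HTML.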

Then you go on the lack of performance (are you calling this a syntax or parser issue or both?).
I mention that because if we had a grammar and a generated parser, we would get a known good time complexity and pretty efficient code.

The time complexity would be known, but the speed of the generated parser is only as good as the parser generator can get. I'm curious to see how a generated parser can perform in PHP, or in Perl, for a complex syntax like Markdown. Any example?

I.e. my point was that all these problems I raise are really all rooted in the lack of a grammar -- sure we can address them even w/o a grammar, and maybe it is not (all) the case with the PHP Markdown implementation, I was just adding some (more) arguments for why I would like to see the goal of a formal grammar be taken more serious.


What do you mean by "taken more serious"?

Up to now, you've expressed your wishes for Markdown as a formal grammar, backed it up with plenty of good arguments, but I'm still not catching what you're trying to make happen. Are you hoping John Gruber will reappear and say he has rewritten Markdown as a formal grammar?

Or perhaps you want to convince me to do it... I'm convinced it'd be useful for plenty of reasons you've pointed out. But it turns out that I have plenty of other things to do and I'm not so interested in writing a formal grammar by myself (not that I wouldn't be willing to help if someone was doing it).

Or perhaps you just want me (maybe others) to commit to using the grammar if you come up with one...

[...]
I don't really want to see the syntax changed in and out only to make it easier to implement as an incremental parser.
Yeah, that is a more interesting discussion -- how much would be okay to change? For example if we change the rules so that we had _emphasis_ and *strong*, we would solve the problem with ***, and IMO a welcomed change since typing four asterisks for bold is tedious and noisy in the text (granted, cmd-B will do the asterisks for me, but still…)

I think that should be decided on a case-by-case basis. A first draft of the grammar for a particular syntax is written, then reviewed, and we then decide if it needs to be made more complex to better handle current Markdown documents.

But changing single asterisks to mean "strong emphasis" diverges way too much, in my opinion. I'm almost always using single asterisks to denote emphasis, not strong emphasis, and I expect such a change may break about half the Markdown documents out there. That's what I'd call forking.

I don't think such a parser would be usable (read fast-enough) in PHP anyway. Well, perhaps it could be, but not in the traditional sense of an incremental parser; the concept would probably need to be stretched a lot to fit with regular expressions.
I am not sure what you base these assumptions on. What exactly is it that makes PHP so extremely slow that it is unfitted for a parser, yet the current (granted, regexp-based) PHP Markdown works fine?

I was thinking about a byte-by-byte parser written in plain PHP at the time. See the second half of my recent reply to Jacob Rus... just after "Why would a PHP state machine be so terribly slow?":

<http://six.pairlist.net/pipermail/markdown-discuss/2007-August/000740.html>

Note how it's the silliest techniques (from a compiled language standpoint) that perform the fastest in PHP in the benchmarks cited in the email above. I don't know much about parser generators, but I suspect they may not be so well suited for performance in PHP.

Anyway, if a generated parser is not enough, it's always possible to use a formal grammar as the basis for writing a parser more optimized than what a parser generator can do. I'm not trying to put this as an argument against a formal grammar.

Hum, I disagree strongly here that creating links to nowhere (#) is the solution to undefined reference links. This is bad usability for authors who will need to test every link in the resulting page to make sure they're linking where they should be.
On the contrary, add this to your preview style sheet:

    a[href="#"] {
        background: blue;
        border: 2px solid red;
        color: white;
    }

Now you have a very good indicator for missing links, contrary to now, where they easily blend in with the regular text, and there is no simple way to find them.

That's assuming you have a separate preview mode, are using a special preview stylesheet, and that you actually look at the preview before publishing.

I would not expect Markdown's usability to depend on such a precise workflow. For instance, have you thought about the poor commenter on a website who doesn't know the comment form uses Markdown, writing:


    Type these three keys in sequence: [1] [2] [3]

getting this:

    <p>Type these three keys in sequence: <a href="#">1</a> [3]

and seeing that in his browser:

    Type these three keys in sequence: 1 [3]

Even assuming the user did preview his comment before posting it, he'll probably struggle to figure out what's happening and to find a fix.
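
The difference between the two behaviors can be reproduced with a small Python sketch. This is only an illustration under simplifying assumptions, not how Markdown.pl or PHP Markdown resolve links: the `REF_LINK` pattern, the `render_reference_links` function, and its `link_undefined` flag are hypothetical names, and URL escaping is left out.

```python
import re

# Hypothetical, simplified pattern for "[text] [id]" reference links,
# allowing one optional space between the two bracket pairs.
REF_LINK = re.compile(r"\[([^\[\]]+)\][ ]?\[([^\[\]]*)\]")


def render_reference_links(text, definitions, link_undefined=False):
    """Replace reference links; `definitions` maps lowercase ids to URLs."""
    def replace(m):
        label = m.group(1)
        ref_id = (m.group(2) or label).lower()
        url = definitions.get(ref_id)
        if url is not None:
            return '<a href="%s">%s</a>' % (url, label)
        if link_undefined:
            # Proposed behavior: an undefined reference becomes a link to "#".
            return '<a href="#">%s</a>' % label
        # Current behavior: an undefined reference is left as literal text.
        return m.group(0)
    return REF_LINK.sub(replace, text)


if __name__ == "__main__":
    comment = "Type these three keys in sequence: [1] [2] [3]"
    print(render_reference_links(comment, {}))
    print(render_reference_links(comment, {}, link_undefined=True))
```

With no definitions, the default call returns the comment unchanged, while `link_undefined=True` turns `[1] [2]` into a link labeled `1` and leaves only `[3]` visible, which is the surprise described above.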

Markdown is often used in a context where the user doesn't even *know* that what he/she is writing will pass through a Markdown parser, and, with a few exceptions (like for underscore emphasis within a word), Markdown works very well for that; your proposed changes would make Markdown unsuitable for these environments.



Michel Fortin
http://www.michelf.com/

