On 2007-08-28 at 18:51, Allan Odgaard wrote:

Then you talk about the lack of extensibility of the language grammar (I'm not sure what you mean by that; is there a language grammar for Markdown anyway?).

With a formal grammar, extending the syntax is generally just adding or editing a rule, and we have the syntax extension. By hand-writing the parser, you tend to end up with code written for a very specific purpose, generally not easy to extend. Tweak something in one place in the source and you break something in another place; I think we have seen that already on a few occasions (when something is fixed/changed in Markdown.pl).

A case in point would be Markdown.pl 1.0.2b1, which added a fix for this:

    <span attr='`ticks`'>like this</span>

but at the same time created a problem which did not exist previously with this case:

    `<span attr='`ticks`'>like this</span>`

In Markdown.pl, this problem is still unfixed as we speak -- you can confirm that for yourself on the Dingus. PHP Markdown has handled both cases correctly since 1.0.1d by making the HTML tokenizer aware of code spans, and in yesterday's release, 1.0.1i, it's handled by a small incremental parser for HTML tags, code spans, and backslash escapes, all done in one stage.
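
To illustrate the one-stage idea, here is a rough sketch in Python. It is not the PHP Markdown code; the names `ESCAPE`, `CODE`, `ATTR`, `TAG`, and `tokenize` and the simplified regexes are assumptions made for the example. Backslash escapes, code spans, and HTML tags are tried at each position in a single left-to-right scan, so whichever construct starts first wins:

```python
import re

# Hypothetical, simplified patterns; real Markdown and HTML rules are broader.
ESCAPE = r"(?P<escape>\\[\\`*_{}\[\]()#+.!-])"
CODE = r"(?P<code>(?P<ticks>`+).+?(?<!`)(?P=ticks)(?!`))"
ATTR = r"""\s+[\w:-]+(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s"'<>`=]+))?"""
TAG = r"(?P<tag></?[A-Za-z][A-Za-z0-9]*(?:" + ATTR + r")*\s*/?>)"

# One alternation: all three constructs are recognized in the same pass.
TOKEN = re.compile("|".join([ESCAPE, CODE, TAG]), re.DOTALL)


def tokenize(text):
    """Split text into (kind, value) pairs in a single left-to-right scan."""
    tokens = []
    pos = 0
    for match in TOKEN.finditer(text):
        if match.start() > pos:
            tokens.append(("text", text[pos:match.start()]))
        if match.group("escape") is not None:
            kind = "escape"
        elif match.group("code") is not None:
            kind = "code"
        else:
            kind = "tag"
        tokens.append((kind, match.group(0)))
        pos = match.end()
    if pos < len(text):
        tokens.append(("text", text[pos:]))
    return tokens


for sample in ("<span attr='`ticks`'>like this</span>",
               "`<span attr='`ticks`'>like this</span>`"):
    print(tokenize(sample))
```

In the first sample the opening tag is consumed whole, so the backticks inside the attribute never start a code span. In the second the leading backtick opens a code span before the tag is ever seen, so no tag token is produced.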


Then you go on the lack of performance (are you calling this a syntax or parser issue or both?).

I mention that because if we had a grammar and a generated parser, we would get a known good time complexity and pretty efficient code.

The time complexity would be known, but the speed of the generated parser is only as good as the parser generator can get. I'm curious to see how a generated parser can perform in PHP, or in Perl, for a complex syntax like Markdown. Any example?

I.e. my point was that all these problems I raise are really all rooted in the lack of a grammar -- sure we can address them even w/o a grammar, and maybe it is not (all) the case with the PHP Markdown implementation, I was just adding some (more) arguments for why I would like to see the goal of a formal grammar be taken more serious.

What do you mean by "taken more serious"?

Up to now, you've expressed your wishes for Markdown as a formal grammar and backed them up with plenty of good arguments, but I'm still not catching what you're trying to make happen. Are you hoping John Gruber will reappear and say he has rewritten Markdown as a formal grammar?

Or perhaps you want to convince me to do it... I'm convinced it'd be useful for plenty of reasons you've pointed out. But it turns out that I have plenty of other things to do and I'm not so interested in writing a formal grammar by myself (not that I wouldn't be willing to help if someone was doing it).

Or perhaps you just want me (and maybe others) to commit to using the grammar if you come up with one...

[...]
I don't really want to see the syntax changed in and out only to make it easier to implement as an incremental parser.

Yeah, that is a more interesting discussion -- how much would be okay to change? For example if we change the rules so that we had _emphasis_ and *strong*, we would solve the problem with ***, and IMO a welcomed change since typing four asterisks for bold is tedious and noisy in the text (granted, cmd-B will do the asterisks for me, but still…)

I think that should be a case by case basis. A first draft of the grammar for a particular syntax is written, then reviewed, and we then decide if it needs to be complexified further to better handle current Markdown documents.

But changing single asterisks to mean "strong emphasis" diverges way too much in my opinion. I'm almost always using single asterisks to denote emphasis, not strong emphasis, and I expect such a change may break about half the Markdown documents out there. That's what I'd call forking.


I don't think such a parser would be usable (read fast-enough) in PHP anyway. Well, perhaps it could be, but not in the traditional sense of an incremental parser; the concept would probably need to be stretched a lot to fit with regular expressions.

I am not sure what you base these assumptions on. What exactly is it that makes PHP so extremely slow that it is unfitted for a parser, yet the current (granted, regexp-based) PHP Markdown works fine?

I was thinking about a byte-by-byte parser written in plain PHP at the time. See the second half of my recent reply to Jacob Rus... just after "Why would a PHP state machine be so terribly slow?":

<http://six.pairlist.net/pipermail/markdown-discuss/2007-August/000740.html>


Note how it's the silliest techniques (from a compiled-language standpoint) that perform the fastest in PHP in the benchmarks cited in the email above. I don't know much about parser generators, but I suspect they may not be so well suited for performance in PHP.
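
To make the two styles concrete, here is a small sketch in Python rather than PHP. The function names are made up for the example, and it is not taken from the benchmarks mentioned above. Both functions locate runs of backticks: the first walks the string one character at a time in interpreted code, the second hands the whole scan to the regex engine:

```python
import re


def backtick_runs_loop(text):
    """Character-by-character scan: returns (start, length) of each run."""
    runs = []
    start = None
    for i, ch in enumerate(text):
        if ch == "`":
            if start is None:
                start = i          # a new run begins here
        elif start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:          # run that reaches the end of the text
        runs.append((start, len(text) - start))
    return runs


def backtick_runs_regex(text):
    """Same result, with the scanning delegated to the regex engine."""
    return [(m.start(), len(m.group())) for m in re.finditer(r"`+", text)]


sample = "a `b` ``c`` d```"
print(backtick_runs_loop(sample))
print(backtick_runs_regex(sample))
```

The two return the same list; which one runs faster depends on the interpreter, so it has to be measured rather than assumed.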

Anyway, if a generated parser is not enough, it's always possible to use a formal grammar as the basis for writing a parser more optimized than what a parser generator can do. I'm not trying to put this as an argument against a formal grammar.


Hum, I disagree strongly here that creating links to nowhere (#) is the solution to undefined reference links. This is bad usability for authors, who will need to test every link in the resulting page to make sure they're linking where they should be.

On the contrary, add this to your preview style sheet:

    a[href="#"] {
        background: blue;
        border: 2px solid red;
        color: white;
    }

Now you have a very good indicator for missing links, contrary to now, where they easily blend in with the regular text, and there is no simple way to find them.

That's assuming you have a separate preview mode, are using a special preview stylesheet, and that you actually look at the preview before publishing.

I would not expect Markdown's usability to depend on such a precise workflow. For instance, have you thought about the poor commenter on a website who doesn't know the comment form uses Markdown, writing:

    Type these three keys in sequence: [1] [2] [3]

getting this:

    <p>Type these three keys in sequence: <a href="#">1</a> [3]</p>

and seeing that in his browser:

    Type these three keys in sequence: 1 [3]

Even assuming the user did preview his comment before posting it, he'll probably struggle to figure out what's happening and to find a fix.
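
The difference between the two behaviours can be sketched in a few lines of Python. This is only an illustration: the `REF_LINK` pattern is a simplified assumption, `render_ref_links` and its `undefined` parameter are hypothetical names, and no HTML escaping is done. The "keep" policy leaves an undefined reference as literal text, while "hash" turns it into a link to `#` as in the proposal:

```python
import re

# Simplified pattern for reference-style links: [text][id] or [text] [id].
REF_LINK = re.compile(r"\[([^\[\]]+)\] ?\[([^\[\]]*)\]")


def render_ref_links(text, definitions, undefined="keep"):
    """Resolve reference links; `undefined` picks the policy for unknown ids."""
    def replace(match):
        label, ref_id = match.group(1), match.group(2)
        key = (ref_id or label).lower()   # an empty id falls back to the text
        if key in definitions:
            return '<a href="%s">%s</a>' % (definitions[key], label)
        if undefined == "hash":
            return '<a href="#">%s</a>' % label   # proposed: link to nowhere
        return match.group(0)                     # leave the brackets as typed
    return REF_LINK.sub(replace, text)


comment = "Type these three keys in sequence: [1] [2] [3]"
print(render_ref_links(comment, {}, undefined="keep"))
print(render_ref_links(comment, {}, undefined="hash"))
```

With no link definitions at all, the "keep" policy returns the comment unchanged, and the "hash" policy swallows `[1] [2]` into a single dead link and leaves `[3]` behind.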

Markdown is often used in a context where the user doesn't even *know* that what he/she is writing will pass through a Markdown parser, and, with a few exceptions (like underscore emphasis within a word), Markdown works very well for that; your proposed changes would make Markdown unsuitable for these environments.


Michel Fortin
http://www.michelf.com/