On Aug 27, 2007, at 10:35 PM, Michel Fortin wrote:

Personally, as I have said before, the back-tick rules are confusing (when you want to include a back-tick in the code), and we might be better off just defining some simpler rules.
I don't find them confusing, but perhaps it's only because I'm used to it. Which aspect of it do you find confusing?

Maybe ‘intuitive’ would have been a better choice of word. But this thread started because somebody did not understand how to embed back-ticks in back-tick-quoted strings -- personally I didn’t understand it either until I looked at the implementation.
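
For reference, the rule in question, as Markdown.pl implements it: a code span may be delimited by a run of more than one back-tick, and leading/trailing whitespace just inside the delimiters is stripped, so

    ``here is a literal back-tick: ` ``

turns into

    <code>here is a literal back-tick: `</code>

-- documented behaviour, but apparently not something people guess.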

[...]
I think I prefer the current behaviour. I can't really see when having to escape the content of a code span would be useful. Perhaps you had something in mind when proposing that?

Yes, when you need special characters: you can’t use entities inside `…`, so ``…`` would allow you to write e.g. \u2620 for a Unicode character or similar. That said, with everybody using UTF-8 these days (knock on wood), escape codes for special characters are less useful than in the past.
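
To make the proposal concrete -- this is sketched behaviour, not what any current implementation does -- single back-ticks would stay verbatim, and double back-ticks would additionally interpret escapes:

    `\u2620`    →  <code>\u2620</code>  (verbatim, as today)
    ``\u2620``  →  <code>☠</code>       (escape expanded to U+2620)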

[...]
I have some difficulty figuring out what you mean by "embedded HTML does not lend itself well to the 'split the document into paragraphs'".

Markdown currently distinguishes block-level HTML elements from span-level HTML elements: the former create blocks which are left alone by Markdown (and left outside paragraphs), while the latter get wrapped into paragraphs (as valid HTML expects them to be) along with Markdown-formatted text.

Yes, we are dependent on Markdown finding the HTML before it does the paragraph splitting, so that it doesn’t insert <p> into my HTML -- yet the present heuristic for finding HTML is easily confused (talking about Markdown.pl here); for me it actually got worse when John switched to the Perl library thing.
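
For instance, Markdown.pl only treats HTML as a block if the opening tag starts at the left margin and is separated from the surrounding text by blank lines; something like

    some paragraph text
    <div>a table, say</div>

fails that test, so the <div> is treated as span-level content and ends up wrapped in <p> tags.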

In fact, I presently have my own preprocessor for my Markdown pages (on my site, which sometimes need to embed tables and stuff) to take out the HTML before giving it to Markdown -- although this is also because Markdown does not know about <% scripting %> or <?php tags ?>, and since there is no grammar in which I can just educate it about them, I need to handle that myself in a pre-parse step.
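
The trick itself is simple enough to sketch -- this is not my actual preprocessor, just the placeholder idea in Python, with made-up names:

    import re
    import hashlib

    # Chunks Markdown must never touch: <% ... %> and <?php ... ?>.
    SCRIPT_RE = re.compile(r'<%.*?%>|<\?php.*?\?>', re.DOTALL)

    def take_out(text):
        """Swap script chunks for opaque alphanumeric placeholders."""
        chunks = {}
        def stash(match):
            key = 'SCRIPT' + hashlib.md5(match.group(0).encode('utf-8')).hexdigest()
            chunks[key] = match.group(0)
            return key
        return SCRIPT_RE.sub(stash, text), chunks

    def put_back(html, chunks):
        """Restore the chunks after Markdown has produced its HTML."""
        for key, chunk in chunks.items():
            html = html.replace(key, chunk)
        return html

The placeholders are plain alphanumeric tokens precisely so that Markdown passes them through untouched.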

Anyway, if we agree that everything is dependent on everything that precedes it, I think we can slowly start to agree that *also* having things depend on what follows is problematic.
Well, I think you mean problematic for writing a parser, in which case I disagree.

No, I mean problematic as in: what the hell should we do? You and I disagree about how to interpret the same line of Markdown exactly because it depends on the angle you view it from (read: which token you think is most important), i.e. it is totally subjective…

The “syntax” quickly becomes the implementation [...]
Well, look at how the WHATWG is defining HTML right now: it's exactly that. They describe how the parser works (in English), and everything that matches its behaviour is conforming...

Yes, and do you know *why* they are doing that?

It is because the initial browsers had nothing resembling a real parser; they (seriously!) did things like:

   /* per-tag state flipping, no parse tree (strcmp returns 0 on a match) */
   if (strcmp(tag, "<b>") == 0)
      bold = true;
   else if (strcmp(tag, "</b>") == 0)
      bold = false;
   …

Even though there was an official specification for how to parse HTML (well, SGML), no browser actually did it that way. Authors wrote lots of totally broken pages, browsers interpreted them differently, and browsers didn’t even interpret valid HTML correctly. For example, SGML has the rule that when you close a context, all missing close tags become implicit -- and I haven’t seen a single browser actually do that, even though it is a quite nice feature, since you can leave out lots of close tags. But since the browsers did not have a recursive-descent parser or similar, they had no clue what the current context was, which is likely why they didn’t do it -- that, and the fact that they probably never read the SGML specification.
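
To spell out the SGML rule I mean: in

    <table><tr><td>cell</table>

the </table> closes the table context and implicitly supplies the missing </td> and </tr> -- trivial for a parser that tracks the stack of open elements, and hopeless for tag-by-tag code like the above.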

So W3C said fuck this, let’s totally scrap SGML -- it was too complex for browser implementors to wrap their heads around (understandable!) -- and do a “simple” subset (XML, which turned out to be not that simple in the end, once they retrofitted namespaces and all sorts of crap into it), making XHTML the new thing, totally strict! But no-one cared about XHTML, and no browser really supported it, because we have billions of HTML pages out there; we can’t just drop them.

So given this rather broken situation, the WhatWG decided to try to figure out in which ways all the browsers were broken, document that to get them in sync, and make that the official spec, so that we can move on with (expanding) the HTML specification without cutting backwards compatibility. Browser vendors don’t want existing pages to break, because that makes them lose users; so if W3C adds features to HTML which require the browser to have a strict parser to really work, browser vendors may not implement them, for backwards-compatibility reasons -- or something like that…

You really think Markdown should take the same route? ;)

which brings out an interesting side topic: how should HTML be parsed (or even specified) within Markdown? :-)

I would say strict (for which a grammar is pretty simple)! There is no reason Markdown should conform to the looser WhatWG definition: strict HTML is a subset of WhatWG’s definition, and they made the superset only to be compatible with existing bad pages, which Markdown does not need to support.
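
To make “pretty simple” concrete, here is a rough (and deliberately incomplete) sketch of what the tag rules could look like, in EBNF-ish notation -- the names and details are mine, not from any existing spec:

    element       ::= start_tag content end_tag | empty_element
    start_tag     ::= '<' name attribute* '>'
    end_tag       ::= '</' name '>'
    empty_element ::= '<' name attribute* '/>'
    attribute     ::= name '=' quoted_value

No error recovery, no implied tags: anything that does not match is simply not HTML as far as Markdown is concerned.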

[...]
I think the better solution to that problem would be to disallow emphasis that starts in the middle of one word and ends within another. As for the underscore-emphasis problem, I'd suggest doing just as PHP Markdown Extra does (one of its *documented* features): only allow it on word boundaries, not in the middle of a word. I've yet to get a complaint about that change in behaviour, and I know some people switched to Extra just because of that.

I would prefer that interpretation as well -- I have even requested it in the past, since it is the #1 mistake I see from people who post comments on my blog (they do not escape underscores in snake_case_words or surround them with back-ticks). I can’t find the thread, but most thought it was useful; then again, it is not uncommon for people to argue for a behavior that they never actually use in practice.
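
For the record, the difference on such input:

    my_variable_name

    Markdown.pl:        my<em>variable</em>name
    PHP Markdown Extra: my_variable_name   (underscores inside a word are left alone)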

It sounds like I should switch to Markdown Extra for my blog comments…

[...] I am *not* talking about documenting every single edge case; I am talking about defining the syntax using more traditional means of defining syntaxes.
But how can you write a formal grammar without having to think of the edge cases? Are you suggesting we should ignore edge cases when defining the syntax? And if so, what qualifies as an "edge case"?

I don’t think you have worked with parser generators and grammars.

Basically the implementation is generated from the grammar (when it is possible to specify the grammar fully) -- the grammar can be tricky to get right, but there are no edge cases lurking in the corners in the way there are for a hand-written multi-pass regular-expression parser.

That is, compare it to a mathematical equation: people can’t arrive at different solutions to the same equation unless they are misreading it.

This is why a formal grammar is so powerful: it really specifies *everything* -- the only question is whether it specifies things the way we want. E.g. in the short example I gave, I did not support self-closing HTML tags in paragraph text, so that is simply not supported, no argument there -- the argument is thus whether we should add them to the grammar, not how to interpret the grammar.
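
E.g., in the same EBNF-ish notation as the sketch above, supporting them would just mean adding an alternative:

    inline_html      ::= open_tag | close_tag | self_closing_tag
    self_closing_tag ::= '<' name attribute* '/>'

and every implementation generated from the grammar would then support it identically.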

That said, a grammar can be ambiguous or otherwise ‘invalid’, so to speak -- but if it is specified e.g. as an ANTLR grammar, ANTLR will tell us which rules cause which problems.


