On 2007-08-15, at 15:04, Jacob Rus wrote:

Michel Fortin wrote:
I disagree about it being better for readers and writers. To me, a sole asterisk or underscore doesn't mean emphasis. If someone deliberately writes only one asterisk in front of a word, he probably didn't mean "emphasis until the end of the paragraph" either.

Well, this really depends. If I have a text editor which does some syntax highlighting for me, I'd rather have emphasis run to the end of the paragraph, which is extremely obvious and can be fixed, than a stray asterisk.

I don't think syntax highlighting is an argument that should help decide what Markdown should do.

To solve your problem, I suggest you use two colors: one for the so-called "valid" emphasis, the kind Markdown will effectively convert to emphasis, and another for "invalid" emphasis, where the closing asterisk is missing. That should make authoring errors even more obvious.
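
For what it's worth, here's a rough sketch of how a highlighter could make that two-colour distinction; the function name and the [valid]/[invalid] markers are only placeholders (use whatever colouring your editor provides), and for simplicity it only looks for a closing asterisk on the same line:

    <?php
    // Sketch only: wrap closed emphasis spans and stray asterisks in
    // different markers so an editor could colour them differently.
    function highlight_emphasis($paragraph) {
        return preg_replace_callback(
            '/\*[^*\n]+\*|\*/',  // a span closed on the same line, or a lone asterisk
            function ($m) {
                $class = strlen($m[0]) > 1 ? 'valid' : 'invalid';
                return "[$class]" . $m[0] . "[/$class]";
            },
            $paragraph);
    }

    echo highlight_emphasis("Some *emphasised* text with a stray * asterisk.\n");
    // Some [valid]*emphasised*[/valid] text with a stray [invalid]*[/invalid] asterisk.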

But really, the point here is that we can't determine whether that stray asterisk has meaning until an indefinite point in the future (the end of the paragraph). This means it's hard for a reader to understand the document's intent under the current rules until the whole paragraph has been read.

Well, what you're describing applies to cases where the paragraph is too long to scan visually. I don't think Markdown should be modeled around edge cases like that.


There is no such thing as "invalid Markdown" currently. When would you call a Markdown document "invalid"?

You happily gave me a couple of examples above. :) As one example, I would consider anything that tries to be Markdown syntax but is never closed to be invalid.

Basically, I think what you're calling "invalid Markdown" is really what is left undefined by the current documentation.

It's certainly good practice to avoid depending on undefined behaviours. But given that half of Markdown's users haven't read a line of the syntax document and know very little about HTML, I don't think it would accomplish much to call some documents "invalid" when they contain asterisks in the wrong place. To me, it sounds like an excuse to output garbage for poorly-edited documents, which is not something I want to do with my parser.


Sure, that's true, but that doesn't answer my question. Is the manual parsed as one big file or many smaller ones? And if only one file, what size is it? I'm interested in understanding what makes it so slow, but I haven't much data at hand to comment on the speed issue.

Well, why shouldn't Markdown be equally usable for many small files as for a few big ones? I'd rather have it perform well for all files.

Me too. I'm not accusing anyone of having files that are too big. If you have something that parses too slowly, file a bug report (with a sample file) so someone can look at the problem.

It's clear, however, that the current parsers in PHP Markdown and Markdown.pl are pretty slow with big files, and that this may not be easy to fix.


So the issue isn't so much about algorithmic complexity; it's about PHP code being an order of magnitude slower to execute than regular expressions, or any native code for that matter. The smaller the amount of PHP code, and the less it is repeated, the better the performance; that's how I have optimised PHP Markdown over the last few years.

Well this just implies to me that PHP should not be used for text processing in general... ;)

I agree: PHP is very poor in that regard, and that's why in PHP Markdown I defer everything I can to regular expressions, which are much faster. Ideally, we'd have a compiled parser, but even that would be out of reach for the thousands of people on shared hosting.
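
To make the "defer to regular expressions" idea concrete, here's a toy comparison; this is not PHP Markdown's actual code, just the same simple emphasis substitution written twice. The first version does the scanning inside the PCRE engine (native code); the second walks the string character by character in interpreted PHP, which is what gets slow on big inputs:

    <?php
    // One regex call: the scanning loop runs in native code inside PCRE,
    // so only the matches ever touch PHP.
    function em_with_regex($text) {
        return preg_replace('/\*([^*\n]+)\*/', '<em>$1</em>', $text);
    }

    // The same logic as an interpreted PHP loop: every character is a
    // round through PHP opcodes.
    function em_with_loop($text) {
        $out = '';
        $len = strlen($text);
        for ($i = 0; $i < $len; $i++) {
            if ($text[$i] === '*') {
                $close = strpos($text, '*', $i + 1);
                $nl    = strpos($text, "\n", $i + 1);
                if ($close !== false && $close > $i + 1
                    && ($nl === false || $close < $nl)) {
                    $out .= '<em>' . substr($text, $i + 1, $close - $i - 1) . '</em>';
                    $i = $close;
                    continue;
                }
            }
            $out .= $text[$i];
        }
        return $out;
    }

    // Both print: Some <em>emphasised</em> text.
    echo em_with_regex("Some *emphasised* text.\n");
    echo em_with_loop("Some *emphasised* text.\n");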


Michel Fortin
[EMAIL PROTECTED]
http://www.michelf.com/


