Hello:

Thanks, Natacha, for reviewing the CommonMark specification and identifying the issues that apply to your Fossil Markdown parser. Your arguments are persuasive.
Your obvious skill and knowledge of Markdown and general text parsing would be of high value to the CommonMark group. I think any comments you presented to them would be welcomed and might affect the specification itself.

Last year I struggled with adding Markdown documents to the wiki, intending to use Fossil for non-programmer documentation tasks. Like other Fossil users, I decided to store documents as ordinary versioned files and deprecated the wiki. Another consideration was the need to generate PDF versions with paging control.

I now use John MacFarlane's pandoc program with documents in pandoc markdown, storing them in Fossil repositories.

http://johnmacfarlane.net/pandoc/README.html#pandocs-markdown

John MacFarlane is a principal author of the CommonMark specification.

I found a utility, gouda.pl, which asks for a table of contents file for a directory's worth of Markdown files and then uses pandoc to generate HTML and PDF output versions, including the table of contents as a link list. The author withdrew that Perl script in favour of one written in Clojure, which requires the Java virtual machine, and I did not want to install that.

http://www.unexpected-vortices.com/sw/rippledoc/

The author, John Gabrielle, wrote another version in Python:

https://github.com/npettiaux/gouda

I am still using the Perl version, and this combination meets my present needs.

Your Fossil Markdown parser presents a good display for the features that I use and treats as plain text the '\pagebreak' instructions intended for PDF paged output.

Markdown is a versatile, simple format, but the multiplicity of "standards" limits its universality. I hope you can contribute to the making of a single standard.

Thanks again,

Trevor

On Sun, 28 Sep 2014 15:36:27 +0000
Natacha Porté <nata...@instinctive.eu> wrote:

> Hello,
>
> as you might already know, I'm the primary author of libsoldout and
> its integration into fossil to perform markdown-to-html conversion.
>
> If you followed recent news, you might have heard of CommonMark[1],
> which is an attempt to unify most implementations and extensions of
> Markdown by providing an unambiguous specification. It's an
> honorable goal, so it makes sense to try to converge existing
> implementations towards the new standard.
>
> Unfortunately, the architecture of the parser makes it extremely
> difficult to implement CommonMark, probably even more difficult than
> writing a new parser from scratch. In the rest of this e-mail I will
> detail why I think so, in case some of the brilliant minds here find
> a mistake in my reasoning and a way to implement CommonMark easily in
> fossil.
>
> In case I'm not wrong, it raises the question of either changing the
> markdown engine integrated in fossil or purposefully forsaking
> CommonMark support (which might make sense if its adoption ends up
> not as wide as its authors hope). Fortunately, there is no rush to
> take such a decision; as a community we can reasonably wait and see
> how CommonMark adoption pans out.
>
> [1]: http://commonmark.org/
>
>
>
> The heart of the architecture is built around an online parser: the
> input is considered as an infinite stream of characters, and each
> component of the parser either consumes input characters or hands
> over control to another component, with control transfers made in
> such a way that there is no loop without input character consumption.
>
> The main advantage of such an architecture is how easy it is to prove
> that it actually terminates, and to prove upper bounds on memory
> usage. When components are loosely coupled, which is the case here,
> it also makes debugging much easier.
>
> The main drawback is that there is no backtracking possible without
> cheating, and very limited look-ahead without severely tightening the
> coupling between components.
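
The loop described in the quoted paragraphs above can be sketched roughly as follows. This is an illustration only, not libsoldout's code: the names `render`, `literal`, and `emphasis` are hypothetical, the input is a complete string rather than a stream, and no HTML escaping is done. The point is the assertion in the loop, which is the whole termination argument.

```python
from typing import Callable, Dict, List, Tuple

# A component receives the text and the current position. It returns the
# output it produced and the new position, which must move forward.
Handler = Callable[[str, int], Tuple[str, int]]


def literal(text: str, pos: int) -> Tuple[str, int]:
    """Fallback component: consume exactly one character as plain text."""
    return text[pos], pos + 1


def emphasis(text: str, pos: int) -> Tuple[str, int]:
    """Consume '*...*' as emphasis, or hand over to the literal component."""
    close = text.find("*", pos + 1)
    if close == -1:
        return literal(text, pos)
    return "<em>" + text[pos + 1:close] + "</em>", close + 1


def render(text: str, handlers: Dict[str, Handler]) -> str:
    """Dispatch on the current character until the input is exhausted."""
    out: List[str] = []
    pos = 0
    while pos < len(text):
        handler = handlers.get(text[pos], literal)
        chunk, new_pos = handler(text, pos)
        # Every iteration consumes input, so the loop must terminate.
        assert new_pos > pos, "a component must consume at least one character"
        out.append(chunk)
        pos = new_pos
    return "".join(out)
```

Calling `render("a *b* c", {"*": emphasis})` treats the star as active, while `render("a *b* c", {})` leaves it as plain text, since no component is registered for it.
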
>
> Moreover, when designing the parser, I enforced very loose coupling
> between components by requiring that every language element can be
> individually added to or removed from the parser. The reason for that
> is that complete Markdown is extremely powerful, especially because of
> raw HTML input features. That's too powerful for untrusted input,
> like blog comments or wiki pages. So "unsafe" features have to be
> optional. But there are different levels of "unsafety": for example,
> one might want to forbid titles in blog comments, to prevent
> untrusted users from messing with the page layout. Or one might want
> to forbid all links for more-untrusted users while allowing them for
> not-so-untrusted users. So it seemed better to engineer the parser
> around making it possible to allow or forbid any combination of
> features.
>
> So the online-parser loop variant means that any active character must
> have its semantics decided immediately, and the loose coupling means
> other language elements cannot interfere in the semantics decision.
>
> On the other hand, CommonMark seems to have certain ideas about parser
> architecture leaking into the specification. For example, the notion
> of precedence is directly at odds with the description in the previous
> paragraph.
>
> Consider for example the following ambiguous Markdown code, which is
> example 239 of the current CommonMark specification:
> *foo`*`
>
> When the leading star is encountered, my parser has to scan for the
> closing star, and do so without considering the backtick, since
> code spans might very well have been disabled. So my parser processes
> it as an emphasis that happens to contain a backtick.
>
> Meanwhile, CommonMark prescribes code spans as having a higher
> precedence than emphasis, so the example should be parsed as a code
> span that happens to contain a star.
>
> As you can imagine, this isn't an isolated example, otherwise working
> around it or cheating would have been easy.
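
To make the two readings of that example concrete, here is a minimal sketch. It is taken neither from libsoldout nor from any CommonMark implementation. Both function names are hypothetical, each function handles only a single construct, and the text outside a matched code span is left unprocessed for simplicity.

```python
def emphasis_first(text: str) -> str:
    """Decide on a leading '*' immediately, without looking at backticks."""
    if text.startswith("*"):
        close = text.find("*", 1)
        if close != -1:
            return "<em>" + text[1:close] + "</em>" + text[close + 1:]
    return text


def code_span_first(text: str) -> str:
    """Match a backtick pair first; a star inside it is literal text."""
    start = text.find("`")
    end = text.find("`", start + 1) if start != -1 else -1
    if end == -1:
        # No complete code span: fall back to the emphasis reading.
        return emphasis_first(text)
    return (text[:start] + "<code>" + text[start + 1:end] + "</code>"
            + text[end + 1:])


sample = "*foo`*`"
print(emphasis_first(sample))   # <em>foo`</em>`
print(code_span_first(sample))  # *foo<code>*</code>
```

The first function needs no knowledge of code spans, which is what lets each feature be switched off independently. The second only works because the emphasis decision is deferred until code spans have been found.
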
> Most of the span-level examples and specifications actually involve
> the more general rule of "leaf" span elements taking precedence over
> "container" span elements. (Which again is fine by itself, I have
> nothing against it; it is just poorly compatible with my existing
> design.)
>
> The precedence of fenced code blocks over reference declarations
> raises a similar problem, although to a smaller extent.
>
> I admit I haven't yet looked deeply into the subtleties of block-level
> language elements, but even if everything went as well as possible in
> that area, the parser would still look ridiculous on the test suite
> without tremendous work.
>
>
> I will do my best to answer any questions or comments, but because
> of various issues, I might need up to a few days to post answers.
>
>
> Thanks for your attention,
> Natacha

_______________________________________________
fossil-users mailing list
fossil-users@lists.fossil-scm.org
http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users