Re: [fossil-users] Pessimism about CommonMark in fossil

Trevor
Sun, 28 Sep 2014 13:50:08 -0700

Hello:

Thanks, Natacha, for reviewing the CommonMark specification and
identifying the issues that apply to your Fossil Markdown parser.
Your arguments are persuasive.


Your obvious skill and knowledge of Markdown, and of text parsing
in general, would be of great value to the CommonMark group. I
think any comments you sent them would be welcome and might even
shape the specification itself.

Last year I struggled with adding Markdown documents to the wiki,
intending to use Fossil for non-programmer documentation tasks.
Like other Fossil users, I ended up storing documents as ordinary
versioned files and abandoned the wiki.

Another consideration was the need to generate PDF versions with
paging control. I now use John MacFarlane's pandoc program with
documents written in pandoc's Markdown, stored in Fossil repositories.

  http://johnmacfarlane.net/pandoc/README.html#pandocs-markdown

John MacFarlane is a principal author of the CommonMark
specification.

I found a utility, gouda.pl, which takes a table-of-contents file
for a directory of Markdown files and uses pandoc to generate HTML
and PDF versions, with the table of contents rendered as a list of
links. The author withdrew that Perl script in favour of one
written in Clojure, which requires a Java virtual machine that I
did not want to install.

  http://www.unexpected-vortices.com/sw/rippledoc/

The author, John Gabrielle, wrote another version in Python:

  https://github.com/npettiaux/gouda

I am still using the Perl version, and this combination meets my
present needs.

Your Fossil Markdown parser renders the features I use well, and
treats the '\pagebreak' instructions intended for paged PDF output
as plain text.

Markdown is a versatile, simple format, but the multiplicity of
"standards" limits its universality. I hope you can contribute to
making a single standard.

Thanks again,

Trevor


On Sun, 28 Sep 2014 15:36:27 +0000
Natacha Porté <nata...@instinctive.eu> wrote:

> Hello,
> 
> as you might already know, I'm the primary author of libsoldout and
> of its integration into fossil for markdown-to-html conversion.
> 
> If you followed recent news, you might have heard of CommonMark[1],
> which is an attempt to unify most implementations and extensions of
> Markdown by providing an unambiguous specification. It's an
> honorable goal, so it makes sense to try to converge existing
> implementations towards the new standard.
> 
> Unfortunately, the architecture of the parser makes it extremely
> difficult to implement CommonMark, probably even more difficult than
> writing a new parser from scratch. In the rest of this e-mail I will
> explain why I think so, in case one of the brilliant minds here finds
> a mistake in my reasoning and a way to implement CommonMark easily in
> fossil.
> 
> If I'm not wrong, this raises the question of whether to replace the
> markdown engine integrated in fossil, or to purposefully forsake
> CommonMark support (which might make sense if its adoption ends up
> narrower than its authors hope). Fortunately, there is no rush to make
> such a decision; as a community we can reasonably wait and see how
> CommonMark adoption pans out.
> 
> [1]: http://commonmark.org/
> 
> 
> 
> The heart of the architecture is an online parser: the input is
> treated as an infinite stream of characters, and each component of
> the parser either consumes input characters or hands over control to
> another component. Control transfers are arranged so that there is no
> loop that does not consume at least one input character.
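A minimal Python sketch of such a loop may make the termination argument concrete. It is an illustration under stated assumptions, not libsoldout's actual C code, and the names `parse_inline`, `code_span` and the handler convention are hypothetical. Each active character maps to a handler that either consumes at least one character or declines. When a handler declines, the main loop consumes the character itself, so every iteration advances the position.

```python
import html
from typing import Callable, Dict, Tuple

# Hypothetical handler type: given the text and the position of an active
# character, return (characters consumed, HTML produced).
# Returning 0 consumed means "decline": the main loop handles the char.
Handler = Callable[[str, int], Tuple[int, str]]


def parse_inline(text: str, handlers: Dict[str, Handler]) -> str:
    """Single left-to-right pass over the input, dispatching on active chars."""
    out = []
    i = 0
    while i < len(text):
        handler = handlers.get(text[i])
        consumed, produced = handler(text, i) if handler else (0, "")
        if consumed > 0:
            out.append(produced)
            i += consumed
        else:
            # Fallback: emit the character literally. Every iteration
            # therefore advances i by at least one, so the loop terminates.
            out.append(html.escape(text[i]))
            i += 1
    return "".join(out)


def code_span(text: str, i: int) -> Tuple[int, str]:
    """Example handler for '`': scan ahead for the closing backtick."""
    end = text.find("`", i + 1)
    if end < 0:
        return 0, ""  # no closing backtick: decline
    return end + 1 - i, "<code>" + html.escape(text[i + 1:end]) + "</code>"


print(parse_inline("a `b<c` d`", {"`": code_span}))  # a <code>b&lt;c</code> d`
```

The trailing unmatched backtick shows the fallback path. The handler declines, and the character comes out as plain text.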
> 
> The main advantage of such an architecture is how easy it is to prove
> that it actually terminates, and to prove upper bounds on memory
> usage. When components are loosely coupled, as they are here, it also
> makes debugging much easier.
> 
> The main drawback is that no backtracking is possible without
> cheating, and only very limited look-ahead is possible without
> severely tightening the coupling between components.
> 
> Moreover, when designing the parser, I enforced very loose coupling
> between components by requiring that every language element can be
> individually added to or removed from the parser. The reason is that
> complete Markdown is extremely powerful, especially because of its
> raw HTML input features. That's too powerful for untrusted input,
> like blog comments or wiki pages, so "unsafe" features have to be
> optional. But there are different levels of "unsafety": for example,
> one might want to forbid titles in blog comments, to prevent
> untrusted users from messing with the page layout. Or one might want
> to forbid all links for more-untrusted users while allowing them for
> not-so-untrusted users. So it seemed better to engineer the parser
> around making it possible to allow or forbid any combination of
> features.
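One way to picture "any combination of features" is a bit set of enabled elements that decides which active characters get a handler at all. The following is a hypothetical Python sketch. The `Feature` names, the `REGISTRY` table and the trust levels are illustrative assumptions, not libsoldout's interface. Only inline elements are shown, but block-level elements such as titles could be switched the same way.

```python
from enum import Flag, auto


class Feature(Flag):
    """Hypothetical per-element switches; names are illustrative only."""
    EMPHASIS = auto()
    CODE_SPAN = auto()
    LINK = auto()
    RAW_HTML = auto()


# Active character -> (feature it belongs to, handler name).
# Handlers are plain strings here to keep the sketch self-contained.
REGISTRY = {
    "*": (Feature.EMPHASIS, "emphasis"),
    "_": (Feature.EMPHASIS, "emphasis"),
    "`": (Feature.CODE_SPAN, "code_span"),
    "[": (Feature.LINK, "link"),
    "<": (Feature.RAW_HTML, "raw_html"),
}

# Example trust levels built from arbitrary combinations of features.
TRUSTED = Feature.EMPHASIS | Feature.CODE_SPAN | Feature.LINK | Feature.RAW_HTML
SEMI_TRUSTED = Feature.EMPHASIS | Feature.CODE_SPAN | Feature.LINK
UNTRUSTED = Feature.EMPHASIS | Feature.CODE_SPAN


def build_handlers(enabled: Feature) -> dict:
    """Keep only the active characters whose feature is enabled.

    A disabled feature simply has no entry, so its character falls
    through to literal text and no other handler needs to know about it.
    """
    return {ch: h for ch, (feat, h) in REGISTRY.items() if feat in enabled}


print(sorted(build_handlers(UNTRUSTED)))  # ['*', '_', '`']
```

Because a disabled element leaves no trace in the dispatch table, no other handler may assume it exists. This is the property that rules out cross-element precedence in this design.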
> 
> So the online-parser loop variant means that every active character
> must have its semantics decided immediately, and the loose coupling
> means other language elements cannot take part in that decision.
> 
> On the other hand, CommonMark seems to let certain ideas about parser
> architecture leak into the specification. For example, the notion of
> precedence is directly at odds with the design described in the
> previous paragraph.
> 
> Consider for example the following ambiguous Markdown code, which is
> example 239 of the current CommonMark specification:
>
>     *foo`*`
> 
> When the leading star is encountered, my parser has to scan for the
> closing star, and must do so without considering the backtick, since
> code spans might very well have been disabled. So my parser processes
> it as an emphasis that happens to contain a backtick.
> 
> Meanwhile, CommonMark prescribes code spans as having a higher
> precedence than emphasis, so the example should be parsed as a code
> span that happens to contain a star.
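The two readings can be reproduced with a toy Python renderer that supports either closing-star rule. This is a deliberate simplification: it handles only single-star emphasis and single-backtick code spans and ignores CommonMark's delimiter-run (left-/right-flanking) rules. The function names are hypothetical.

```python
import html


def find_close_online(text: str, start: int) -> int:
    """Online rule: the first later '*' closes the emphasis, whatever
    markup (such as a backtick) lies in between."""
    return text.find("*", start)


def find_close_code_first(text: str, start: int) -> int:
    """Precedence rule: skip over complete code spans, so a '*' inside
    one cannot close the emphasis."""
    i = start
    while i < len(text):
        if text[i] == "`":
            end = text.find("`", i + 1)
            if end >= 0:
                i = end + 1  # jump past the whole code span
                continue
        if text[i] == "*":
            return i
        i += 1
    return -1


def render(text: str, code_first: bool) -> str:
    """Toy inline renderer for single-star emphasis and single-backtick
    code spans only; everything else is emitted as escaped text."""
    find_close = find_close_code_first if code_first else find_close_online
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "*":
            end = find_close(text, i + 1)
            if end > i + 1:  # require non-empty emphasis content
                out.append("<em>" + render(text[i + 1:end], code_first) + "</em>")
                i = end + 1
                continue
        elif ch == "`":
            end = text.find("`", i + 1)
            if end >= 0:
                out.append("<code>" + html.escape(text[i + 1:end]) + "</code>")
                i = end + 1
                continue
        out.append(html.escape(ch))  # no match: literal character
        i += 1
    return "".join(out)


print(render("*foo`*`", code_first=False))  # <em>foo`</em>`
print(render("*foo`*`", code_first=True))   # *foo<code>*</code>
```

The precedence-aware finder has to know that backticks exist. This is exactly the kind of coupling between emphasis and code spans that the feature-toggling design avoids.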
> 
> As you can imagine, this isn't an isolated example; otherwise, working
> around it or cheating would have been easy. Most of the span-level
> examples and specifications actually follow from the more general
> rule that "leaf" span elements take precedence over "container" span
> elements. (Again, this is fine by itself and I have nothing against
> it; it is just poorly compatible with my existing design.)
> 
> The precedence of fenced code blocks over reference declarations
> raises a similar problem, although to a smaller extent.
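Here is a similar Python sketch for the reference problem. It assumes, hypothetically, that reference declarations are gathered in a separate line-by-line pass before block parsing. A line that merely looks like a declaration inside a fenced block is excluded only if that pass already knows about fences. The regex and the fence handling are simplifications: backtick fences only, no titles, and lowercasing in place of full case folding.

```python
import re

# Simplified reference-declaration line: "[label]: destination",
# indented by at most three spaces. Titles and multi-line forms are ignored.
REF_LINE = re.compile(r"^ {0,3}\[([^\]]+)\]:\s*(\S+)\s*$")


def collect_refs(lines, fence_aware: bool) -> dict:
    """Hypothetical first pass that gathers reference declarations.

    With fence_aware=False it knows nothing about code blocks; with
    fence_aware=True it skips lines between ``` fences (any line
    starting with ``` toggles the fence, as a simplification).
    """
    refs = {}
    in_fence = False
    for line in lines:
        if fence_aware and line.lstrip().startswith("```"):
            in_fence = not in_fence
            continue
        if in_fence:
            continue
        m = REF_LINE.match(line)
        if m:
            refs[m.group(1).lower()] = m.group(2)
    return refs


doc = [
    "```",
    "[x]: http://example.com/",  # meant as literal code, not a reference
    "```",
    "[y]: http://example.org/",
]
print(collect_refs(doc, fence_aware=False))
print(collect_refs(doc, fence_aware=True))
```

The fence-unaware pass also picks up `x`. Giving the reference pass knowledge of fences couples two language elements. As noted above, this problem is smaller than the span-level one.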
> 
> I admit I haven't yet looked deeply into the subtleties of block-level
> language elements, but even if everything went as well as possible in
> that area, the parser would still look ridiculous on the test suite
> without a tremendous amount of work.
> 
> 
> I will do my best to answer any questions or comments, but because
> of various issues, I might need up to a few days to post answers.
> 
> 
> Thanks for your attention,
> Natacha

_______________________________________________
fossil-users mailing list
fossil-users@lists.fossil-scm.org
http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
