Re: official orgmode parser

Przemysław Kamiński Mon, 26 Oct 2020 22:46:02 -0700

I'm no expert in parsing but I would expect org's parser to be quite similar to the multitude of markdown or CommonMark [1] parsers. There isn't that much difference in syntax, except maybe org is more versatile and has more syntax elements, like drawers.


Searching for "EBNF Markdown" I stumbled upon [2].


[1] https://commonmark.org/
[2] http://roopc.net/posts/2014/markdown-cfg/

On 10/26/20 10:00 PM, Tom Gillespie wrote:

Here is an attempt to clarify my own confusion around the nested
structures in org. In short: each node in the headline tree and the
plain list tree can be parsed using the EBNF, the nesting level cannot,
which means that certain useful operations, such as folding, require
additional rules beyond the grammar. More in line. Best!
Tom

Do you need to? This is valid as an entire Org file, I think:

*** foo
* bar
***** baz

And that can be represented in EBNF. I'm not aware of places where behavior is 
indent-level specific, except inline tasks, and that edge case can be 
represented.


You are correct, and as long as the heading depth doesn't change some
interpretation then this is a non-issue. The reason I mentioned this
though is because it means that you cannot determine how to correctly
fold an org file from the grammar alone.

To make sure I understand. It is possible to determine the number of
leading stars (and thus the level), but I think that it is not
possible to identify the end of a section.
For example

* a
*** b
** c
* d

You can parse out a 1, b 3, c 2, d 1, but if you want to be able to
nest b and c inside a but not nest d inside a, then you need a stack
in there somewhere. You can't have a rule such as

section : headline content
content : text | section

because the parser would incorrectly nest sections at the same level;
you would have to write

section-level-1 : headline-1 content-1
content-1 : text | section-level-2-n

but since we have an arbitrary number of levels the grammar would have
to be infinite. This is only if you want your grammar to be able to
encode that the content of sections can include other more deeply
nested sections, which in this context we almost certainly do not (as
you point out).
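
To illustrate the point above, here is a minimal Python sketch of the
two-pass approach: a flat pass that only reads the star count of each
headline, followed by a separate pass that uses a stack to recover the
nesting needed for folding. The function names parse_headlines and
nest, the dict layout, and the simplified headline regex are all
assumptions made for this example; none of this is taken from Org's
actual parser.

```python
import re

# Simplified headline pattern (an assumption): stars, one space, title.
HEADLINE = re.compile(r"^(\*+) (.*)$")


def parse_headlines(text):
    """Flat pass: return (level, title) pairs; no nesting is tracked."""
    result = []
    for line in text.splitlines():
        match = HEADLINE.match(line)
        if match:
            result.append((len(match.group(1)), match.group(2)))
    return result


def nest(headlines):
    """Second pass: build a tree using a stack of currently open sections."""
    root = {"level": 0, "title": None, "children": []}
    stack = [root]
    for level, title in headlines:
        # Close every open section at the same or a deeper level.
        while stack[-1]["level"] >= level:
            stack.pop()
        node = {"level": level, "title": title, "children": []}
        stack[-1]["children"].append(node)
        stack.append(node)
    return root


tree = nest(parse_headlines("* a\n*** b\n** c\n* d"))
```

With the example file above, a and d end up as children of the root,
while b and c both end up as children of a, even though b skips a
level.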

There is a similar issue with the indentation level in
order to correctly interpret plain lists.


list ::= ('+' string newline)+ sublist?
sublist ::= (indent list)+

I think this captures lists?


Ah yes, I see my mistake here. In order for this to work the parser
has to implement significant whitespace, so whitespace cannot be
parsed into a single token. I think everything works out after that.
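
One common way to get significant whitespace is to have the lexer
keep a stack of indentation widths and emit explicit INDENT and DEDENT
tokens, so the grammar itself never has to count spaces. Below is a
rough Python sketch of that idea for '+' lists only. The function name
indent_tokens and the token names are assumptions made for this
example, and it ignores tabs, other bullet types, and dedents that do
not line up with an earlier indentation width.

```python
def indent_tokens(text):
    """Turn leading spaces into INDENT/DEDENT tokens around ITEM tokens."""
    tokens = []
    widths = [0]  # stack of currently open indentation widths
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines in this simplified sketch
        body = line.lstrip(" ")
        width = len(line) - len(body)
        if width > widths[-1]:
            # Deeper than the current level: open a sublist.
            widths.append(width)
            tokens.append(("INDENT", None))
        while width < widths[-1]:
            # Shallower: close sublists until the widths match again.
            widths.pop()
            tokens.append(("DEDENT", None))
        if body.startswith("+ "):
            tokens.append(("ITEM", body[2:]))
    # Close any sublists still open at end of input.
    while len(widths) > 1:
        widths.pop()
        tokens.append(("DEDENT", None))
    return tokens


tokens = indent_tokens("+ a\n  + b\n  + c\n+ d")
```

Once the token stream carries INDENT and DEDENT, a rule along the
lines of sublist ::= INDENT list DEDENT can be written without any
reference to the number of spaces.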

Definitely not able to be represented in EBNF, unless as you say {name} is a 
limited vocabulary.


Darn those pesky open sets!
