Hi Asa,

My general take is that any active work toward standardization would be premature. At the very least a full implementation outside of Emacs would need to exist. In the absence of that there is little point to standardization. There is ample existing documentation to build a compliant parser (pandoc exists as well ...) and any effort toward standardization right now would be better spent improving the existing implementation or fixing broken ones (e.g. org-ruby).
From your comments, I would suggest reading through https://orgmode.org/worg/dev/org-syntax.html if you have not done so already. Much of what you mention is already there.

If something like standardization is still desired, I would suggest that the proper framing for any such activities would be as improvement and clarification in the documentation, and potentially as formalization of some of the existing behaviors of the system. Org is a fairly stable system, and as others have said, explicitly leaving things open and unspecified would be vital. There are also parts of org (e.g. babel) where the behavior needs to be regularized and made consistent. At the moment those areas need contributors, not standardization.

A few more thoughts in line.

Best!
Tom

On Sat, Oct 31, 2020 at 8:22 PM Asa Zeren <asaize...@gmail.com> wrote:
> this is impossible. If org catches on before it is standardized, we
> end up in the situation of Markdown, with many competing standards and
> non-standards. Hence, standardization is essential.

The situation for Org is not comparable to markdown. There is a single reference implementation for org at the moment. The codebase is massive. There are many existing parsers for org files. Many are obviously broken since they do not match the reference implementation's behavior. That the breakage is obvious is a sign that there is not a need for standardization at this time.

Further, there is little risk that another impl will be created without interoperating with the elisp implementation. For example, consider Mauro's use case: being able to get colleagues who do not use Emacs to use Org. I suspect most of the people who would be working on other implementations would be starting from Emacs and would be unlikely to leave.

Also unlike markdown, html export is just one tiny part of Org, whereas markdown was implemented repeatedly to allow text input on web pages where people needed to implement parts of html that had not already been specified in markdown.

> Standardizing org is much harder than standardizing something like
> Markdown, but I think by breaking it down as follows will maximize the
> portability of org while not compromising on development of org.

See some of my other recent emails. In the short term this is impossible due to the deep dependence on Emacs Lisp. Any outside implementation that is created today would have to implement elisp. Few have been able to do this in over 30 years. Moving beyond elisp requires additional machinery to be added to org to be able to specify other top-level languages. This is not something that is remotely ready for standardization because no one even has a single working implementation yet!

> I see three areas of standardization, which I think should be
> standardized separately:
> - Org DOM

No. This is an implementation detail (see below for more).

> - Org Syntax

This is pretty much done. There are some outstanding points for discussion, but they are about implementation details, not about the contents of the syntax. Also, extension of the syntax needs to be open and defined entirely by the elisp implementation, as mentioned by others.

> - Org Standard Environments

Read https://orgmode.org/worg/dev/org-syntax.html. It will get you up to speed with the existing terminology that is used in the community.

>
> Org DOM:
> The first thing to specify is the org DOM. (Maybe a different name
> should be used to avoid confusion with the HTML DOM) This is the
> structure of an org-mode document, without the textual
> representation. Many org-related tools operate on org documents
> without needing to use the textual representation. Specifying the DOM
> separately would (a) create a separation of concerns and (b) allow for
> better libraries built around org mode.

Depending on exactly what you mean by DOM this does not need to be standardized.
There are a couple of points that need to be clarified regarding how to treeify the flat list of elements that come out of a parse in order to tie things like associated keywords to the correct elements, but these are quite minimal. Trying to standardize a DOM when it is an implementation detail is a potential rat's nest, so I would strongly discourage even thinking about Org in that way. I would even discourage putting too much emphasis on the org-element api which, while extremely useful inside Emacs, is not something that should be standardized because it is a detail peculiar to the elisp implementation.

There are cases where certain behaviors, such as how to parse and format footnotes, could be specified, but such behaviors don't require a DOM in order to be specified, and adding a DOM to the picture does nothing but complicate the format.

Org is a text format. The semantics for interaction with the text format are defined entirely by the text representation (In Emacs there.is.only.buffer). Other semantics, such as export to html and latex, are not something that you would want to try to standardize. You would likely lose friends, enemies, and whatever sanity you had left at the end (see the discussion on Mauro's thread about the fact that it is probably just easier to use Emacs directly if you need to export to a certain format in a specific way; it is free software after all).

To the extent that an element tree could be useful, I think it would be as a concept in an implementation guide, not as something formally specified.

> Org Syntax:
> This would be specifying the mapping between the DOM and the textual
> representation, specified in terms of an environment.

There is no DOM. Modification to an org document must be made on the text representation, otherwise it is meaningless. This isn't html where there is no canonical representation outside the DOM.
The text representation of an org document IS the canonical representation (modulo a normalization pass).

> Org Standard Environments:
> This is how I would specify elements such as #+begin_src..#+end_src
> would be specified, as standardized elements of the environment. This
> would be structured as a number of individual standard environments,
> such as "Source Blocks" or "Standard Header Properties" (specifying
> #+title, #+author, etc.)

These are well specified already in the worg syntax draft. There are a couple of special cases such as src and example blocks that could be included explicitly in the syntax to facilitate interoperability with parsers for org babel languages. Beyond that, the community already has vocabulary that covers what you describe here, as mentioned above.
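
To make the earlier point about treeifying concrete, here is a minimal sketch in Python of tying associated keywords to the element that follows them. Everything in it is an assumption made for illustration: the (kind, text) token model, the Element class and the attach_affiliated function are hypothetical and are not the org-element api. It also simplifies the real rules by treating every keyword as affiliated and by dropping keywords that are separated from the next element by a blank line.

```python
from dataclasses import dataclass, field


# Hypothetical element type for this sketch only; not the org-element api.
@dataclass
class Element:
    kind: str                      # e.g. "paragraph", "table", "src-block"
    text: str
    affiliated: dict = field(default_factory=dict)


def attach_affiliated(tokens):
    """Fold keyword tokens into the element that immediately follows them.

    `tokens` is a flat list of (kind, text) pairs, where kind is
    "keyword", "blank", or an element kind. A blank line between a
    keyword and the next element breaks the association.
    """
    elements = []
    pending = {}
    for kind, text in tokens:
        if kind == "keyword":
            key, _, value = text.partition(":")
            pending[key.strip().upper()] = value.strip()
        elif kind == "blank":
            # Simplification: orphaned keywords are dropped here.
            pending = {}
        else:
            elements.append(Element(kind, text, pending))
            pending = {}
    return elements


if __name__ == "__main__":
    flat = [
        ("keyword", "name: results"),
        ("table", "| a | b |"),
        ("keyword", "caption: orphaned"),
        ("blank", ""),
        ("paragraph", "Some text."),
    ]
    for element in attach_affiliated(flat):
        print(element.kind, element.affiliated)
```

The only state carried between tokens is the pending keyword set, which is why this kind of rule fits in an implementation guide and does not need a formally specified DOM.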