Inconsistent text markup handling when double-nesting markers

Tom Alexander Mon, 09 Oct 2023 16:03:29 -0700

I used the following test document:
```
__foo__

**foo**
```


I'd expect the two to behave the same but the first one parses as:
```
(paragraph
  "_"
  (subscript "foo")
  "__"
  )
```

Whereas the second parses as:
```
(paragraph
  (bold
    (bold
      "foo"
      )
    )
  )
```

This pattern happens in worg at [2]

Looking at the description for text markup in the syntax document[1], I don't 
see any reason the first wouldn't be parsed as an underline:

1. PRE: valid because it is the beginning of a line
2. MARKER: valid underscore
3. CONTENTS: valid. Series of objects from standard set includes both subscript 
and text markup, so regardless of how we parse the interior, its valid. Also 
cannot begin or end with whitespace but there is no whitespace in the CONTENTS.
4. MARKER: valid underscore
5. POST: Only valid if we extend the underline to the 2nd underscore so it ends 
at the end of the line. But the 2nd line shows us that having copies of the 
marker inside the CONTENTS is fine so I see two possible expected parses of the 
CONTENTS:
    4a. (underline "foo")
    4b. ((subscript "foo") (plain-text "_"))

I also ran the following test document to further prove that having copies of 
the marker inside the CONTENTS is fine:
```
*foo*bar*
```
which parses as (bold "foo*bar")

So the only way the top line would fail to parse as an underline is if it 
matched the first closing underscore as closing the underline, but that would 
be invalid because underscore is not a valid POST character and invalid copies 
of the closing marker are ignored as proven by both "**foo**" and "*foo*bar*".


[1] https://orgmode.org/worg/org-syntax.html#Emphasis_Markers
[2] 
https://git.sr.ht/~bzg/worg/tree/ba6cda890f200d428a5d68e819eef15b5306055f/org-contrib/babel/intro.org#L117

--
Tom Alexander
pgp: https://fizz.buzz/pgp.asc

Inconsistent text markup handling when double-nesting markers

Reply via email to