Re: Org markup and non-ASCII punctuation (was: org parser and priorities of inline elements)

Tom Gillespie Mon, 17 Jul 2023 22:41:10 -0700

> We might probably generalize to
> PRE  = Zs Zl Pc Pd Ps Pi ' "
> POST = Zs Zl Pc Pd Pe Pf . ; : ! ? ' " \ [


If this works, I think it is reasonable. We may want to
specify what an Org implementation should do when it does
not fully support Unicode, and we may want to review related
syntax issues around ASCII vs. Unicode, because IIRC the
current syntax document is ambiguous in places.

For example, I'm pretty sure that I'm mixing and matching
Unicode and ASCII whitespace in the tokenizer I have in Racket.

> Though we need to take care excluding zero-width spaces.

Ya, I removed a comment to this effect in the paragraph about
the usual alternate solution.
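
A quick check of where the zero-width characters land, again as a Python
sketch using the standard unicodedata module (the table and helper name
are hypothetical). As far as that database reports, they are all in the
Cf (format) category rather than Zs, so a rule written purely in terms of
Zs and Zl would not match them; whether that is the behaviour we want is a
separate question.

```python
import unicodedata

# Zero-width characters that might show up next to markup.
ZERO_WIDTH = {
    "ZERO WIDTH SPACE": "\u200b",
    "ZERO WIDTH NON-JOINER": "\u200c",
    "ZERO WIDTH JOINER": "\u200d",
    "WORD JOINER": "\u2060",
    "ZERO WIDTH NO-BREAK SPACE": "\ufeff",
}


def general_categories(chars: dict[str, str]) -> dict[str, str]:
    """Map each character name to its Unicode general category."""
    return {name: unicodedata.category(ch) for name, ch in chars.items()}


if __name__ == "__main__":
    for name, cat in general_categories(ZERO_WIDTH).items():
        print(f"{name}: {cat}")
    # For comparison, THIN SPACE has visible width and is a space separator.
    print("THIN SPACE:", unicodedata.category("\u2009"))
```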

> Emacs does not support them though (yet?).

Racket fully supports the latest Unicode standard, IIRC,
so I will see whether I can use that for testing in laundry.

> At the end, it is the current ASCII limitation plus partially arbitrary
> choice of boundaries that keep some users confused (we are getting bug
> reports about confusing markup from time to time).

Ya, it would be good to generalize the affordance if possible, since
users writing in non-ASCII languages have valid expectations about how
markup should behave. Hopefully, the Unicode Consortium has managed to
cover the categories we need.
