gbranden pushed a commit to branch master
in repository groff.
commit 18ed47436cf3be79f706af6046bf9789f26c95a7
Author: G. Branden Robinson <[email protected]>
AuthorDate: Fri Nov 28 16:56:59 2025 -0600
[doc,man]: Annotate internal design challenge.
---
doc/groff.texi.in | 15 +++++++++++++++
man/groff.7.man | 15 +++++++++++++++
2 files changed, 30 insertions(+)
diff --git a/doc/groff.texi.in b/doc/groff.texi.in
index e6b0a94e6..fcc3b23bd 100644
--- a/doc/groff.texi.in
+++ b/doc/groff.texi.in
@@ -7806,6 +7806,21 @@ expressions; see below.
@cindex @code{\u}, as delimiter
The following escape sequences don't take arguments
and thus are allowed as delimiters:
+@c That explanation is kind of BS; it's actually because their token
+@c types aren't shared with ones that are parameterized in input.
+@c Allowing `\0`, `\^`, and `\|` also allows the delimited `\h`, which
+@c presumably was not intentional. Allowing special characters as
+@c delimiters, which very much _was_ deliberate (see their use in tbl
+@c and eqn) also allows the delimited `\C`, which is a wart that makes
+@c the foregoing "thus" a lie. But expecting the reader to have a
+@c command of GNU troff's token types, which _should_ be a mere
+@c implementation detail, is a tall order. When we make GNU troff
+@c handle UTF-8/Unicode internally we'll have a practically unlimited
+@c space for token types (because the internal character type will go
+@c from 8 to 32 bits wide), and can distinguish some that have been
+@c collapsed to date. That would in turn allow us to make
+@c "unformatting" and "asciification" more representative of the
+@c original input.
@code{\@key{SPC}},
@code{\%},
@code{\|},
diff --git a/man/groff.7.man b/man/groff.7.man
index c627540b5..4cfb2e39b 100644
--- a/man/groff.7.man
+++ b/man/groff.7.man
@@ -1863,6 +1863,21 @@ see below.
.P
The following escape sequences don't take arguments
and thus are allowed as delimiters:
+.\" That explanation is kind of BS; it's actually because their token
+.\" types aren't shared with ones that are parameterized in input.
+.\" Allowing `\0`, `\^`, and `\|` also allows the delimited `\h`, which
+.\" presumably was not intentional. Allowing special characters as
+.\" delimiters, which very much _was_ deliberate (see their use in tbl
+.\" and eqn) also allows the delimited `\C`, which is a wart that makes
+.\" the foregoing "thus" a lie. But expecting the reader to have a
+.\" command of GNU troff's token types, which _should_ be a mere
+.\" implementation detail, is a tall order. When we make GNU troff
+.\" handle UTF-8/Unicode internally we'll have a practically unlimited
+.\" space for token types (because the internal character type will go
+.\" from 8 to 32 bits wide), and can distinguish some that have been
+.\" collapsed to date. That would in turn allow us to make
+.\" "unformatting" and "asciification" more representative of the
+.\" original input.
.BI \[rs] space\c
,
.BR \[rs]% ,
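
The point made in the added comments, that eligibility as a delimiter follows a token's type rather than whether its escape sequence takes an argument, can be illustrated with a toy model. This is a minimal sketch in Python, not GNU troff's C++ code: the `TokenType` member names, `classify`, and `allowed_as_delimiter` are all hypothetical, and the only groupings taken from the commit are that `\0`, `\^`, and `\|` share a type with the delimited `\h`, and that special characters share one with the delimited `\C`. The catch-all `OTHER` type is a placeholder and says nothing about how real escape sequences outside those two groups behave.

```python
from enum import Enum, auto


class TokenType(Enum):
    # Hypothetical names; GNU troff's real token types may differ.
    HORIZONTAL_SPACE = auto()  # \0, \^, \| and, per the commit, \h'...'
    SPECIAL_CHAR = auto()      # \(xx, \[xx] and, per the commit, \C'xx'
    OTHER = auto()             # placeholder for everything else in this toy


def classify(escape: str) -> TokenType:
    """Map an escape sequence's spelling to a (toy) token type."""
    if escape in (r"\0", r"\^", r"\|") or escape.startswith(r"\h"):
        return TokenType.HORIZONTAL_SPACE
    if escape.startswith((r"\(", r"\[", r"\C")):
        return TokenType.SPECIAL_CHAR
    return TokenType.OTHER


# Assumption for illustration: the delimiter check looks only at the
# token type, so it cannot tell \| from \h'1m', or \[em] from \C'em'.
DELIMITER_TYPES = {TokenType.HORIZONTAL_SPACE, TokenType.SPECIAL_CHAR}


def allowed_as_delimiter(escape: str) -> bool:
    """Return True if the toy model would accept this escape as a delimiter."""
    return classify(escape) in DELIMITER_TYPES


if __name__ == "__main__":
    for esc in (r"\|", r"\h'1m'", r"\[em]", r"\C'em'"):
        print(esc, allowed_as_delimiter(esc))
```

In this model, admitting `\|` necessarily admits `\h'1m'`, and admitting `\[em]` necessarily admits `\C'em'`, even though `\h` and `\C` take arguments. That is the collapse the comments describe. A wider space of token types would let the two members of each pair be classified separately.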