[groff] 49/92: [doc,man]: Annotate internal design challenge.

G. Branden Robinson Fri, 28 Nov 2025 17:44:24 -0800

gbranden pushed a commit to branch master
in repository groff.

commit 18ed47436cf3be79f706af6046bf9789f26c95a7
Author: G. Branden Robinson <[email protected]>
AuthorDate: Fri Nov 28 16:56:59 2025 -0600


    [doc,man]: Annotate internal design challenge.
---
 doc/groff.texi.in | 15 +++++++++++++++
 man/groff.7.man   | 15 +++++++++++++++
 2 files changed, 30 insertions(+)

diff --git a/doc/groff.texi.in b/doc/groff.texi.in
index e6b0a94e6..fcc3b23bd 100644
--- a/doc/groff.texi.in
+++ b/doc/groff.texi.in
@@ -7806,6 +7806,21 @@ expressions; see below.
 @cindex @code{\u}, as delimiter
 The following escape sequences don't take arguments
 and thus are allowed as delimiters:
+@c That explanation is kind of BS; it's actually because their token
+@c types aren't shared with ones that are parameterized in input.
+@c Allowing `\0`, `\^`, and `\|` also allows the delimited `\h`, which
+@c presumably was not intentional.  Allowing special characters as
+@c delimiters, which very much _was_ deliberate (see their use in tbl
+@c and eqn) also allows the delimited `\C`, which is a wart that makes
+@c the foregoing "thus" a lie.  But expecting the reader to have a
+@c command of GNU troff's token types, which _should_ be a mere
+@c implementation detail, is a tall order.  When we make GNU troff
+@c handle UTF-8/Unicode internally we'll have a practically unlimited
+@c space for token types (because the internal character type will go
+@c from 8 to 32 bits wide), and can distinguish some that have been
+@c collapsed to date.  That would in turn allow us to make
+@c "unformatting" and "asciification" more representative of the
+@c original input.
 @code{\@key{SPC}},
 @code{\%},
 @code{\|},
diff --git a/man/groff.7.man b/man/groff.7.man
index c627540b5..4cfb2e39b 100644
--- a/man/groff.7.man
+++ b/man/groff.7.man
@@ -1863,6 +1863,21 @@ see below.
 .P
 The following escape sequences don't take arguments
 and thus are allowed as delimiters:
+.\" That explanation is kind of BS; it's actually because their token
+.\" types aren't shared with ones that are parameterized in input.
+.\" Allowing `\0`, `\^`, and `\|` also allows the delimited `\h`, which
+.\" presumably was not intentional.  Allowing special characters as
+.\" delimiters, which very much _was_ deliberate (see their use in tbl
+.\" and eqn) also allows the delimited `\C`, which is a wart that makes
+.\" the foregoing "thus" a lie.  But expecting the reader to have a
+.\" command of GNU troff's token types, which _should_ be a mere
+.\" implementation detail, is a tall order.  When we make GNU troff
+.\" handle UTF-8/Unicode internally we'll have a practically unlimited
+.\" space for token types (because the internal character type will go
+.\" from 8 to 32 bits wide), and can distinguish some that have been
+.\" collapsed to date.  That would in turn allow us to make
+.\" "unformatting" and "asciification" more representative of the
+.\" original input.
 .BI \[rs] space\c
 ,
 .BR \[rs]% ,

_______________________________________________
groff-commit mailing list
[email protected]
https://lists.gnu.org/mailman/listinfo/groff-commit
