gbranden pushed a commit to branch master
in repository groff.
commit a9df692a94a2f5019765658af1c7d697ab3ca074
Author: G. Branden Robinson
AuthorDate: Fri Aug 30 16:16:50 2024 -0500
[docs]: Update \X & `char` warning category descs.
* doc/groff.texi.in (Postprocessor Access):
* man/groff.7.man (Escape sequence short reference):
* man/groff_diff.7.man (Escape sequences): Update discussion of `\X`
escape sequence behavior.
* doc/groff.texi.in (Warnings):
* src/roff/troff/troff.1.man (Warnings): Update description of `char`
warning category.
Continues fixing Savannah #63074.
---
 doc/groff.texi.in          | 41 +++++++++++--------
 man/groff.7.man            | 68 ++++++++++++++++---------------
 man/groff_diff.7.man       | 99 ++++++++++++++++++++++++++++++----------------
 src/roff/troff/troff.1.man |  3 +-
 4 files changed, 127 insertions(+), 84 deletions(-)
diff --git a/doc/groff.texi.in b/doc/groff.texi.in
index 5d8f89e3c..6f9d81dfc 100644
--- a/doc/groff.texi.in
+++ b/doc/groff.texi.in
@@ -16594,21 +16594,27 @@ is stripped to allow embedding of leading spaces.
@ifinfo
@cindex @code{\@r{<colon>}}, in device control commands
@end ifinfo
-By contrast, within @code{\X} arguments, the escape sequences @code{\&},
-@code{\)}, @code{\%}, and @code{\:} are ignored; @code{\@key{SPC}} and
-@code{\~} are converted to single space characters; and a self-escaped
-escape character is output as a backslash @code{\}. So that the basic
-Latin subset of the Unicode character set@footnote{that is, @w{ISO
-646:1991 IRV} (US-@acronym{ASCII})} can be reliably encoded in device
-control commands, seven special character escape sequences (@samp{\-},
-@samp{\[aq]}, @samp{\[dq]}, @samp{\[ga]}, @samp{\[ha]}, @samp{\[rs]},
-and @samp{\[ti]}) are mapped to basic Latin characters; see the
-@cite{groff_char@r{(7)}} man page. For this transformation, character
-translations and special character definitions are
-ignored.@footnote{They are bypassed because these parameters are not
-rendered as glyphs in the output; instead, they remain abstract
-characters---in a PDF bookmark or a URL, for example.} The use of any
-other escape sequence in @code{\X} parameters is normally an error.
+By contrast, within @code{\X} arguments, GNU @command{troff} converts
+several ordinary characters that typeset as non-basic Latin code points
+to code points outside that range to avoid confusion when these
+characters are used in ways that are ultimately visible, as in tag names
+for PDF bookmarks, which can appear in a viewer's navigation pane.
+These ordinary characters are @samp{'}, @samp{-}, @samp{^}, @samp{`},
+and @samp{~}; others are written as-is.
+
+Special characters that typeset as Unicode basic Latin characters are
+translated to basic Latin characters accordingly. For this
+transformation, character translations and definitions are ignored.
+
+So that any Unicode code point can be represented in device extension
+commands, for example in an author's name in document metadata or as a
+usefully named bookmark or hyperlink anchor, GNU @command{troff} maps
+other special characters to Unicode special character notation.
+@xref{Using Symbols}.
+
+Special characters without a Unicode representation, and escape
+sequences that do not interpolate a sequence of ordinary and/or special
+characters, produce warnings in category @samp{char}.
@kindex use_charnames_in_special
@cindex @file{DESC} file, and @code{use_charnames_in_special} keyword
@@ -17327,8 +17333,9 @@ circumstances.
@table @samp
@item char
@itemx 1
-No mounted font defines a glyph for the requested character. This
-category is enabled by default.
+No mounted font defines a glyph for the requested character, or input
+could not be encoded for device-independent output. This category is
+enabled by default.
@item number
@itemx 2
diff --git a/man/groff.7.man b/man/groff.7.man
index 20ddb6bac..3a7a59897 100644
--- a/man/groff.7.man
+++ b/man/groff.7.man
@@ -5528,43 +5528,47 @@ Write
.I contents
to
.I @g@troff
-output as a device control command.
+output as a device extension command.
.
-Within
-.IR anything ,
-the escape sequences
-.BR \[rs]& ,
-.BR \[rs]) ,
-.BR \[rs]% ,
-and
-.B \[rs]:
-are ignored;
-.BI \[rs] space
-and
-.B \[rs]\[ti]
-are converted to single space characters;
+GNU
+.I troff \" GNU
+converts several ordinary characters that typeset as non-basic Latin
+code points to code points outside that range
+to avoid confusion when these characters are used in ways that are
+ultimately visible,
+as in tag names for PDF bookmarks,
+which can appear in a viewer's navigation pane.
+.
+These ordinary characters are
+.RB \[lq]\| \[aq] \|\[rq],
+.RB \[lq]\| \- \|\[rq],
+.RB \[lq]\| \[ha] \|\[rq],
+.RB \[lq]\| \[ga] \|\[rq],
and
-.B \[rs]\[rs]
-has its escape character stripped.
+.RB \[lq]\| \[ti] \|\[rq];
+others are written as-is.
.
-So that the basic Latin subset of the Unicode character
-set can be reliably encoded in
-.I anything,
-the special character escape sequences
-.BR \[rs]\- ,
-.BR \[rs][aq] ,
-.BR \[rs][dq] ,
-.BR \[rs][ga] ,
-.BR \[rs][ha] ,
-.BR \[rs][rs] ,
-and
-.B \[rs][ti]
-are mapped to basic Latin characters;
-see
-.MR groff_char @MAN7EXT@ .
+.
+.IP
+Special characters that typeset as Unicode basic Latin characters
+are translated to basic Latin characters accordingly.
.
For this transformation,
-character translations and special character definitions are ignored.
+character translations and definitions are ignored.
+.
+So that any Unicode code point can be represented in device extension
+commands,
+for example in an author's name in document metadata
+or as a usefully named bookmark or hyperlink anchor,
+GNU
+.I troff \" GNU
+maps other special characters to Unicode special character notation.
+.
+Special characters without a Unicode representation,
+and escape sequences that do not interpolate a sequence
+of ordinary and/or special characters,
+produce warnings in category
+.RB \[lq] char \[rq].
.
.TP
.ESC Y n
diff --git a/man/groff_diff.7.man b/man/groff_diff.7.man
index 6c98798b6..6593ed730 100644
--- a/man/groff_diff.7.man
+++ b/man/groff_diff.7.man
@@ -1087,55 +1087,86 @@ as returned by
is interpreted even in copy mode.
.
.
+.\" TODO: When we get this giant headache generalized and adapted to the
+.\" `\!` escape sequence and `device`, `output`, `cf`, and `trf`
+.\" requests, move this discussion into a dedicated subsection above.
.TP
.BI \[rs]X\[aq] contents \[aq]
GNU
.I troff \" GNU
transforms the argument to the device control escape sequence to avoid
leaking to device-independent output data that are unrepresentable in
-that format.
+that format,
+and to address the problem of expressing character code points outside
+of the Unicode basic Latin range in an output file format that restricts
+itself to that range.
+.
+(See subsection \[lq]Basic Latin\[rq] of
+.MR groff_char @MAN7EXT@ .)
+.
+The typesetting of such characters is a problem long-solved in
+device-independent
+.I troff \" generic
+by the
+.RB \[lq] C \[rq]
+command;
+see
+.MR groff_out @MAN5EXT@ .
+.
+The expression of such characters in other contexts,
+such as device extension commands,
+was not addressed by the same design.
+.
+Where possible,
+GNU
+.I troff \" GNU
+represents such characters in device-independent
+but non-typesetting contexts using its notation
+for Unicode special character escape sequences;
+see subsection \[lq]Special character escape forms\[rq] of
+.MR groff_char @MAN7EXT@ .
.
.
.IP
-Within
-.I contents,
-the escape sequences
-.BR \[rs]& ,
-.BR \[rs]) ,
-.BR \[rs]% ,
-and
-.B \[rs]:
-are ignored;
-.BI \[rs] space
+GNU
+.I troff \" GNU
+converts several ordinary characters that typeset as non-basic Latin
+code points to code points outside that range
+to avoid confusion when these characters are used in ways that are
+ultimately visible,
+as in tag names for PDF bookmarks,
+which can appear in a viewer's navigation pane.
+.
+These ordinary characters are
+.RB \[lq]\| \[aq] \|\[rq],
+.RB \[lq]\| \- \|\[rq],
+.RB \[lq]\| \[ha] \|\[rq],
+.RB \[lq]\| \[ga] \|\[rq],
and
-.B \[rs]\[ti]
-are converted to single space characters;
-and a self-escaped escape character is output as a backslash
-.BR \[rs] .
+.RB \[lq]\| \[ti] \|\[rq];
+others are written as-is.
.
-So that the basic Latin subset of the Unicode character set\[em]\c
-that is,
-ISO\~646:1991\~IRV (US-ASCII)\[em]\c
-can be reliably encoded in
-.I contents,
-the special character escape sequences
-.BR \[rs]\- ,
-.BR \[rs][aq] ,
-.BR \[rs][dq] ,
-.BR \[rs][ga] ,
-.BR \[rs][ha] ,
-.BR \[rs][rs] ,
-and
-.B \[rs][ti]
-are mapped to basic Latin characters;
-see
-.MR groff_char @MAN7EXT@ .
+.
+.IP
+Special characters that typeset as Unicode basic Latin characters
+are translated to basic Latin characters accordingly.
.
For this transformation,
character translations and definitions are ignored.
.
-.I @g@troff
-discards other escape sequences with an error diagnostic.
+So that any Unicode code point can be represented in device extension
+commands,
+for example in an author's name in document metadata
+or as a usefully named bookmark or hyperlink anchor,
+GNU
+.I troff \" GNU
+maps other special characters to Unicode special character notation.
+.
+Special characters without a Unicode representation,
+and escape sequences that do not interpolate a sequence
+of ordinary and/or special characters,
+produce warnings in category
+.RB \[lq] char \[rq].
.
.
.br
diff --git a/src/roff/troff/troff.1.man b/src/roff/troff/troff.1.man
index cd9ef16a6..10f4d7929 100644
--- a/src/roff/troff/troff.1.man
+++ b/src/roff/troff/troff.1.man
@@ -587,7 +587,8 @@ This category is enabled by default.
.
.TP
.BR char "\t1"
-No mounted font defines a glyph for the requested character.
+No mounted font defines a glyph for the requested character,
+or input could not be encoded for device-independent output.
.
This category is enabled by default.
.