gbranden pushed a commit to branch master
in repository groff.

commit 40b8ef0a7e4abfeb131ed0ef6153e8eeca705fa8
Author: G. Branden Robinson <[email protected]>
AuthorDate: Sun Aug 4 02:44:32 2024 -0500

    [docs]: Refer to character encodings consistently.
    
    * Say "US-ASCII", not just "ASCII".
    * In Texinfo, mark "ASCII" as an acronym ("US" is not).
    * In Texinfo, favor use of @w over @tie command with phrases naming
      character encodings, for source readability.
    * In Texinfo, mark only the "ECMA" part of "ECMA-6" (for instance) as an
      acronym.
    * Refer to US-ASCII in full as "ISO 646:1991 IRV (US-ASCII)", with break
      suppression in the standard identifier.
    * Suppress breaks between "ISO" or "USAS" and encoding names (such as
      "ISO Latin-1").
    * Stop putting "ASCII" in scare quotes when we're referring to it
      precisely, as with USAS X3.4-1968.
    * ...or refer to it precisely, as with "US-ASCII", which implies
      ISO 646:1991 IRV.
    * Use em-dashes to avoid nested parentheticals.
    * In groff_tmac(5), supply the part numbers of the ISO 8859 standards
      alongside the character set names.
    * Favor referring to ISO character set names over their standard
      identifiers with part number.
---
 doc/groff.texi.in                 | 86 +++++++++++++++++++--------------------
 man/groff.7.man                   | 13 +++---
 man/groff_char.7.man              | 17 ++++----
 man/groff_diff.7.man              | 12 +++---
 man/groff_tmac.5.man              | 13 ++++--
 src/devices/grolj4/grolj4.1.man   |  2 +-
 src/devices/grops/grops.1.man     |  2 +-
 src/preproc/preconv/preconv.1.man | 14 ++++---
 src/roff/groff/groff.1.man        |  4 +-
 src/utils/grog/grog.1.man         |  9 ++--
 tmac/groff_mdoc.7.man             |  4 +-
 11 files changed, 91 insertions(+), 85 deletions(-)

diff --git a/doc/groff.texi.in b/doc/groff.texi.in
index 17fee6019..27c178ce8 100644
--- a/doc/groff.texi.in
+++ b/doc/groff.texi.in
@@ -1256,22 +1256,22 @@ document.
 @cindex ISO@tie{}646 output encoding
 @cindex output encoding, @acronym{ASCII}
 @cindex output encoding, ISO@tie{}646
-For typewriter-like devices using the (7-bit) @acronym{ASCII}
-(ISO@tie{}646) character set.
+For typewriter-like devices using the (7-bit) @w{ISO 646:1991 IRV}
+(US-@acronym{ASCII}) character set.
 
 @item latin1
-@cindex encoding, output, @w{Latin-1} (ISO @w{8859-1})
-@cindex @w{Latin-1} (ISO @w{8859-1}) output encoding
-@cindex ISO @w{8859-1} (@w{Latin-1}) output encoding
-@cindex output encoding, @w{Latin-1} (ISO @w{8859-1})
-For typewriter-like devices that support the @w{Latin-1}
-(ISO@tie{}@w{8859-1}) character set.
+@cindex encoding, output, @w{ISO Latin-1} (@w{8859-1})
+@cindex @w{Latin-1} (@w{ISO 8859-1}) output encoding
+@cindex @w{ISO Latin-1} (@w{8859-1}) output encoding
+@cindex output encoding, @w{ISO Latin-1} (@w{8859-1})
+For typewriter-like devices that support the @w{ISO Latin-1} (8859-1)
+character set.
 
 @item utf8
 @cindex encoding, output, @w{UTF-8}
 @cindex @w{UTF-8} output encoding
 @cindex output encoding, @w{UTF-8}
-For typewriter-like devices that use the Unicode (ISO@tie{}10646)
+For typewriter-like devices that use the @w{ISO 10646} (Unicode)
 character set with @w{UTF-8} encoding.
 
 @item lj4
@@ -5625,20 +5625,20 @@ other hand, must be in one of two encodings it can recognize.
 
 @table @code
 @item latin1
-@cindex encoding, input, @w{Latin-1} (ISO @w{8859-1})
-@cindex @w{Latin-1} (ISO @w{8859-1}), input encoding
-@cindex ISO @w{8859-1} (@w{Latin-1}), input encoding
-@cindex input encoding, @w{Latin-1} (ISO @w{8859-1})
+@cindex encoding, input, @w{ISO Latin-1} (@w{8859-1})
+@cindex @w{Latin-1} (@w{ISO 8859-1}) input encoding
+@cindex @w{ISO Latin-1} (@w{8859-1}) input encoding
+@cindex input encoding, @w{ISO Latin-1} (@w{8859-1})
 @pindex latin1.tmac
 ISO @w{Latin-1} is an encoding for Western European languages.
 @end table
 
 @noindent
-Any document that is encoded in ISO 646:1991 (a descendant of USAS
-@w{X3.4-1968} or ``US-ASCII''), or, equivalently, uses only code points
-from the ``C0 Controls'' and ``Basic Latin'' parts of the Unicode
-character set is also a valid ISO @w{Latin-1} document; the standards
-are interchangeable in their first 128 code points.@footnote{The
+Any document that is encoded in @w{ISO 646:1991 IRV}
+(US-@acronym{ASCII}), or, equivalently, uses only code points from the
+``C0 Controls'' and ``Basic Latin'' parts of the Unicode character set
+is also a valid ISO @w{Latin-1} document; the standards are
+interchangeable in their first 128 code points.@footnote{The
 @emph{semantics} of certain punctuation code points have gotten stricter
 with the successive standards, a cause of some frustration among man
 page writers; see @cite{groff_char@r{(7)}}.}
@@ -5662,10 +5662,10 @@ obtained via special character escape sequences; see
 @cite{groff_char@r{(7)}}.}
 
 @item latin2
-@cindex encoding, input, @w{Latin-2} (ISO @w{8859-2})
-@cindex @w{Latin-2} (ISO @w{8859-2}), input encoding
-@cindex ISO @w{8859-2} (@w{Latin-2}), input encoding
-@cindex input encoding, @w{Latin-2} (ISO @w{8859-2})
+@cindex encoding, input, @w{ISO Latin-2} (@w{8859-2})
+@cindex @w{Latin-2} (@w{ISO 8859-2}) input encoding
+@cindex @w{ISO Latin-2} (@w{8859-2}) input encoding
+@cindex input encoding, @w{ISO Latin-2} (@w{8859-2})
 @pindex latin2.tmac
 To use ISO @w{Latin-2}, an encoding for Central and Eastern European
 languages, invoke @w{@samp{.mso latin2.tmac}} at the beginning of your
@@ -5673,20 +5673,20 @@ document or supply @samp{-m latin2} as a command-line argument to
 @code{groff}.
 
 @item latin5
-@cindex encoding, input, @w{Latin-5} (ISO @w{8859-9})
-@cindex @w{Latin-5} (ISO @w{8859-9}), input encoding
-@cindex ISO @w{8859-9} (@w{Latin-5}), input encoding
-@cindex input encoding, @w{Latin-5} (ISO @w{8859-9})
+@cindex encoding, input, @w{ISO Latin-5} (@w{8859-9})
+@cindex @w{Latin-5} (@w{ISO 8859-9}) input encoding
+@cindex @w{ISO Latin-5} (@w{8859-9}) input encoding
+@cindex input encoding, @w{ISO Latin-5} (@w{8859-9})
 @pindex latin5.tmac
 To use ISO @w{Latin-5}, an encoding for the Turkish language, invoke
 @w{@samp{.mso latin5.tmac}} at the beginning of your document or
 supply @samp{-m latin5} as a command-line argument to @code{groff}.
 
 @item latin9
-@cindex encoding, input, @w{Latin-9} (ISO @w{8859-15})
-@cindex @w{Latin-9} (ISO @w{8859-15}), input encoding
-@cindex ISO @w{8859-15} (@w{Latin-9}), input encoding
-@cindex input encoding, @w{Latin-9} (ISO @w{8859-15})
+@cindex encoding, input, @w{ISO Latin-9} (@w{8859-15})
+@cindex @w{Latin-9} (@w{ISO 8859-15}) input encoding
+@cindex @w{ISO Latin-9} (@w{8859-15}) input encoding
+@cindex input encoding, @w{ISO Latin-9} (@w{8859-15})
 @pindex latin9.tmac
 ISO @w{Latin-9} succeeds @w{Latin-1}; it includes a Euro sign and better
 coverage for French.  To use this encoding, invoke @w{@samp{.mso
@@ -8771,7 +8771,7 @@ The automatic placement of hyphens in words is determined by
 @dfn{pattern files}, which are derived from @TeX{} and available for
 several languages.  These files are named @file{hyphen.@var{xx}} (for
 the patterns) and @file{hyphenex.@var{xx}} (for a list of exceptions in
-languages that require them) where @var{xx} is an ISO@tie{}639 language
+languages that require them) where @var{xx} is an @w{ISO 639} language
 code; see the table below.
 @c XXX: "den" and "det" aren't ISO 639 codes.  Is there a POSIX locale
 @c modifier for these variations, like de@alt and de@recht or similar?
@@ -10444,7 +10444,7 @@ document writes its first glyph.
 @cindex cell, character, attributes
 Terminals cannot change font families and lack special fonts.  They
 support style changes by overstriking, or by altering
-ISO@tie{}6429/ECMA-48 @dfn{graphic renditions} (character cell
+@w{ISO 6429}/ECMA-48 @dfn{graphic renditions} (character cell
 attributes).
 @c END Keep (roughly) parallel with section "Using fonts" of groff(7).
 
@@ -12416,7 +12416,7 @@ necessarily the same.  For the @code{dvi}, @code{html}, @code{pdf},
 @code{ps}, and @code{xhtml} output devices, GNU @code{troff}
 automatically loads a macro file defining many color names at startup.
 By the same mechanism, the devices supported by @code{grotty} recognize
-the eight standard ISO@tie{}6429/EMCA-48 color names.@footnote{These
+the eight standard @w{ISO 6429}/ECMA-48 color names.@footnote{These
 are known vulgarly as ``ANSI'' colors, after its X3.64 standard, now
 withdrawn.}
 
@@ -16608,14 +16608,14 @@ By contrast, within @code{\X} arguments, the escape sequences @code{\&},
 @code{\)}, @code{\%}, and @code{\:} are ignored; @code{\@key{SPC}} and
 @code{\~} are converted to single space characters; and a self-escaped
 escape character is output as a backslash @code{\}.  So that the basic
-Latin subset of the Unicode character set@footnote{that is,
-ISO@tie{}646:1991-IRV or, popularly, ``US-ASCII''} can be reliably
-encoded in device control commands, seven special character escape
-sequences (@samp{\-}, @samp{\[aq]}, @samp{\[dq]}, @samp{\[ga]},
-@samp{\[ha]}, @samp{\[rs]}, and @samp{\[ti]}) are mapped to basic Latin
-characters; see the @cite{groff_char@r{(7)}} man page.  For this
-transformation, character translations and special character definitions
-are ignored.@footnote{They are bypassed because these parameters are not
+Latin subset of the Unicode character set@footnote{that is, @w{ISO
+646:1991 IRV} (US-@acronym{ASCII})} can be reliably encoded in device
+control commands, seven special character escape sequences (@samp{\-},
+@samp{\[aq]}, @samp{\[dq]}, @samp{\[ga]}, @samp{\[ha]}, @samp{\[rs]},
+and @samp{\[ti]}) are mapped to basic Latin characters; see the
+@cite{groff_char@r{(7)}} man page.  For this transformation, character
+translations and special character definitions are
+ignored.@footnote{They are bypassed because these parameters are not
 rendered as glyphs in the output; instead, they remain abstract
 characters---in a PDF bookmark or a URL, for example.}  The use of any
 other escape sequence in @code{\X} parameters is normally an error.
@@ -17910,8 +17910,8 @@ The other correct way, appropriate in contexts independent of the
 backslash's common use as a @code{troff} escape character---perhaps in
 discussion of character sets or other programming languages---is
 the character escape @code{\(rs} or @code{\[rs]}, for ``reverse
-solidus'', from its name in the @acronym{ECMA-6} (@acronym{ISO/IEC} 646)
-standard.@footnote{The @code{rs} special character identifier was not
+solidus'', from its name in the @acronym{ECMA}-6 and @w{ISO 10646}
+standards.@footnote{The @code{rs} special character identifier was not
 defined in @acronym{AT&T} @code{troff}'s font description files, but is
 in those of its lineal descendant, Heirloom Doctools @code{troff}, as of
 the latter's 060716 release (July 2006).}
diff --git a/man/groff.7.man b/man/groff.7.man
index e7111df7a..2dd49a4fc 100644
--- a/man/groff.7.man
+++ b/man/groff.7.man
@@ -339,9 +339,9 @@ is organized into lines separated by the Unix newline character
 and must be in the character encoding it recognizes:
 ISO\~Latin-1 (8859-1).
 .
-Use of ISO\~646-1991:IRV (\[lq]US-ASCII\[rq]) or (equivalently) the
-\[lq]Basic Latin\[rq]
-subset of ISO\~10646 (\[lq]Unicode\[rq]) is recommended;
+We recommend use of ISO\~646:1991\~IRV (US-ASCII)
+or (equivalently) the Basic Latin subset
+of ISO\~10646 (Unicode);
 see
 .MR groff_char @MAN7EXT@ .
 .
@@ -8033,11 +8033,8 @@ are converted to single space characters;
 and a self-escaped escape character is output as a backslash
 .BR \[rs] .
 .
-So that the basic Latin subset of the Unicode character set
-(that is,
-ISO\~646:1991-IRV or,
-popularly,
-\[lq]US-ASCII\[rq])
+So that the basic Latin subset of the Unicode character set,
+ISO\~646:1991\~IRV (US-ASCII),
 can be reliably encoded in device control commands,
 seven special character escape sequences
 .RB (\[lq] \[rs]\- \[rq],
diff --git a/man/groff_char.7.man b/man/groff_char.7.man
index f516501ad..633172711 100644
--- a/man/groff_char.7.man
+++ b/man/groff_char.7.man
@@ -75,13 +75,14 @@ an output device renders
 .I glyphs.
 .
 .IR groff 's
-input character set is restricted to that defined by the ISO Latin-1
-(ISO 8859-1)
+input character set is restricted to that defined by the ISO\~Latin-1
+(ISO\~8859-1)
 standard.
 .
 For ease of document maintenance in UTF-8 environments,
-it is advisable to use only the Unicode basic Latin code points,
-a subset of all of the foregoing historically referred to as \%US-ASCII,
+it is advisable to use only the Unicode basic Latin code points;
+these correspond to ISO\~646:1991\~IRV (US-ASCII),
+a subset of all of the foregoing
 which has only 94 visible,
 printable code points.
 .\" In groff, 0x20 SP is mapped to a space node, not a glyph node, and
@@ -296,7 +297,7 @@ the U.S.\& government.
 .
 Further,
 the prevailing character encoding standard in the 1970s,
-USAS X3.4-1968 (\[lq]ASCII\[rq]),
+USAS\~X3.4-1968 (ASCII),
 deliberately supported semantic ambiguity at some code points,
 and outright substitution at several others,
 to suit the localization demands of various national standards bodies.
@@ -386,7 +387,7 @@ _
 .P
 The hyphen-minus is a particularly unfortunate case of overloading.
 .
-Its awkward name in ISO 8859 and later standards reflects the many
+Its awkward name in ISO\~8859 and later standards reflects the many
 distinguishable purposes to which it had already been put by the 1980s,
 including
 a hyphen,
@@ -509,7 +510,7 @@ falling back to basic Latin glyphs only when necessary.
 ISO 646 is a seven-bit code encoding 128 code points;
 eight-bit codes are twice the size.
 .
-ISO Latin-1 (8859-1) allocated the additional space to what
+ISO\~Latin-1 (8859-1) allocated the additional space to what
 Unicode calls \[lq]C1 controls\[rq]
 (control characters)
 and the \[lq]Latin-1 supplement\[rq].
@@ -822,7 +823,7 @@ as ways to input \[lq]caf\['e]\[rq].
 (Due to its legacy 8-bit encoding compatibility,
 at present it also accepts
 .RB \[lq]caf \[rs][u00E9] \[rq]
-on ISO Latin-1 systems.)
+on ISO\~Latin-1 systems.)
 .
 .
 .TP
diff --git a/man/groff_diff.7.man b/man/groff_diff.7.man
index 4c08afcb3..78250d8ac 100644
--- a/man/groff_diff.7.man
+++ b/man/groff_diff.7.man
@@ -1086,11 +1086,9 @@ are converted to single space characters;
 and a self-escaped escape character is output as a backslash
 .BR \[rs] .
 .
-So that the basic Latin subset of the Unicode character set
-(that is,
-ISO\~646:1991-IRV or,
-popularly,
-\[lq]US-ASCII\[rq])
+So that the basic Latin subset of the Unicode character set\[em]\c
+that is,
+ISO\~646:1991\~IRV (US-ASCII)\[em]\c
 can be reliably encoded in
 .I contents,
 the special character escape sequences
@@ -1606,7 +1604,7 @@ and
 .I Unformat
 the diversion
 .I div
-in a way such that Unicode basic Latin (ASCII) characters,
+in a way such that Unicode basic Latin (US-ASCII) characters,
 characters translated with the
 .B trin
 request,
@@ -4801,7 +4799,7 @@ works fine.
 .
 Use visible characters as delimiters in GNU
 .IR troff , \" GNU
-not \[lq]ASCII\[rq] controls like BEL (Control+G).
+not US-ASCII controls like BEL (Control+G).
 .
 The implementation of
 .B \[rs]$@
diff --git a/man/groff_tmac.5.man b/man/groff_tmac.5.man
index 50de61e41..60297686f 100644
--- a/man/groff_tmac.5.man
+++ b/man/groff_tmac.5.man
@@ -124,8 +124,9 @@ It can even be empty.
 .
 .
 .P
-Encode macro files in ISO 646 (\[lq]ASCII\[rq])
-or ISO Latin-1 (8859-1).
+Encode macro files in
+ISO\~646:1991\~IRV (US-ASCII)
+or ISO\~Latin-1 (8859-1).
 .
 To prepare for a future
 .I groff
@@ -461,11 +462,17 @@ corresponding macro file.
 .I latin5
 .TQ
 .I latin9
-support the ISO\~8859 Latin-1,
+support the ISO\~Latin-1,
 Latin-2,
 Latin-5,
 and
 Latin-9 encodings
+(8859-1,
+8859-2,
+8859-9,
+and
+8859-15,
+respectively).
 .
 .
 .TP
diff --git a/src/devices/grolj4/grolj4.1.man b/src/devices/grolj4/grolj4.1.man
index a4762ae8e..617e0e4c2 100644
--- a/src/devices/grolj4/grolj4.1.man
+++ b/src/devices/grolj4/grolj4.1.man
@@ -702,7 +702,7 @@ most developers had migrated to other means of obtaining font metrics,
 and support for new TFM files was very limited.
 .
 The TFM files provided for the TrueType fonts in the LaserJet\~4000
-support only the Latin 2 (ISO 8859-2) symbol set,
+support only the ISO\~Latin-2 (8859-2) symbol set,
 and include no kerning information;
 consequently,
 they are of little value for any but the most rudimentary documents.
diff --git a/src/devices/grops/grops.1.man b/src/devices/grops/grops.1.man
index 60be78f16..730d46700 100644
--- a/src/devices/grops/grops.1.man
+++ b/src/devices/grops/grops.1.man
@@ -1786,7 +1786,7 @@ It is automatically loaded by
 .TP
 .I @MACRODIR@/psold.tmac
 provides replacement glyphs for text fonts that lack complete coverage
-of the ISO Latin-1 character set;
+of the ISO\~Latin-1 character set;
 using it,
 .I groff
 can produce glyphs like eth (\[Sd]) and thorn (\[Tp]) that older
diff --git a/src/preproc/preconv/preconv.1.man b/src/preproc/preconv/preconv.1.man
index 4512aa5d9..4d5421fc3 100644
--- a/src/preproc/preconv/preconv.1.man
+++ b/src/preproc/preconv/preconv.1.man
@@ -94,9 +94,10 @@ and sends the result to the standard output stream.
 .
 Currently,
 this means that code points in the range 0\[en]127
-(in US-ASCII,
+in
+ISO 646:1991 IRV (US-ASCII),
 ISO\~8859,
-or Unicode)
+or Unicode
 remain as-is and the remainder are converted to the
 .I groff
 special character form
@@ -195,8 +196,8 @@ unless the locale is
 \[lq]C\[rq],
 \[lq]POSIX\[rq],
 or empty,
-in which case assume Latin-1
-(ISO\~8859-1).
+in which case assume ISO\~Latin-1
+(8859-1).
 .
 .
 .PP
@@ -409,7 +410,10 @@ While
 .I \%preconv
 recognizes all of the coding tags listed above,
 it is capable on its own of interpreting only two encodings:
-Latin-1 and and UTF-8.
+ISO\~Latin-1 and and UTF-8.
+.
+ISO\~646:1991\~IRV (US-ASCII)
+is a proper subset of both these encodings.
 .
 If
 .I iconv
diff --git a/src/roff/groff/groff.1.man b/src/roff/groff/groff.1.man
index b855a4a69..aa66cec0e 100644
--- a/src/roff/groff/groff.1.man
+++ b/src/roff/groff/groff.1.man
@@ -1400,8 +1400,8 @@ respectively.
 .
 .TP
 .B latin1
-for terminals using the ISO Latin-1
-(ISO 8859-1)
+for terminals using the ISO\~Latin-1
+(8859-1)
 character set and encoding.
 .
 .
diff --git a/src/utils/grog/grog.1.man b/src/utils/grog/grog.1.man
index af2cb4240..eac390768 100644
--- a/src/utils/grog/grog.1.man
+++ b/src/utils/grog/grog.1.man
@@ -192,14 +192,13 @@ compatibility mode and
 to select a non-default output device.
 .
 If the input is not encoded in
-US-ASCII
+ISO\~646:1991\~IRV (US-ASCII)
 or
-ISO 8859-1,
-specification of a
+ISO\~Latin-1 (8859-1),
+we advise specifying a
 .I groff
 option to run the
-.MR preconv @MAN1EXT@
-preprocessor is advised;
+.MR preconv @MAN1EXT@;
 see the
 .BR \-D ,
 .BR \-k ,
diff --git a/tmac/groff_mdoc.7.man b/tmac/groff_mdoc.7.man
index 4a65b1b3d..c0849b9e3 100644
--- a/tmac/groff_mdoc.7.man
+++ b/tmac/groff_mdoc.7.man
@@ -4883,8 +4883,8 @@ String    7-bit   8-bit   UCS     Prefer  Meaning
 .
 .Pp
 Some column headings are shorthand for standardized character encodings;
-\[lq]7-bit\[rq] for ISO 646:1991 IRV (US-ASCII),
-\[lq]8-bit\[rq] for ISO 8859-1 (Latin-1),
+\[lq]7-bit\[rq] for ISO\~646:1991\~IRV (US-ASCII),
+\[lq]8-bit\[rq] for ISO\~Latin-1 (8859-1),
 and
 \[lq]UCS\[rq] for ISO 10646 (Unicode character set).
 .
