[groff] 12/19: [troff]: \X maps some basic Latin chars to specs.

G. Branden Robinson Fri, 30 Aug 2024 21:38:57 -0700

gbranden pushed a commit to branch master
in repository groff.

commit dcae60b0fb1ad3fa3314fdfdbecb973961a40410
Author: G. Branden Robinson <[email protected]>
AuthorDate: Fri Aug 30 16:10:34 2024 -0500


    [troff]: \X maps some basic Latin chars to specs.
    
    * src/roff/troff/input.cpp (encode_character_for_device_output): In
      device-independent output, represent ordinary characters that normally
      map to non-basic Latin code points ('-^`~) in a way that is compatible
      with how they're typeset, to avoid confusion when these characters are
      used in ways that are ultimately visible, as in tag names for PDF
      bookmarks, which can appear in a viewer's navigation pane.
    
    * src/roff/groff/tests/device-control-special-character-handling.sh:
      Test conversions of basic Latin input characters to Unicode code
      points.
    
    Continues fixing Savannah #63074.
---
 ChangeLog                                          | 16 ++++++++
 .../device-control-special-character-handling.sh   | 43 ++++++++++++++++------
 src/roff/troff/input.cpp                           | 17 ++++++++-
 3 files changed, 63 insertions(+), 13 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 8f8f8dd12..dffa072fb 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,19 @@
+2024-08-30  G. Branden Robinson <[email protected]>
+
+       * src/roff/troff/input.cpp (encode_character_for_device_output):
+       In device-independent output, represent ordinary characters that
+       normally map to non-basic Latin code points ('-^`~) in a way
+       that is compatible with how they're typeset, to avoid confusion
+       when these characters are used in ways that are ultimately
+       visible, as in tag names for PDF bookmarks, which can appear in
+       a viewer's navigation pane.
+
+       * src/roff/groff/tests/\
+       device-control-special-character-handling.sh: Test conversions
+       of basic Latin input characters to Unicode code points.
+
+       Continues fixing Savannah #63074.
+
 2024-08-30  G. Branden Robinson <[email protected]>
 
        * src/roff/groff/tests/\
diff --git a/src/roff/groff/tests/device-control-special-character-handling.sh b/src/roff/groff/tests/device-control-special-character-handling.sh
index 1a755846a..6e133d0b7 100755
--- a/src/roff/groff/tests/device-control-special-character-handling.sh
+++ b/src/roff/groff/tests/device-control-special-character-handling.sh
@@ -29,11 +29,11 @@ wail () {
 
 input='.
 .nf
-\X#bogus1: esc \%to-do\[u1F63C]\\[u1F00]\-\[`a]#
-.device bogus1: req \%to-do\[u1F63C]\\[u1F00]\-\[`a]
+\X#bogus1: esc \%\[u1F63C]\\[u1F00]\-\[`a]#
+.device bogus1: req \%\[u1F63C]\\[u1F00]\-\[`a]
 .ec @
-@X#bogus2: esc @%to-do@[u1F63C]@@[u1F00]@-@[`a]#
-.device bogus2: req @%to-do@[u1F63C]@@[u1F00]@-@[`a]
+@X#bogus2: esc @%@[u1F63C]@@[u1F00]@-@[`a]#
+.device bogus2: req @%@[u1F63C]@@[u1F00]@-@[`a]
 .'
 
 output=$(printf '%s\n' "$input" | "$groff" -T ps -Z 2> /dev/null \
@@ -45,27 +45,27 @@ echo "$error"
 
 # Expected:
 #
-# x X bogus1: esc to-do\[u1F63C]\[u1F00]-\[u00E0]
-# x X bogus1: req @%to-do\[u1F63C]\[u1F00]@-\[`a]
-# x X bogus2: esc to-do\[u1F63C]\[u1F00]-\[u00E0]
-# x X bogus2: req @%to-do@[u1F63C]@[u1F00]@-@[`a]
+# x X bogus1: esc \[u1F63C]\[u1F00]-\[u00E0]
+# x X bogus1: req @%\[u1F63C]\[u1F00]@-\[`a]
+# x X bogus2: esc \[u1F63C]\[u1F00]-\[u00E0]
+# x X bogus2: req @%@[u1F63C]@[u1F00]@-@[`a]
 
 echo "checking X escape sequence, default escape character" >&2
 echo "$output" \
-  | grep -Fqx 'x X bogus1: esc to-do\[u1F63C]\[u1F00]-\[u00E0]' || wail
+  | grep -Fqx 'x X bogus1: esc \[u1F63C]\[u1F00]-\[u00E0]' || wail
 
 #echo "checking device request, default escape character" >&2
 #echo "$output" \
-#  | grep -qx 'x X bogus1: req to-do\\\[u1F00\] -'"'"'"`^\\~' \
+#  | grep -qx 'x X bogus1: req \\\[u1F00\] -'"'"'"`^\\~' \
 #  || wail
 
 echo "checking X escape sequence, alternate escape character" >&2
 echo "$output" \
-  | grep -Fqx 'x X bogus2: esc to-do\[u1F63C]\[u1F00]-\[u00E0]' || wail
+  | grep -Fqx 'x X bogus2: esc \[u1F63C]\[u1F00]-\[u00E0]' || wail
 
 #echo "checking device request, alternate escape character" >&2
 #echo "$output" \
-#  | grep -qx 'x X bogus2: req to-do\\\[u1F00\] -'"'"'"`^\\~' \
+#  | grep -qx 'x X bogus2: req \\\[u1F00\] -'"'"'"`^\\~' \
 #  || wail
 
 input='.
@@ -97,6 +97,25 @@ echo "$output" | grep -Fqx 'x X bogus4: [\]^' || wail
 echo "checking X escape sequence, conversions to basic Latin (3/3)" >&2
 echo "$output" | grep -Fqx 'x X bogus5: {||}~' || wail
 
+input='.
+.nf
+\X#bogus6: '"'"'-^`~#
+.\"device bogus6: '"'"'-^`~
+.'
+
+# Expected:
+#
+# x X bogus6: \[u2019]\[u2010]\[u0302]\[u0300]\[u0303]
+
+output=$(printf '%s\n' "$input" | "$groff" -T ps -Z 2> /dev/null \
+  | grep '^x X')
+echo "$output"
+
+echo "checking X escape sequence, conversions from basic Latin..." >&2
+echo "$output" \
+  | grep -Fqx 'x X bogus6: \[u2019]\[u2010]\[u0302]\[u0300]\[u0303]' \
+  || wail
+
 test -z "$fail"
 
 # vim:set autoindent expandtab shiftwidth=2 tabstop=2 textwidth=72:
diff --git a/src/roff/troff/input.cpp b/src/roff/troff/input.cpp
index 75a157a33..4501f8a65 100644
--- a/src/roff/troff/input.cpp
+++ b/src/roff/troff/input.cpp
@@ -5806,7 +5806,22 @@ static void encode_character_for_device_output(macro *mac, const char c)
              " output", tok.description());
   }
   else {
-    if (c == escape_char)
+    // We want to represent ordinary characters that normally map to
+    // non-basic Latin code points in a way that is compatible with how
+    // they're typeset, to avoid confusion when these characters are
+    // used in ways that are ultimately visible, as in tag names for PDF
+    // bookmarks, which can appear in a viewer's navigation pane.
+    if ('\'' == c)
+      mac->append_str("\\[u2019]");
+    else if ('-' == c)
+      mac->append_str("\\[u2010]");
+    else if ('^' == c)
+      mac->append_str("\\[u0302]");
+    else if ('`' == c)
+      mac->append_str("\\[u0300]");
+    else if ('~' == c)
+      mac->append_str("\\[u0303]");
+    else if (c == escape_char)
       mac->append('\\');
     else
       mac->append(c);
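
For readers who want to try the mapping outside of troff, the else-branch in the hunk above can be sketched as a stand-alone function. This is an illustration only, not groff code: `append_encoded` and `encode_for_device` are hypothetical names, a `std::string` stands in for groff's `macro` object, and the escape character is passed as a parameter rather than read from troff's global state. It covers only ordinary characters; escape sequences such as `\-` and `\[u1F00]` are handled by other branches that are not reproduced here.

```cpp
#include <string>

// Hypothetical stand-alone analogue of the patched else-branch.  The real
// function appends to a groff `macro` object; here we append to a string.
static void append_encoded(std::string &out, char c,
                           char escape_char = '\\')
{
  switch (c) {
  case '\'': out += "\\[u2019]"; break; // right single quotation mark
  case '-':  out += "\\[u2010]"; break; // hyphen
  case '^':  out += "\\[u0302]"; break; // combining circumflex accent
  case '`':  out += "\\[u0300]"; break; // combining grave accent
  case '~':  out += "\\[u0303]"; break; // combining tilde
  default:
    // As in the patch, the current escape character is written as a
    // backslash; any other character passes through unchanged.
    out += (c == escape_char) ? '\\' : c;
  }
}

// Encode a whole string of ordinary characters, one at a time.
static std::string encode_for_device(const std::string &in,
                                     char escape_char = '\\')
{
  std::string out;
  for (char c : in)
    append_encoded(out, c, escape_char);
  return out;
}
```

Under these assumptions, passing the text of the new `bogus6` test case through `encode_for_device` gives the same special character sequence that the test script greps for, and passing an alternate escape character such as `@` shows it being rewritten to a backslash.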

_______________________________________________
Groff-commit mailing list
[email protected]
https://lists.gnu.org/mailman/listinfo/groff-commit
