gbranden pushed a commit to branch master
in repository groff.
commit 39c1176bfa9ba84d519b015b1453f316e013d1de
Author: G. Branden Robinson <[email protected]>
AuthorDate: Mon Jan 27 18:52:39 2025 -0600
[troff]: Fix Savannah #66686 (`\w|foo|` blues).
* src/roff/troff/input.cpp (is_char_usable_as_delimiter): Restore `|`
character as an invalid delimiter when not in compatibility mode.
This would regress the fix for Savannah #66481, but...
(do_overstrike, do_bracket, do_name_test, do_zero_width_output)
(read_size, do_register, do_width, do_device_extension)
(read_drawing_command): Throw warning in "delimiter" category and
explain ambiguity of delimiter instead of emitting error and refusing
further interpretation of the escape sequence being parsed. Leave
behind "#if"ed code for restoration of former stricter behavior in a
future groff release (which would fix Savannah #66009).
(is_conditional_expression_true): Stop special-casing an exception for
`|` that permitted it to be used as a formatted output comparison
operator. Savannah #66481 complained only about groff's rejection of
`|` to delimit the argument to the `\w` (width measurement) escape
sequence, not in general, and was seen in some man pages. The usage
Paul Eggert reported remains accepted, albeit warned about, per
`do_width()` above.
* src/roff/groff/tests/check-delimiter-validity.sh: Update test
expectations. We now expect `|` to be invalid once again to delimit a
line-drawing escape sequence.
Fixes <https://savannah.gnu.org/bugs/?66686>. Thanks to Dave Kemper for
the report. Savannah #66526 is implicated.
The uses of `\w|whatever|` cited by Eggert appear to be a chunk of
boilerplate passed around by some man page authors for GNU man pages.
Eggert's "tz" package doesn't itself use `|` as a delimiter, but I did
verify their use in gawk, grep, and rcs.
gawk:
.if !\n(.g \{\
. if !\w|\*(lq| \{\
. ds lq ``
. if \w'\(lq' .ds lq "\(lq
. \}
. if !\w|\*(rq| \{\
. ds rq ''
. if \w'\(rq' .ds rq "\(rq
. \}
.\}
grep:
.if !\w|\*(lq| \{\
.\" groff an-old.tmac does not seem to be in use, so define lq and rq.
. ie \n(.g \{\
. ds lq \(lq\"
. ds rq \(rq\"
. \}
. el \{\
. ds lq ``
. ds rq ''
. \}
.\}
rcs:
.if !\n(.g \{\
. if !\w|\*(lq| \{\
. ds lq ``
. if \w'\(lq' .ds lq "\(lq
. \}
. if !\w|\*(rq| \{\
. ds rq ''
. if \w'\(rq' .ds rq "\(rq
. \}
.\}
One observes that only grep(1) doesn't feature-gate its use of `\w|foo|`
behind a test of the `.g` register (groff-compatible formatter) for
falsity. It is therefore the only one of these three that would have
provoked a warning. (What the foregoing do is reimplement the `lq` and
`rq` quotation strings, an extension from 4BSD (1980)--my guess is to
compensate for System V-based troffs missing them).
The rcs man pages otherwise use `\w` pretty extensively (which can be
its own portability worry), but in every other case with the unambiguous
`'` delimiter.
---
ChangeLog | 31 ++++++++++
src/roff/groff/tests/check-delimiter-validity.sh | 4 +-
src/roff/troff/input.cpp | 77 +++++++++++++++++++++---
3 files changed, 100 insertions(+), 12 deletions(-)
diff --git a/ChangeLog b/ChangeLog
index f8e3a1c49..f76bcfcc9 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,34 @@
+2025-01-28 G. Branden Robinson <[email protected]>
+
+ * src/roff/troff/input.cpp (is_char_usable_as_delimiter):
+ Restore `|` character as an invalid delimiter when not in
+ compatibility mode. This would regress the fix for Savannah
+ #66481, but...
+
+ (do_overstrike, do_bracket, do_name_test, do_zero_width_output)
+ (read_size, do_register, do_width, do_device_extension)
+ (read_drawing_command): Throw warning in "delimiter" category
+ and explain ambiguity of delimiter instead of emitting error and
+ refusing further interpretation of the escape sequence being
+ parsed. Leave behind "#if"ed code for restoration of former
+ stricter behavior in a future groff release (which would fix
+ Savannah #66009).
+
+ (is_conditional_expression_true): Stop special-casing an
+ exception for `|` that permitted it to be used as a formatted
+ output comparison operator. Savannah #66481 complained only
+ about groff's rejection of `|` to delimit the argument to the
+ `\w` (width measurement) escape sequence, not in general, and
+ was seen in some man pages. The usage Paul Eggert reported
+ remains accepted, albeit warned about, per `do_width()` above.
+
+ * src/roff/groff/tests/check-delimiter-validity.sh: Update test
+ expectations. We now expect `|` to be invalid once again to
+ delimit a line-drawing escape sequence.
+
+ Fixes <https://savannah.gnu.org/bugs/?66686>. Thanks to Dave
+ Kemper for the report. Savannah #66526 is implicated.
+
2025-01-27 G. Branden Robinson <[email protected]>
* src/roff/troff/env.cpp (environment::choose_breakpoint):
diff --git a/src/roff/groff/tests/check-delimiter-validity.sh
b/src/roff/groff/tests/check-delimiter-validity.sh
index 6ecddddbd..c2e107182 100755
--- a/src/roff/groff/tests/check-delimiter-validity.sh
+++ b/src/roff/groff/tests/check-delimiter-validity.sh
@@ -28,7 +28,7 @@ wail () {
}
for c in A B C D E F G H I J K L M N O P Q R S T U V W X Y Z \
- a b c d e f g h i j k l m n o p q r s t u v w x y z '|'
+ a b c d e f g h i j k l m n o p q r s t u v w x y z
do
echo "checking validity of '$c' as delimiter in normal mode" \
>&2
@@ -38,7 +38,7 @@ do
echo "$output" | grep -Fqx ___ || wail
done
-for c in 0 1 2 3 4 5 6 7 8 9 + - / '*' % '<' '>' = '&' : '(' ')' .
+for c in 0 1 2 3 4 5 6 7 8 9 + - / '*' % '<' '>' = '&' : '(' ')' . '|'
do
echo "checking invalidity of '$c' as delimiter in normal mode" \
>&2
diff --git a/src/roff/troff/input.cpp b/src/roff/troff/input.cpp
index ab50f1f54..9f82c2a84 100644
--- a/src/roff/troff/input.cpp
+++ b/src/roff/troff/input.cpp
@@ -1600,10 +1600,17 @@ node *do_overstrike() // \o
int start_level = input_stack::get_level();
token start_token;
start_token.next();
+ if (!start_token.is_usable_as_delimiter())
+ warning(WARN_DELIM, "interpreting %1 as an escape sequence"
+ " delimiter; it is ambiguous because it is also meaningful"
+ " in a numeric expression", start_token.description());
+ // TODO: groff 1.25?
+#if 0
if (!start_token.is_usable_as_delimiter(true /* report error */)) {
delete osnode;
return 0 /* nullptr */;
}
+#endif
for (;;) {
tok.next();
if (tok.is_newline() || tok.is_eof()) {
@@ -1645,10 +1652,17 @@ static node *do_bracket() // \b
int start_level = input_stack::get_level();
token start_token;
start_token.next();
+ if (!start_token.is_usable_as_delimiter())
+ warning(WARN_DELIM, "interpreting %1 as an escape sequence"
+ " delimiter; it is ambiguous because it is also meaningful"
+ " in a numeric expression", start_token.description());
+ // TODO: groff 1.25?
+#if 0
if (!start_token.is_usable_as_delimiter(true /* report error */)) {
delete bracketnode;
return 0 /* nullptr */;
}
+#endif
for (;;) {
tok.next();
if (tok.is_newline() || tok.is_eof()) {
@@ -1680,8 +1694,15 @@ static const char *do_name_test() // \A
int start_level = input_stack::get_level();
token start_token;
start_token.next();
+ if (!start_token.is_usable_as_delimiter())
+ warning(WARN_DELIM, "interpreting %1 as an escape sequence"
+ " delimiter; it is ambiguous because it is also meaningful"
+ " in a numeric expression", start_token.description());
+ // TODO: groff 1.25?
+#if 0
if (!start_token.is_usable_as_delimiter(true /* report error */))
return 0 /* nullptr */;
+#endif
bool got_bad_char = false;
bool got_some_char = false;
for (;;) {
@@ -1790,10 +1811,17 @@ static node *do_zero_width_output() // \Z
int start_level = input_stack::get_level();
token start_token;
start_token.next();
+ if (!start_token.is_usable_as_delimiter())
+ warning(WARN_DELIM, "interpreting %1 as an escape sequence"
+ " delimiter; it is ambiguous because it is also meaningful"
+ " in a numeric expression", start_token.description());
+ // TODO: groff 1.25?
+#if 0
if (!start_token.is_usable_as_delimiter(true /* report error */)) {
delete rev;
return 0 /* nullptr */;
}
+#endif
for (;;) {
tok.next();
if (tok.is_newline() || tok.is_eof()) {
@@ -2614,12 +2642,8 @@ static bool is_char_usable_as_delimiter(int c)
case '(':
case ')':
case '.':
+ case '|':
return false;
- // TODO: In groff 1.25, style-warn on '|' and [A-Za-z].
- // TODO: In groff 1.26, promote style warning to error with
- // deprecation message.
- // TODO: In groff 1.27, make '|' and letters return false.
- // See Savannah #66481.
default:
return true;
}
@@ -5561,8 +5585,15 @@ static bool read_size(int *x)
}
val *= sizescale;
}
- else if (!tok.is_usable_as_delimiter(true /* report error */))
+ else if (!tok.is_usable_as_delimiter())
+ warning(WARN_DELIM, "interpreting %1 as an escape sequence"
+ " delimiter; it is ambiguous because it is also meaningful"
+ " in a numeric expression", tok.description());
+ // TODO: groff 1.25?
+#if 0
+ else if (!to.is_usable_as_delimiter(true /* report error */))
return false;
+#endif
else {
token start(tok);
tok.next();
@@ -5693,8 +5724,15 @@ static void do_register() // \R
{
token start_token;
start_token.next();
- if (!start_token.is_usable_as_delimiter(true /* report error */))
+ if (!start_token.is_usable_as_delimiter())
+ warning(WARN_DELIM, "interpreting %1 as an escape sequence"
+ " delimiter; it is ambiguous because it is also meaningful"
+ " in a numeric expression", start_token.description());
+ // TODO: groff 1.25?
+#if 0
+ if (!start_token.is_usable_as_delimiter(true /* report error */)) {
return;
+#endif
tok.next();
symbol nm = get_long_name(true /* required */);
if (nm.is_null())
@@ -5726,8 +5764,15 @@ static void do_width() // \w
int start_level = input_stack::get_level();
token start_token;
start_token.next();
+ if (!start_token.is_usable_as_delimiter())
+ warning(WARN_DELIM, "interpreting %1 as an escape sequence"
+ " delimiter; it is ambiguous because it is also meaningful"
+ " in a numeric expression", start_token.description());
+ // TODO: groff 1.25?
+#if 0
if (!start_token.is_usable_as_delimiter(true /* report error */))
return;
+#endif
environment env(curenv);
environment *oldenv = curenv;
curenv = &env;
@@ -6005,8 +6050,15 @@ static node *do_device_extension() // \X
int start_level = input_stack::get_level();
token start_token;
start_token.next();
+ if (!start_token.is_usable_as_delimiter())
+ warning(WARN_DELIM, "interpreting %1 as an escape sequence"
+ " delimiter; it is ambiguous because it is also meaningful"
+ " in a numeric expression", start_token.description());
+ // TODO: groff 1.25?
+#if 0
if (!start_token.is_usable_as_delimiter(true /* report error */))
return 0 /* nullptr */;
+#endif
macro mac;
if ((curdiv == topdiv) && (topdiv->before_first_page_status > 0))
topdiv->begin_page();
@@ -6457,9 +6509,7 @@ static bool is_conditional_expression_true()
}
else if (tok.is_space())
result = false;
- // Treat `|` specially for AT&T troff compatibility, where it _isn't_
- // a delimiter in this context; see Savannah #66526.
- else if (tok.is_usable_as_delimiter() && (tok.ch() != '|')) {
+ else if (tok.is_usable_as_delimiter()) {
// Perform (formatted) output comparison.
token delim = tok;
int delim_level = input_stack::get_level();
@@ -9498,8 +9548,15 @@ static node *read_drawing_command()
{
token start_token;
start_token.next();
+ if (!start_token.is_usable_as_delimiter())
+ warning(WARN_DELIM, "interpreting %1 as an escape sequence"
+ " delimiter; it is ambiguous because it is also meaningful"
+ " in a numeric expression", start_token.description());
+ // TODO: groff 1.25?
+#if 0
if (!start_token.is_usable_as_delimiter(true /* report error */))
return 0 /* nullptr */;
+#endif
else {
tok.next();
if (tok == start_token)
_______________________________________________
groff-commit mailing list
[email protected]
https://lists.gnu.org/mailman/listinfo/groff-commit