On 13/12/2020 23:23, KOBAYASHI Takashi wrote:
Hello,
I found a bug of -d in nl when a single character is specified. The patches
are also attached. I think there are two ways to fix this.
The current behavior:
$ echo -e 'a\n@:@:\nb\n@@\nc' | nl -d '@'
1 a
2 @:@:
3 b
4 @@
5 c
POSIX expectations(Attach: nl-d-posix.patch):
$ echo -e 'a\n@:@:\nb\n@@\nc' | nl -d '@'
1 a
1 b
2 @@
3 c
POSIX(1003.1-2017)
-d delim
Specify the delimiter characters that indicate the start of a logical
page section. These can be changed from the default characters "\:" to two
user-specified characters. If only one character is entered, the second
character shall remain the default character ':'.
Also, Texinfo is written in the same behavior as POSIX.
‘-d CD’
‘--section-delimiter=CD’
Set the section delimiter characters to CD; default is ‘\:’. If
only C is given, the second remains ‘:’. (Remember to protect ‘\’
or other metacharacters from shell expansion with quotes or extra
backslashes.)
I don't like this complex POSIX specification because it doesn't allow us
to specify a single character delimiter.
Does anyone know why POSIX was specified the way it was? I would like to
know the background.
I believe that the section delimiter should be generated from the string
given to the option, regardless of the length of the characters. And by the
string given two characters, it is possible to make the second character
":".
The following is more clear(Attach: nl-d-incompatible.patch):
$ echo -e 'a\n@:@:\nb\n@@\nc' | nl -d '@'
1 a
2 @:@:
3 b
1 c
$ echo -e 'a\n@:@:\nb\n@@\nc' | nl -d '@:'
1 a
1 b
2 @@
3 c
Nice find!
I agree that supporting matching a single char would be preferable
to the POSIX mandated behavior of assuming a second ':' character.
However no-one has specifically asked for the single char support,
and I don't think it's worth breaking compat with POSIX and other
implementations. So we should not apply your second "incompatible"
patch I think.
BTW, we also have an undocumented behavior that strings longer
than two characters can be provided.
We should probably document that GNU extension.
As for the implementation, this code is already a bit "copy and pastey",
so I'd prefer not to copy the logic again.
I've attached two patches to address this.
The first changes nothing, but refactors things to simplify the fix.
The second (in your name) applies the specific fix and adds a test.
I'll apply these later if you're Ok with it.
cheers,
Pádraig
commit f9b0481d2d65cfee999d535a963a57889a5c18bd
Author: Pádraig Brady <p...@draigbrady.com>
Date: Mon Dec 14 16:10:48 2020 +0000
maint: refactor nl section delimiter handling
* src/nl.c (main): Update the default delimiter characters
when passed two characters with --section-delimiter.
Avoid redundant copies for the body and footer delimiter strings,
and instead, just offset into the header string.
* doc/coreutils.texi (nl invocation): Mention the GNU extension
of allowing arbitrary length delimiter strings.
* tests/misc/nl.sh: Add a test case for the GNU extension.
diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index 5ac3745bd..f1d3582e7 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -1806,8 +1806,8 @@ start of footer.
@end table
The two characters from which these strings are made can be changed from
-@samp{\} and @samp{:} via options (see below), but the pattern and
-length of each string cannot be changed.
+@samp{\} and @samp{:} via options (see below), but the pattern
+of each string cannot be changed.
A section delimiter is replaced by an empty line on output. Any text
that comes before the first section delimiter string in the input file
@@ -1847,6 +1847,7 @@ expression @var{bre}.
@cindex section delimiters of pages
Set the section delimiter characters to @var{cd}; default is
@samp{\:}. If only @var{c} is given, the second remains @samp{:}.
+As a GNU extension more than two characters can be specified.
(Remember to protect @samp{\} or other metacharacters from shell
expansion with quotes or extra backslashes.)
diff --git a/src/nl.c b/src/nl.c
index 959909f05..4e6f38ae0 100644
--- a/src/nl.c
+++ b/src/nl.c
@@ -54,7 +54,7 @@ static char const FORMAT_RIGHT_LZ[] = "%0*" PRIdMAX "%s";
static char const FORMAT_LEFT[] = "%-*" PRIdMAX "%s";
/* Default section delimiter characters. */
-static char const DEFAULT_SECTION_DELIMITERS[] = "\\:";
+static char DEFAULT_SECTION_DELIMITERS[] = "\\:";
/* Types of input lines: either one of the section delimiters,
or text to output. */
@@ -96,7 +96,7 @@ static struct re_pattern_buffer *current_regex = NULL;
static char const *separator_str = "\t";
/* Input section delimiter string (-d). */
-static char const *section_del = DEFAULT_SECTION_DELIMITERS;
+static char *section_del = DEFAULT_SECTION_DELIMITERS;
/* Header delimiter string. */
static char *header_del = NULL;
@@ -553,7 +553,14 @@ main (int argc, char **argv)
}
break;
case 'd':
- section_del = optarg;
+ if (strlen (optarg) == 2) /* POSIX. */
+ {
+ char *p = section_del;
+ while (*optarg)
+ *p++ = *optarg++;
+ }
+ else
+ section_del = optarg; /* GNU extension. */
break;
case_GETOPT_HELP_CHAR;
case_GETOPT_VERSION_CHAR (PROGRAM_NAME, AUTHORS);
@@ -574,12 +581,10 @@ main (int argc, char **argv)
stpcpy (stpcpy (stpcpy (header_del, section_del), section_del), section_del);
body_del_len = len * 2;
- body_del = xmalloc (body_del_len + 1);
- stpcpy (stpcpy (body_del, section_del), section_del);
+ body_del = header_del + len;
footer_del_len = len;
- footer_del = xmalloc (footer_del_len + 1);
- stpcpy (footer_del, section_del);
+ footer_del = header_del + len * 2;
/* Initialize the input buffer. */
initbuffer (&line_buf);
diff --git a/tests/misc/nl.sh b/tests/misc/nl.sh
index fd9c5326c..a68e980f8 100755
--- a/tests/misc/nl.sh
+++ b/tests/misc/nl.sh
@@ -79,4 +79,14 @@ compare exp out || fail=1
printf '%s\n' a b c > in.txt || framework_failure_
returns_ 1 nl -v$INTMAX_MAX -i$INTMAX_MIN in.txt > out || fail=1
+# Test GNU extension to --section-delimiter, of supporting strings longer than 2
+printf '%s\n' a foofoo c > in.txt || framework_failure_
+nl -d 'foo' in.txt > out || fail=1
+cat <<EOF > exp
+ 1 a
+
+ 1 c
+EOF
+compare exp out || fail=1
+
Exit $fail
commit 3a098519f1ea64b928c27b6d058011769f489be2
Author: KOBAYASHI Takashi <a141...@aiit.ac.jp>
Date: Mon Dec 14 16:17:10 2020 +0000
nl: fix --section-delimiter handling of single characters
* src/nl.c (main): Enforce the POSIX specified
behavior of assuming ':' is specified after a single
character argument to -d.
* tests/misc/nl.sh: Add a test case.
* NEWS: Mention the bug fix.
diff --git a/NEWS b/NEWS
index d2cb9ae48..dfc7bfa41 100644
--- a/NEWS
+++ b/NEWS
@@ -20,6 +20,10 @@ GNU coreutils NEWS -*- outline -*-
ls no longer crashes when printing the SELinux context for unstatable files.
[bug introduced in coreutils-6.9.91]
+ nl now handles single character --section-delimiter arguments,
+ by assuming a second ':' character has been specified, as specified by POSIX.
+ [This bug was present in "the beginning".]
+
rm no longer skips an extra file when the removal of an empty directory fails.
[bug introduced by the rewrite to use fts in coreutils-8.0]
diff --git a/src/nl.c b/src/nl.c
index 4e6f38ae0..733ad94cd 100644
--- a/src/nl.c
+++ b/src/nl.c
@@ -553,7 +553,7 @@ main (int argc, char **argv)
}
break;
case 'd':
- if (strlen (optarg) == 2) /* POSIX. */
+ if (strlen (optarg) <= 2) /* POSIX. */
{
char *p = section_del;
while (*optarg)
diff --git a/tests/misc/nl.sh b/tests/misc/nl.sh
index a68e980f8..710623af6 100755
--- a/tests/misc/nl.sh
+++ b/tests/misc/nl.sh
@@ -89,4 +89,15 @@ cat <<EOF > exp
EOF
compare exp out || fail=1
+# Ensure single char delimiters assume a following ':' character (as per POSIX)
+# coreutils <= v8.32 didn't match single char delimiters at all
+printf '%s\n' a x:x: c > in.txt || framework_failure_
+nl -d 'x' in.txt > out || fail=1
+cat <<EOF > exp
+ 1 a
+
+ 1 c
+EOF
+compare exp out || fail=1
+
Exit $fail