Sorry for my rush job, thanks for your refactoring! I agree with not breaking compat with POSIX.
However, with this change, the behavior of the -d blank character is different than before. I think the previous behavior was important to ensure that all input is treated as a single page. This case is not specifically mentioned in POSIX, so I propose it is kept as the GNU extension. The fix patch is modified. Takashi 2020年12月14日(月) 8:53 Pádraig Brady <p...@draigbrady.com>: > On 13/12/2020 23:23, KOBAYASHI Takashi wrote: > > Hello, > > > > I found a bug of -d in nl when a single character is specified. The > patches > > are also attached. I think there are two ways to fix this. > > > > The current behavior: > > $ echo -e 'a\n@:@:\nb\n@@\nc' | nl -d '@' > > 1 a > > 2 @:@: > > 3 b > > 4 @@ > > 5 c > > > > POSIX expectations(Attach: nl-d-posix.patch): > > $ echo -e 'a\n@:@:\nb\n@@\nc' | nl -d '@' > > 1 a > > > > 1 b > > 2 @@ > > 3 c > > > > POSIX(1003.1-2017) > >> -d delim > >> Specify the delimiter characters that indicate the start of a logical > > page section. These can be changed from the default characters "\:" to > two > > user-specified characters. If only one character is entered, the second > > character shall remain the default character ':'. > > > > Also, Texinfo is written in the same behavior as POSIX. > >> ‘-d CD’ > >> ‘--section-delimiter=CD’ > >> Set the section delimiter characters to CD; default is ‘\:’. If > >> only C is given, the second remains ‘:’. (Remember to protect ‘\’ > >> or other metacharacters from shell expansion with quotes or extra > >> backslashes.) > > > > > > I don't like this complex POSIX specification because it doesn't allow us > > to specify a single character delimiter. > > Does anyone know why POSIX was specified the way it was? I would like to > > know the background. > > I believe that the section delimiter should be generated from the string > > given to the option, regardless of the length of the characters. And by > the > > string given two characters, it is possible to make the second character > > ":". > > > > The following is more clear(Attach: nl-d-incompatible.patch): > > $ echo -e 'a\n@:@:\nb\n@@\nc' | nl -d '@' > > 1 a > > 2 @:@: > > 3 b > > > > 1 c > > $ echo -e 'a\n@:@:\nb\n@@\nc' | nl -d '@:' > > 1 a > > > > 1 b > > 2 @@ > > 3 c > > Nice find! > > I agree that supporting matching a single char would be preferable > to the POSIX mandated behavior of assuming a second ':' character. > However no-one has specifically asked for the single char support, > and I don't think it's worth breaking compat with POSIX and other > implementations. So we should not apply your second "incompatible" > patch I think. > > BTW, we also have an undocumented behavior that strings longer > than two characters can be provided. > We should probably document that GNU extension. > > As for the implementation, this code is already a bit "copy and pastey", > so I'd prefer not to copy the logic again. > > I've attached two patches to address this. > The first changes nothing, but refactors things to simplify the fix. > The second (in your name) applies the specific fix and adds a test. > > I'll apply these later if you're Ok with it. > > cheers, > Pádraig >
--- nl.ref.c 2020-12-14 13:39:02.657144799 -0800 +++ nl.c 2020-12-14 13:52:27.466121061 -0800 @@ -553,7 +553,8 @@ } break; case 'd': - if (strlen (optarg) == 2) /* POSIX. */ + len = strlen (optarg); + if (len == 1 || len == 2) /* POSIX. */ { char *p = section_del; while (*optarg)