The following issue has been SUBMITTED.
======================================================================
https://www.austingroupbugs.net/view.php?id=1468
======================================================================
Reported By:                mortoneccc
Assigned To:                ajosey
======================================================================
Project:                    1003.1(2008)/Issue 7
Issue ID:                   1468
Category:                   Shell and Utilities
Type:                       Enhancement Request
Severity:                   Editorial
Priority:                   normal
Status:                     Under Review
Name:                       Ed Morton
Organization:
User Reference:
Section:                    awk
Page Number:                1
Line Number:                1
Interp Status:              ---
Final Accepted Text:
======================================================================
Date Submitted:             2021-04-24 15:20 UTC
Last Modified:              2021-04-24 15:20 UTC
======================================================================
Summary:                    awk FS definition not quite correct
Description:
(sorry, I don't see any page or line numbers in the online spec, hence the 1 and 1 used above).

In the definition of FS in the awk spec (https://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html) it says:

-----
The following describes FS behavior:

    If FS is a null string, the behavior is unspecified.

    If FS is a single character:

        If FS is <space>, skip leading and trailing <blank> and <newline> characters; fields shall be delimited by sets of one or more <blank> or <newline> characters.

        Otherwise, if FS is any other character c, fields shall be delimited by each single occurrence of c.

    Otherwise, the string value of FS shall be considered to be an extended regular expression. Each occurrence of a sequence matching the extended regular expression shall delimit fields.
-----

but that final case isn't exactly correct because an ERE can match a null string while a FS can't. Try for example splitting a record on all non-commas:

    $ echo 'x,y,z' | awk -F'[^,]*' '{for (i=1;i<=NF;i++) print i, "<"$i">"}'
    1 <>
    2 <,>
    3 <,>
    4 <>

which makes sense since there's a null string before the first non-comma (x), 2 commas around the 2nd non-comma (y) and a null string after the last non-comma (z). Now remove the "y" from the middle to get:

    $ echo 'x,,z' | awk -F'[^,]*' '{for (i=1;i<=NF;i++) print i, "<"$i">"}'
    1 <>
    2 <,,>
    3 <>

and note that the null string between the 2 commas which would match the regexp `[^,]*` isn't actually matched by the FS `[^,]*`.

Desired Action:
Change the final paragraph of the FS definition mentioned above to say something like "Otherwise, the string value of FS shall be considered to be an extended regular expression such that each occurrence of a sequence **of one or more characters** matching the extended regular expression shall delimit fields."
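
As a rough illustration of the proposed wording, the sketch below compares the field counts produced by the FS from the report, `[^,]*`, with those produced by `[^,]+`, which can only match one or more characters. The helper name `count_fields` is hypothetical and not part of the report, and the counts noted in the comments are the ones shown in the report's own output; other awk implementations might behave differently.

```shell
#!/bin/sh
# count_fields is a hypothetical helper (not from the report):
#   $1 = value for FS, treated by awk as an ERE
#   $2 = the record to split
# It prints NF, the number of fields awk finds in the record.
count_fields() {
    printf '%s\n' "$2" | awk -F "$1" '{ print NF }'
}

# FS from the report: an ERE that can also match the null string.
count_fields '[^,]*' 'x,y,z'   # the report's output shows 4 fields
count_fields '[^,]*' 'x,,z'    # the report's output shows 3 fields

# Same pattern restricted to one or more characters, as in the
# proposed wording; if FS never delimits on a null match, the
# counts should be the same as above.
count_fields '[^,]+' 'x,y,z'
count_fields '[^,]+' 'x,,z'
```

If both patterns give identical counts, that is consistent with the report's claim that a null match of the ERE does not delimit fields.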
======================================================================

Issue History
Date Modified    Username       Field                    Change
======================================================================
2021-04-24 15:20 mortoneccc     New Issue
2021-04-24 15:20 mortoneccc     Status                   New => Under Review
2021-04-24 15:20 mortoneccc     Assigned To               => ajosey
2021-04-24 15:20 mortoneccc     Name                      => Ed Morton
2021-04-24 15:20 mortoneccc     Section                   => awk
2021-04-24 15:20 mortoneccc     Page Number               => 1
2021-04-24 15:20 mortoneccc     Line Number               => 1
======================================================================