[1003.1(2008)/Issue 7 0001468]: awk FS definition not quite correct

Austin Group Bug Tracker via austin-group-l at The Open Group Sat, 24 Apr 2021 08:22:29 -0700


The following issue has been SUBMITTED. 
====================================================================== 
https://www.austingroupbugs.net/view.php?id=1468 
====================================================================== 
Reported By:                mortoneccc
Assigned To:                ajosey
====================================================================== 
Project:                    1003.1(2008)/Issue 7
Issue ID:                   1468
Category:                   Shell and Utilities
Type:                       Enhancement Request
Severity:                   Editorial
Priority:                   normal
Status:                     Under Review
Name:                       Ed Morton 
Organization:                
User Reference:              
Section:                    awk 
Page Number:                1 
Line Number:                1 
Interp Status:              --- 
Final Accepted Text:         
====================================================================== 
Date Submitted:             2021-04-24 15:20 UTC
Last Modified:              2021-04-24 15:20 UTC
====================================================================== 
Summary:                    awk FS definition not quite correct
Description: 
(Sorry, I don't see any page or line numbers in the online spec, hence the
placeholder values of 1 used above.)


In the definition of FS in the awk spec
(https://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html) it
says:

-----
The following describes FS behavior:

   If FS is a null string, the behavior is unspecified.

   If FS is a single character:

       If FS is <space>, skip leading and trailing <blank> and <newline>
       characters; fields shall be delimited by sets of one or more <blank>
       or <newline> characters.

       Otherwise, if FS is any other character c, fields shall be delimited
       by each single occurrence of c.

   Otherwise, the string value of FS shall be considered to be an extended
   regular expression. Each occurrence of a sequence matching the extended
   regular expression shall delimit fields.
-----

but that final case isn't exactly correct, because an ERE can match a null
string while an FS can't. Try, for example, splitting a record on every run
of non-commas:

$ echo 'x,y,z' | awk -F'[^,]*' '{for (i=1;i<=NF;i++) print i, "<"$i">"}'
1 <>
2 <,>
3 <,>
4 <>

which makes sense since there's a null string before the first non-comma
(x), a comma on each side of the 2nd non-comma (y), and a null string after
the last non-comma (z). Now remove the "y" from the middle to get:

$ echo 'x,,z' | awk -F'[^,]*' '{for (i=1;i<=NF;i++) print i, "<"$i">"}'
1 <>
2 <,,>
3 <>

and note that the null string between the 2 commas, which the regexp
`[^,]*` would match, isn't actually treated as a field separator by the FS
`[^,]*`.
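
To see the difference outside of field splitting, the same ERE can be passed to
awk's match() function, which reports the leftmost match through RSTART and
RLENGTH. This is only an illustrative sketch; the helper name ere_match is made
up for this example and is not part of awk or the spec.

```shell
# Hypothetical helper (name invented for illustration): print the start
# position and length of the leftmost match of ERE $2 in string $1, using
# awk's standard match() function and its RLENGTH variable.
ere_match() {
    awk -v s="$1" -v re="$2" 'BEGIN { print match(s, re), RLENGTH }'
}

ere_match 'x,,z' '[^,]*'   # prints "1 1": the leftmost match is the "x"
ere_match ',,z'  '[^,]*'   # prints "1 0": a null string matches before the first comma
```

A length of 0 in the second call shows that the ERE itself does match a null
string, even though field splitting never treats that match as a separator.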
Desired Action: 
Change the final paragraph of the FS definition mentioned above to say
something like "Otherwise, the string value of FS shall be considered to be
an extended regular expression such that each occurrence of a sequence **of
one or more characters** matching the extended regular expression shall
delimit fields."
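
As a rough sketch of what the proposed wording describes, the following splits
a record by hand and treats only matches of one or more characters as
separators. The helper name split_nonnull is hypothetical, and the loop is not
claimed to be how any awk implementation splits fields. It assumes the ERE
contains no anchors, because the ERE is re-applied to the unread remainder of
the record, and it does not reproduce awk's NF=0 for an empty record.

```shell
# Hypothetical helper (name invented for illustration): split each input line
# on matches of ERE $1 that are one or more characters long, ignoring null
# matches, and print the fields in the same "i <field>" form used above.
split_nonnull() {
    awk -v re="$1" '{
        rest = $0; field = ""; n = 0
        while (rest != "") {
            if (!match(rest, re)) {        # no match left: remainder ends the field
                field = field rest; rest = ""
            } else if (RLENGTH > 0) {      # match of one or more characters: a separator
                print ++n, "<" field substr(rest, 1, RSTART - 1) ">"
                field = ""; rest = substr(rest, RSTART + RLENGTH)
            } else {                       # null match: keep the character and move on
                field = field substr(rest, 1, 1); rest = substr(rest, 2)
            }
        }
        print ++n, "<" field ">"           # whatever is left is the last field
    }'
}

echo 'x,y,z' | split_nonnull '[^,]*'
echo 'x,,z'  | split_nonnull '[^,]*'
```

Under those assumptions, the two calls produce the same four and three fields
shown in the description.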
====================================================================== 

Issue History 
Date Modified    Username       Field                    Change               
====================================================================== 
2021-04-24 15:20 mortoneccc     New Issue                                    
2021-04-24 15:20 mortoneccc     Status                   New => Under Review 
2021-04-24 15:20 mortoneccc     Assigned To               => ajosey          
2021-04-24 15:20 mortoneccc     Name                      => Ed Morton       
2021-04-24 15:20 mortoneccc     Section                   => awk             
2021-04-24 15:20 mortoneccc     Page Number               => 1               
2021-04-24 15:20 mortoneccc     Line Number               => 1               
======================================================================
