The branch main has been updated by imp:

URL: https://cgit.FreeBSD.org/src/commit/?id=e7a04a110724183c72e25c5c8461f89f50b4d08a

commit e7a04a110724183c72e25c5c8461f89f50b4d08a
Author:     Warner Losh <i...@freebsd.org>
AuthorDate: 2025-09-04 05:44:33 +0000
Commit:     Warner Losh <i...@freebsd.org>
CommitDate: 2025-09-04 05:59:48 +0000

    awk: Merge upstream manpage updates

    Merge the upstream manpage updates into awk.1. This goes through
    upstream hash 9acc510. Upstream man page is written in raw nroff with
    "an" macros, rather than in mandoc, so convert to mandoc as well. The
    man page isn't updated on imports automatically, plus our man page has
    diverged somewhat from upstream's so it's not a mechanical change...

    PR:             230730
    Sponsored by:   Netflix
---
 usr.bin/awk/awk.1 | 136 ++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 127 insertions(+), 9 deletions(-)

diff --git a/usr.bin/awk/awk.1 b/usr.bin/awk/awk.1
index 65c91738966b..612669629a02 100644
--- a/usr.bin/awk/awk.1
+++ b/usr.bin/awk/awk.1
@@ -21,7 +21,7 @@
 .\" IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION,
 .\" ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF
 .\" THIS SOFTWARE.
-.Dd July 30, 2021
+.Dd September 3, 2025
 .Dt AWK 1
 .Os
 .Sh NAME
@@ -32,7 +32,7 @@
 .Op Fl safe
 .Op Fl version
 .Op Fl d Ns Op Ar n
-.Op Fl F Ar fs
+.Op Fl F Ar fs | Fl -csv
 .Op Fl v Ar var Ns = Ns Ar value
 .Op Ar prog | Fl f Ar progfile
 .Ar
@@ -42,9 +42,11 @@ scans each input
 .Ar file
 for lines that match any of a set of patterns specified literally in
 .Ar prog
-or in one or more files specified as
+or in one or more files
+specified as
 .Fl f Ar progfile .
-With each pattern there can be an associated action that will be performed
+With each pattern
+there can be an associated action that will be performed
 when a line of a
 .Ar file
 matches the pattern.
@@ -76,6 +78,11 @@ to dump core on fatal errors.
 .It Fl F Ar fs
 Define the input field separator to be the regular expression
 .Ar fs .
+.It Fl -csv
+causes
+.Nm
+to process records using (more or less) standard comma-separated values
+(CSV) format.
 .It Fl f Ar progfile
 Read program code from the specified file
 .Ar progfile
@@ -178,7 +185,7 @@ as the field separator, use the
 option with a value of
 .Sq [t] .
 .Pp
-A pattern-action statement has the form
+A pattern-action statement has the form:
 .Pp
 .D1 Ar pattern Ic \&{ Ar action Ic \&}
 .Pp
@@ -347,7 +354,7 @@ in a pattern.
 A pattern may consist of two patterns separated by a comma;
 in this case, the action is performed for all lines
 from an occurrence of the first pattern
-through an occurrence of the second.
+through an occurrence of the second, inclusive.
 .Pp
 A relational expression is one of the following:
 .Pp
@@ -363,7 +370,8 @@ A relational expression is one of the following:
 .Pp
 where a
 .Ar relop
-is any of the six relational operators in C, and a
+is any of the six relational operators in C,
+and a
 .Ar matchop
 is either
 .Ic ~
@@ -386,6 +394,9 @@ and after the last.
 and
 .Ic END
 do not combine with other patterns.
+They may appear multiple times in a program and execute
+in the order they are read by
+.Nm
 .Pp
 Variable names with special meanings:
 .Pp
@@ -428,6 +439,11 @@ The length of the string matched by the
 function.
 .It Va RS
 Input record separator (default newline).
+If empty, blank lines separate records.
+If more than one character long,
+.Va RS
+is treated as a regular expression, and records are
+separated by text matching the expression.
 .It Va RSTART
 The starting position of the string matched by the
 .Fn match
@@ -515,7 +531,8 @@ occurs, or 0 if it does not.
 The length of
 .Fa s
 taken as a string,
-or of
+number of elements in an array for an array argument,
+or length of
 .Va $0
 if no argument is given.
 .It Fn match s r
@@ -696,10 +713,44 @@ records from
 .Ar file
 remains open until explicitly closed with a call to
 .Fn close .
+.It Fn systime
+returns the current date and time as a standard
+.Dq seconds since the epoch
+value.
+.It Fn strftime fmt timestamp
+formats
+.Fa timestamp
+(a value in seconds since the epoch)
+according to
+Fa fmt ,
+which is a format string as supported by
+.Xr strftime 3 .
+Both
+.Fa timestamp
+and
+.Fa fmt
+may be omitted; if no
+.Fa timestamp ,
+the current time of day is used, and if no
+.Fa fmt ,
+a default format of
+.Dq %a %b %e %H:%M:%S %Z %Y
+is used.
 .It Fn system cmd
 Executes
 .Fa cmd
 and returns its exit status.
+This will be -1 upon error,
+.Fa cmd 's
+exit status upon a normal exit,
+256 +
+.Va sig
+upon death-by-signal, where
+.Va sig
+is the number of the murdering signal,
+or 512 +
+.Va sig
+if there was a core dump.
 .El
 .Ss Bit-Operation Functions
 .Bl -tag -width "lshift(a, b)"
@@ -725,6 +776,16 @@ Returns integer argument x shifted by n bits to the right.
 But note that the
 .Ic exit
 expression can modify the exit status.
+.Sh ENVIRONMENT VARIABLES
+If
+.Va POSIXLY_CORRECT
+is set in the environment, then
+.Nm
+follows the POSIX rules for
+.Fn sub
+and
+.Fn gsub
+with respect to consecutive backslashes and ampersands.
 .Sh EXAMPLES
 Print lines longer than 72 characters:
 .Pp
@@ -734,7 +795,7 @@ Print first two fields in opposite order:
 .Pp
 .Dl { print $2, $1 }
 .Pp
-Same, with input fields separated by comma and/or blanks and tabs:
+Same, with input fields separated by comma and/or spaces and tabs:
 .Bd -literal -offset indent
 BEGIN { FS = ",[ \et]*|[ \et]+" }
 { print $2, $1 }
@@ -810,6 +871,63 @@ to it.
 .Pp
 The scope rules for variables in functions are a botch;
 the syntax is worse.
+.Pp
+Input is expected to be UTF-8 encoded.
+Other multibyte character sets are not handled.
+However, in eight-bit locales,
+.Nm
+treats each input byte as a separate character.
+.Sh UNUSUAL FLOATING-POINT VALUES
+.Nm
+was designed before IEEE 754 arithmetic defined Not-A-Number (NaN)
+and Infinity values, which are supported by all modern floating-point
+hardware.
+.Pp
+Because
+.Nm
+uses
+.Xr strtod 3
+and
+.Xr atof 3
+to convert string values to double-precision floating-point values,
+modern C libraries also convert strings starting with
+.Va inf
+and
+.Va nan
+into infinity and NaN values respectively.
+This led to strange results,
+with something like this:
+.Bd -literal -offset indent
+echo nancy | awk '{ print $1 + 0 }'
+.Ed
+.Pp
+printing
+.Dq nan
+instead of zero.
+.Pp
+.Nm
+now follows GNU AWK, and prefilters string values before attempting
+to convert them to numbers, as follows:
+.Bl -tag -width "Hexadecimal values"
+.It Hexadecimal values
+Hexadecimal values (allowed since C99) convert to zero, as they did
+prior to C99.
+.It NaN values
+The two strings
+.Dq +nan
+and
+.Dq -nan
+(case independent) convert to NaN.
+No others do.
+(NaNs can have signs.)
+.It Infinity values
+The two strings
+.Dq +inf
+and
+.Dq -inf
+(case independent) convert to positive and negative infinity, respectively.
+No others do.
+.El
 .Sh DEPRECATED BEHAVIOR
 One True Awk has accepted
 .Fl F Ar t
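
The system() return values documented in the diff above (-1 on error, the
exit status on a normal exit, 256 + sig on death by signal, 512 + sig on a
core dump) can be decoded with a few comparisons. What follows is only a
minimal sketch, not code from the commit: the helper name describe_status
is hypothetical, and it assumes exit statuses and signal numbers are both
below 256 so that the three ranges do not overlap.

```shell
# Hypothetical helper (not part of awk): classify a value returned by
# awk's system() using the encoding described in the new man page text.
# Assumption: exit statuses and signal numbers are both below 256.
describe_status() {
    awk -v rc="$1" 'BEGIN {
        if (rc == -1)
            print "error"                                  # system() itself failed
        else if (rc >= 512)
            print "signal " (rc - 512) " (core dumped)"    # 512 + sig
        else if (rc >= 256)
            print "signal " (rc - 256)                     # 256 + sig
        else
            print "exit status " rc                        # normal exit
    }'
}

describe_status 3      # exit status 3
describe_status 265    # signal 9
describe_status 523    # signal 11 (core dumped)
describe_status -1     # error
```

The checks run from the largest range down, so a core-dump value is never
mistaken for a plain signal value.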