cga2000 wrote:
> Here's a sample of how the multiple separators feature behaves:
>
> [15:52:[EMAIL PROTECTED]:~]$ echo " one: two:three :four five" |
>   awk -F "[: ]" '{print "1 "$1; print "2 "$2; print "3 "$3; print "4 "$4; print "5 "$5; print "6 "$6;print "7 "$7;print "8 "$8}'

Thanks for the small example.  (I just read your last posting and will
probably respond to it, but this one was much easier.)

> 1
> 2 one
> 3
> 4 two
> 5 three
> 6
> 7 four
> 8 five
>
> Doesn't seem very logical to me.

Each occurrence of the field separator splits a field.  So for example
-F_ on "___" would delimit four fields.  But before we go down this
path: I know what you want, and we are going to do it differently to
get there.

> When awk successfully tests for space or colon, the following characters
> are assumed NOT to be separators even if they have been defined as such
> via the -F flag -- eg. the <space> that follows "one:" is mapped to the
> $3 variable.
>
> Is this the way it's supposed to work?

The way it is supposed to work is defined here:

  http://www.opengroup.org/onlinepubs/009695399/utilities/awk.html

Search for the section "Regular Expressions" where the FS ERE is
discussed.

  An extended regular expression can be used to separate fields by
  using the -F ERE option or by assigning a string containing the
  expression to the built-in variable FS. The default value of the FS
  variable shall be a single <space>. The following describes FS
  behavior:

   1. If FS is a null string, the behavior is unspecified.

   2. If FS is a single character:

       a. If FS is <space>, skip leading and trailing <blank>s; fields
          shall be delimited by sets of one or more <blank>s.

       b. Otherwise, if FS is any other character c, fields shall be
          delimited by each single occurrence of c.

   3. Otherwise, the string value of FS shall be considered to be an
      extended regular expression. Each occurrence of a sequence
      matching the extended regular expression shall delimit fields.

As you can see, the default splitting behavior on a single space is
done as a one-off special case.  The space is different from any other
field separator.  Your "[: ]" already falls under rule 3, but it
matches exactly one character, so the ":" and the space after "one"
are two separate delimiters with an empty field ($3) between them.
What you probably want is still option 3 above, with an extended
regular expression that matches a whole run of separators.

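The three FS rules quoted above can be checked directly.  This is a
minimal sketch, assuming a POSIX-conforming awk; count_fields is a
hypothetical helper introduced only for illustration, not something
from the standard.

```shell
# Hypothetical helper: print the number of fields awk sees for a given FS.
# $1 = field separator (passed to -F), $2 = input line
count_fields() {
    printf '%s\n' "$2" | awk -F "$1" '{ print NF }'
}

# Rule 2a: FS is a single space, so leading and trailing blanks are
# skipped and runs of blanks delimit fields.
count_fields ' ' '  one   two  '        # prints 2

# Rule 2b: any other single character delimits on every occurrence.
count_fields '_' '___'                  # prints 4

# Rule 3: anything longer is an ERE; each match of the whole pattern
# delimits fields.
count_fields '[: ]+' 'one: two:three'   # prints 3
count_fields '[: ]'  'one: two:three'   # prints 4 (empty field between ":" and " ")
```

The last two calls differ only in the "+", which is what turns a run of
separators into a single delimiter.
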
Try this:

  echo " one: two:three :four five" | awk -F "[: ]+" '{print "1 "$1; print "2 "$2; print "3 "$3; print "4 "$4; print "5 "$5; print "6 "$6;print "7 "$7;print "8 "$8}'
  1
  2 one
  3 two
  4 three
  5 four
  6 five
  7
  8

The -F "[: ]+" has a "+" now and will match one or more occurrences of
either character.  But there is still a difference because leading
field separators are not trimmed, so $1 is empty.  There are a couple
of ways of dealing with that but neither is particularly elegant.

  echo " one: two:three :four five" | awk -F "[: ]+" '{sub(FS,"",$0);print "1 "$1; print "2 "$2; print "3 "$3; print "4 "$4; print "5 "$5; print "6 "$6;print "7 "$7;print "8 "$8}'
  1 one
  2 two
  3 three
  4 four
  5 five
  6
  7
  8

This removes the first match of FS in the line, which here is the
leading space; awk then re-splits the modified $0.  That is the same
as sub(/[: ]+/,"",$0); here, but using FS ties it to -F nicely.  Note
that sub() replaces only the first match wherever it occurs, so this
relies on the line actually starting with a separator.  The $0 can be
omitted, but I like to be explicit.

Hope this helps,
Bob
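
A note on the sub(FS,"",$0) approach: on a line with no leading
separator, the first match is the separator between the first two
fields, so "one: two:three" would become "onetwo:three".  A minimal
sketch of one way around that, assuming a POSIX awk, is to anchor the
pattern at the start of the line.  split_trimmed is a hypothetical
helper name used only for this illustration.

```shell
# Hypothetical helper: split on FS after removing a separator only at
# the start of the line.
# $1 = field separator ERE, $2 = input line; prints fields joined by "|".
split_trimmed() {
    printf '%s\n' "$2" | awk -F "$1" '{
        sub("^(" FS ")", "")    # anchored: touches only a leading separator
        out = $1
        for (i = 2; i <= NF; i++) out = out "|" $i
        print out
    }'
}

split_trimmed '[: ]+' ' one: two:three :four five'   # prints one|two|three|four|five
split_trimmed '[: ]+' 'one: two:three :four five'    # prints one|two|three|four|five
```

The parentheses around FS keep the "^" applied to the whole expression
even if FS contains an alternation such as "a|b".
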