On Mon, Dec 03, 2007 at 11:32:02PM EST, Bob Proulx wrote:

> cga2000 wrote:
> > Here's a sample of how the multiple separators feature behaves:
> >
> > $ echo " one: two:three :four five" | awk -F "[: ]" '{print "1 "$1;
> >   print "2 "$2; print "3 "$3; print "4 "$4; print "5 "$5; print "6 "$6;
> >   print "7 "$7; print "8 "$8}'
>
> Thanks for the small example.  (I just read your last posting and will
> probably respond to it, but this one was much easier.)
>
> > 1
> > 2 one
> > 3
> > 4 two
> > 5 three
> > 6
> > 7 four
> > 8 five
> >
> > Doesn't seem very logical to me.
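One way to see where the empty fields come from is to print each field in brackets. The sketch below is not from the thread, and `show_fields` is a hypothetical helper name; it just loops over NF with plain awk. With -F "[: ]" every single space or colon ends a field, so the leading space gives an empty $1, the ": " pair leaves an empty $3 between its two characters, and " :" leaves an empty $6.

```shell
# show_fields (hypothetical helper): print every field of each input line
# as "N [value]" so that empty fields are visible. $1 is the -F value.
show_fields() {
  awk -F "$1" '{ for (i = 1; i <= NF; i++) printf "%d [%s]\n", i, $i }'
}

# Same input and separator as the example above.
printf ' one: two:three :four five\n' | show_fields '[: ]'
```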
Maybe I meant "intuitive" .. except that this overloaded term has become
such a private joke where I'm concerned that I tend to instinctively
avoid it .. :-)

By intuitive I mean this: parsing text is a very common situation, and
since the default FS is <space>, it would seem rather "natural" to
default to a behavior where two, three, or even four consecutive spaces
count as only one separator.  ??

> Each field separator is splitting a field.  So for example -F_ on
> "___" would delimit four fields.  But before we go down this path, I
> know what you want, and we are going to do it differently to get there.
>
> > When awk successfully tests for space or colon, the following characters
> > are assumed NOT to be separators even if they have been defined as such
> > via the -F flag; e.g. the <space> that follows "one:" is mapped to the
> > $3 variable.
> >
> > Is this the way it's supposed to work?
>
> The way it is supposed to work is defined here:
>
>   http://www.opengroup.org/onlinepubs/009695399/utilities/awk.html
>
> Search for the section "Regular Expressions", where the FS ERE is
> discussed:
>
>   An extended regular expression can be used to separate fields by using
>   the -F ERE option or by assigning a string containing the expression
>   to the built-in variable FS. The default value of the FS variable
>   shall be a single <space>. The following describes FS behavior:
>
>   1. If FS is a null string, the behavior is unspecified.
>   2. If FS is a single character:
>      a. If FS is <space>, skip leading and trailing <blank>s;
>         fields shall be delimited by sets of one or more <blank>s.
>      b. Otherwise, if FS is any other character c, fields shall be
>         delimited by each single occurrence of c.
>   3. Otherwise, the string value of FS shall be considered to be an
>      extended regular expression. Each occurrence of a sequence
>      matching the extended regular expression shall delimit fields.
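The three rules can be checked directly by counting fields with NF. A minimal sketch, assuming a POSIX awk on the PATH; `count_fields` is a hypothetical helper and the sample strings are made up for illustration. Only the default single <space> (rule 2a) trims the ends of the line; with an ERE (rule 3), even "[ ]+" still produces an empty first and last field when the line starts and ends with separators.

```shell
# count_fields (hypothetical helper): print NF for each input line.
# $1, if given, is the -F value; with no argument awk uses the default
# FS, a single <space> (rule 2a).
count_fields() {
  if [ $# -eq 0 ]; then
    awk '{ print NF }'
  else
    awk -F "$1" '{ print NF }'
  fi
}

printf '___\n'      | count_fields _      # rule 2b: each "_" delimits -> 4
printf '  a  b  \n' | count_fields        # rule 2a: ends trimmed, runs collapse -> 2
printf '  a  b  \n' | count_fields '[ ]'  # rule 3: every single space delimits -> 7
printf '  a  b  \n' | count_fields '[ ]+' # rule 3: runs collapse, ends still delimit -> 4
```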
>
> As you can see, the default splitting behavior on a single space is
> done as a one-off special case.  The space is different from any other
> field separator.

Quite "logical".  I am not a programmer and have very little time to
dedicate to the *nix playground.  So when I have to, I grab the first
online tutorial that makes sense and try to make the language work for
me.  Otherwise, with maybe 6-8 hours a week devoted to computing in
general, I would get nowhere.

> What you probably want is option 3 above, where the field separator is
> an extended regular expression.  Try this:
>
>   echo " one: two:three :four five" | awk -F "[: ]+" '{print "1 "$1;
>     print "2 "$2; print "3 "$3; print "4 "$4; print "5 "$5; print "6 "$6;
>     print "7 "$7; print "8 "$8}'
>   1
>   2 one
>   3 two
>   4 three
>   5 four
>   6 five
>   7
>   8
>
> The -F "[: ]+" has a "+" now and will match one or more occurrences of
> either character.

I like that.

> But there is still a difference, because leading field separators are
> not trimmed.

But this doesn't make sense .. I mean .. -F "[: ]+" tells awk that " ",
e.g., is a separator .. so something like " : : " should be one big
separator and should become part of the implicit "beginning of line"
separator, no ..??  As a result, something like:

  : :: f1 f2 f3

.. should have the strings "f1", "f2", and "f3" map to $1, $2, and $3.  ??

> There are a couple of ways of dealing with that, but neither is
> particularly elegant.
>
>   echo " one: two:three :four five" | awk -F "[: ]+" '{sub(FS,"",$0);
>     print "1 "$1; print "2 "$2; print "3 "$3; print "4 "$4; print "5 "$5;
>     print "6 "$6; print "7 "$7; print "8 "$8}'
>   1 one
>   2 two
>   3 three
>   4 four
>   5 five
>   6
>   7
>   8
>
> This replaces the first match of FS on the line (here, the run of
> leading separators) with the empty string, after which awk re-splits
> $0 into fields.  That is the same as sub(/[: ]+/,"",$0); here, but
> using FS ties it to -F nicely.  The $0 can be omitted in this, but I
> like to be explicit.
>
> Hope this helps,

So little time .. too much stuff ..
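For the "f1 f2 f3 map to $1 $2 $3" behavior asked about above, here is a sketch of two approaches, assuming a POSIX awk; `split_anchored` and `split_as_blanks` are hypothetical helper names, not anything from the thread. The first keeps the sub() idea but anchors it with "^" (and trims trailing separators with "$"), because sub() replaces the first match wherever it occurs: on a line with no leading separator, the unanchored sub(FS,"") would turn "one: two" into "onetwo". The second is a different technique from the one Bob showed: rewrite every colon as a space and let the default FS (rule 2a) trim and collapse the blanks.

```shell
# Hypothetical helpers: both print "N field" for each resulting field.

# split_anchored: keep -F "[: ]+" but remove separators only at the ends
# of the line, so a line without leading separators is left intact.
split_anchored() {
  awk -F '[: ]+' '{
    sub("^" FS, "")   # drop a leading run of ":" and " "
    sub(FS "$", "")   # drop a trailing run as well
    for (i = 1; i <= NF; i++) print i, $i
  }'
}

# split_as_blanks: turn every ":" into a space, then rely on the default
# FS (a single <space>), which trims the ends and collapses runs of blanks.
split_as_blanks() {
  awk '{
    gsub(/:/, " ")    # changing $0 makes awk re-split it into fields
    for (i = 1; i <= NF; i++) print i, $i
  }'
}

printf ': :: f1 f2 f3\n' | split_anchored    # "1 f1", "2 f2", "3 f3"
printf ': :: f1 f2 f3\n' | split_as_blanks   # same result
```

The gsub() variant also treats tabs as separators, since the default FS splits on all <blank>s; the anchored variant splits only on ":" and " ".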