Re: Specifying multiple separators via FS or the -F command line flag - addendum

cga2000 Tue, 04 Dec 2007 17:06:25 -0800

On Mon, Dec 03, 2007 at 11:32:02PM EST, Bob Proulx wrote:
> cga2000 wrote:
> > Here's a sample of how the multiple separators feature behaves:
> > 
> > [15:52:[EMAIL PROTECTED]:~]$ echo " one: two:three :four five" | awk -F "[: ]" '{print "1 "$1; print "2 "$2; print "3 "$3; print "4 "$4; print "5 "$5; print "6 "$6;print "7 "$7;print "8 "$8}'
> 
> Thanks for the small example.  (I just read your last posting and will
> probably respond to it but this one was much easier.)
> 
> > 1
> > 2 one
> > 3
> > 4 two
> > 5 three
> > 6
> > 7 four
> > 8 five
> > 
> > Doesn't seem very logical to me.
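
A loop over NF may make the empty fields easier to spot than eight separate print statements. This is only a sketch, assuming a POSIX awk; the square brackets and the "out" variable are added here for illustration and are not part of the original command:

```shell
# Print every field of the sample line, bracketed so empty fields are visible.
# FS is the bracket expression from the post: each single ':' or ' ' splits.
out=$(printf '%s\n' ' one: two:three :four five' |
  awk -F '[: ]' '{ for (i = 1; i <= NF; i++) printf "%d [%s]\n", i, $i }')
printf '%s\n' "$out"
```

Fields 1, 3 and 6 come out as "[]": each one sits between two adjacent separators, or before the leading space.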


Maybe I meant "intuitive" .. except that this overloaded term has become
such a private joke where I'm concerned that I tend to instinctively
avoid it .. :-)

By "intuitive" I mean that in the very common situation where you're
parsing text .. and given that the default FS is <space> .. it would
seem rather "natural" to default to a behavior where two or three or
even four spaces in a row only count as one separator.

??

> Each field separator is splitting a field.  So for example -F_ on
> "___" would delimit four fields.  But before we go down this path I
> know what you want and we are going to do it differently to get there.
> 
> > When awk successfully tests for space or colon, the following characters
> > are assumed NOT to be separators even if they have been defined as such
> > via the -F flag -- eg. the <space> that follows "one:" is mapped to the
> > $3 variable.
> > 
> > Is this the way it's supposed to work?
> 
> The way it is supposed to work is defined here:
> 
>   http://www.opengroup.org/onlinepubs/009695399/utilities/awk.html
> 
> Search for the section "Regular Expressions" where the FS ERE is
> discussed.
> 
> An extended regular expression can be used to separate fields by using
> the -F ERE option or by assigning a string containing the expression
> to the built-in variable FS. The default value of the FS variable
> shall be a single <space>. The following describes FS behavior:
> 
>    1. If FS is a null string, the behavior is unspecified.
>    2. If FS is a single character:
>          a. If FS is <space>, skip leading and trailing <blank>s;
>             fields shall be delimited by sets of one or more <blank>s.
>          b. Otherwise, if FS is any other character c, fields shall be
>             delimited by each single occurrence of c.
>    3. Otherwise, the string value of FS shall be considered to be an
>       extended regular expression. Each occurrence of a sequence
>       matching the extended regular expression shall delimit fields.
> 
> As you can see the default splitting behavior on a single space is
> done as a one-off special.  The space is different than any other
> field separator.
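
Rules 2a and 2b above can be checked side by side by counting fields. A rough sketch, assuming a POSIX awk; the sample strings and variable names are made up for the comparison:

```shell
# Rule 2a: the default FS (a single space) trims leading and trailing blanks
# and treats each run of blanks as one separator.
default_nf=$(printf '%s\n' '   a   b   ' | awk '{ print NF }')
# Rule 2b: any other single character delimits on every occurrence,
# so three underscores bound four (empty) fields.
underscore_nf=$(printf '%s\n' '___' | awk -F_ '{ print NF }')
# A line of the same shape with ':' in place of the blanks:
# nothing is trimmed or merged, so the empty fields are all counted.
colon_nf=$(printf '%s\n' '::a::b::' | awk -F: '{ print NF }')
printf 'default=%s underscore=%s colon=%s\n' "$default_nf" "$underscore_nf" "$colon_nf"
# prints: default=2 underscore=4 colon=7
```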

Quite "logical".

I am not a programmer and have very little time to dedicate to the *nix
playground.  So when I have to, I grab the first online tutorial that
makes sense and try to make the language work for me.  

Otherwise with maybe 6-8 hours a week devoted to computing in general, I
would get nowhere.

> What you probably want is option 3 above where the field separator is
> an extended regular expression.  Try this:
> 
>   echo " one: two:three :four five" | awk -F "[: ]+" '{print "1 "$1; print "2 "$2; print "3 "$3; print "4 "$4; print "5 "$5; print "6 "$6;print "7 "$7;print "8 "$8}'
>   1 
>   2 one
>   3 two
>   4 three
>   5 four
>   6 five
>   7 
>   8 

> The -F"[: ]+" has a "+" now and will match one or more occurrences of
> either character.  
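
The effect of the "+" is perhaps easiest to see in the field count. A small sketch, assuming a POSIX awk and the same sample line; the variable names are only for illustration:

```shell
line=' one: two:three :four five'
# Without "+": every single ':' or ' ' is its own separator.
single_nf=$(printf '%s\n' "$line" | awk -F '[: ]' '{ print NF }')
# With "+": a run of ':' and ' ' characters counts as one separator.
run_nf=$(printf '%s\n' "$line" | awk -F '[: ]+' '{ print NF }')
printf 'single=%s run=%s\n' "$single_nf" "$run_nf"
# prints: single=8 run=6
```

The sixth field in the "+" case is still one more than the five words, because the leading space delimits an empty first field.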

I like that.

> But there is still a difference because leading field separators are
> not trimmed.  

But this doesn't make sense .. 

I mean .. "-F [: ]+" tells awk that "    " eg. is a separator .. so
something like "  : :   " should be one big separator & should become
part of the implicit "beginning of line" separator, no ..??

As a result something like:

  :  ::   f1 f2 f3

.. should have strings "f1" "f2" "f3" map to $1 $2 $3. 

??
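
For reference, this is what awk does with that line under rule 3 of the spec quoted earlier. A sketch only, assuming a POSIX awk; the brackets are added to make the empty field visible:

```shell
# The leading run ":  ::   " matches FS once and delimits an empty $1,
# so f1, f2 and f3 land in $2, $3 and $4 rather than $1, $2 and $3.
out=$(printf '%s\n' ':  ::   f1 f2 f3' |
  awk -F '[: ]+' '{ for (i = 1; i <= NF; i++) printf "%d [%s]\n", i, $i }')
printf '%s\n' "$out"
```

So the whole run does count as one separator, but a separator at the start of the record still has an (empty) field in front of it. Only the default single-space FS gets the trimming special case.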

> There are a couple of ways of
> dealing with that but neither is particularly elegant.
> 
>   echo " one: two:three :four five" | awk -F "[: ]+" '{sub(FS,"",$0);print "1 "$1; print "2 "$2; print "3 "$3; print "4 "$4; print "5 "$5; print "6 "$6;print "7 "$7;print "8 "$8}'
>   1 one
>   2 two
>   3 three
>   4 four
>   5 five
>   6 
>   7 
>   8 
> 
> This substitutes the empty string for the first match of the FS
> expression on the line, which here is the leading separator.  That is
> the same as sub(/[: ]+/,"",$0); here but using FS ties it to -F
> nicely.  The $0 can be omitted in this but I like to be explicit.
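
One caveat worth sketching: sub() replaces the first match wherever it occurs, so on a line with no leading separator it would remove the first real one. Anchoring the expression with "^" is one possible guard. This variant is an assumption added here, not something from the quoted command, and it relies on awk accepting a string as a dynamic regular expression:

```shell
line='one: two:three'    # no leading separator this time
# Unanchored, as in the quoted command: the first match is the ": " after
# "one", so "one" and "two" are glued together.
plain=$(printf '%s\n' "$line" |
  awk -F '[: ]+' '{ sub(FS, ""); print NF ": " $1 }')
# Anchored: "^(" FS ")" only matches a separator run at the start of the record.
anchored=$(printf '%s\n' "$line" |
  awk -F '[: ]+' '{ sub("^(" FS ")", ""); print NF ": " $1 }')
# The anchored form still strips a leading separator when there is one.
leading=$(printf '%s\n' ' one: two' |
  awk -F '[: ]+' '{ sub("^(" FS ")", ""); print NF ": " $1 }')
printf '%s\n%s\n%s\n' "$plain" "$anchored" "$leading"
# prints:
# 2: onetwo
# 3: one
# 2: one
```

The parentheses around FS keep the "^" applying to the whole expression even if FS contains an alternation such as "a|b".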

> Hope this helps,

So little time .. too much stuff ..
