I would like to report a behavior of uniq that, IMHO, is a bug.
I am using test files containing the following lines:

tsttmp1:
2/dl1/f01             lnk2/f01              Benvenue house, Kat in blue dress on back porch
2/dl1/f02             lnk2/f02              Palm Springs, CA ????
2/dl1/f03             lnk2/f03              Amerivox company picnic, Palo Alto, CA
2/dl1/f03a            lnk2/f03a             Amerivox company picnic, Palo Alto, CA
2/dl1/f04             lnk2/f04              Europe but where?
2/dl1/f04a            lnk2/f04              Europe but where?
2/dl1/f05             lnk2/f05              Carol and Faith trip to Spain, etc.
2/dl1/f06             lnk2/f06              Carol and Faith trip

tsttmp2:
2/dl1/f01             lnk2/f01              Benvenue house, Kat in blue dress on back porch
2/dl1/f02             lnk2/f02              Palm Springs, CA ????
2/dl1/f03             lnk2/f03              Amerivox company picnic, Palo Alto, CA
2/dl1/f03a            lnk2/f03a             Amerivox company picnic, Palo Alto, CA
2/dl1/f04            lnk2/f04              Europe but where?
2/dl1/f04a            lnk2/f04              Europe but where?
2/dl1/f05             lnk2/f05              Carol and Faith trip to Spain, etc.
2/dl1/f06             lnk2/f06              Carol and Faith trip
 
Note that both files contain a pair of lines having 'lnk2/f04' as the
second field. The space between fields in both files consists of runs
of space characters. No tabs are used.

I use the commands:
$ uniq -f 1 -W 1 -D tsttmp1
and
$ uniq -f 1 -W 1 -D tsttmp2

In both commands, the options call for examining _only_ field 2, so each
command should report two duplicate lines. But not so. There is no report
of duplicates for tsttmp1, and there is a report of two duplicate lines
for tsttmp2.
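
The difference can be reproduced with just the pair of 'lnk2/f04' lines
from each file. The following is only a reduced sketch, not the commands
above: it assumes that -f 1 alone is enough to show the effect (the
comparison then covers everything after the first field), and it uses -d
rather than -D so that only POSIX options are involved. The variable
names pair1 and pair2 are made up for the example.

```shell
# Pair taken from tsttmp1: the first line has one more blank before
# lnk2/f04 than the second line does.
pair1=$(printf '%s\n' \
  '2/dl1/f04             lnk2/f04              Europe but where?' \
  '2/dl1/f04a            lnk2/f04              Europe but where?' |
  uniq -f 1 -d)

# Pair taken from tsttmp2: both lines have the same number of blanks
# before lnk2/f04.
pair2=$(printf '%s\n' \
  '2/dl1/f04            lnk2/f04              Europe but where?' \
  '2/dl1/f04a            lnk2/f04              Europe but where?' |
  uniq -f 1 -d)

# Brackets make an empty result visible.
printf 'tsttmp1 pair: [%s]\n' "$pair1"
printf 'tsttmp2 pair: [%s]\n' "$pair2"
```

If the blanks in front of field 2 take part in the comparison, the first
result should be empty and the second should hold one of the two lines.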

I believe that the actual program treats a field as beginning with the
first blank after a non-blank character. This is the standard behavior
for 'sort', but it is inconsistent with 'info coreutils uniq', which
states that a field begins with the first non-blank character after a
string of blanks. What keeps tsttmp1 from producing a report is the
differing number of leading blanks in the two lines: the 'f04' line has
one more blank before 'lnk2/f04' than the 'f04a' line, whereas in tsttmp2
the two counts are the same.
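
To make the suspected comparison key visible, here is a rough sketch that
imitates the field skipping with sed. It assumes the POSIX definition of
a uniq field (a run of blanks followed by a run of non-blanks) and is not
taken from the uniq source. The helper name key_after_first_field is
hypothetical.

```shell
# Hypothetical helper: drop one leading "blanks, then non-blanks" field
# and bracket the remainder so that its leading blanks can be seen.
key_after_first_field() {
  sed 's/^[[:blank:]]*[^[:blank:]]*\(.*\)$/[\1]/'
}

# The two tsttmp1 lines, cut off after field 2 for brevity.
printf '%s\n' \
  '2/dl1/f04             lnk2/f04' \
  '2/dl1/f04a            lnk2/f04' | key_after_first_field
```

Under that assumption the two bracketed keys differ only in the number of
blanks in front of 'lnk2/f04', which would be enough for uniq to treat
the lines as distinct.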

I suggest a fix for this in uniq:
1/ change the documentation to accurately describe the actual behavior.
2/ add an option, -b, to uniq that tells it to ignore leading blanks in
   a field, as is available in sort.
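
Without such an option, one possible workaround is to normalize the
blanks before uniq sees them. This is a sketch only. It assumes that
collapsing every run of spaces in the printed lines is acceptable, and it
uses -f 1 -d instead of the options in my commands, so it compares the
whole remainder of the line rather than field 2 alone. The variable name
squeezed is made up.

```shell
# Squeeze each run of spaces to a single space so that every field is
# preceded by exactly one blank, then compare what follows field 1.
squeezed=$(printf '%s\n' \
  '2/dl1/f04             lnk2/f04              Europe but where?' \
  '2/dl1/f04a            lnk2/f04              Europe but where?' |
  tr -s ' ' | uniq -f 1 -d)

printf '%s\n' "$squeezed"
```

The price is that the reported lines no longer keep their original column
alignment, which a -b option would avoid.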

Cheers,
-- 
Paul E Condon           
[EMAIL PROTECTED]


_______________________________________________
Bug-coreutils mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/bug-coreutils