On 02/07/2013 05:13 PM, Assaf Gordon wrote:
Hello,

Attached is a proof-of-concept patch to add "--check-fields=N" to uniq, 
allowing uniq'ing by specific fields.
(Trying a different approach at promoting csplit-by-field [1] :) ).

It works just like 'check-chars' but on fields, and if not used, it does not 
affect the program flow.
===
     # input file, every whole line is unique
     $ cat input.txt
     A 1 z
     A 1 y
     A 2 x
     B 2 w
     B 3 w
     C 3 w
     C 4 w

     # regular uniq
     $ uniq -c input.txt
           1 A 1 z
           1 A 1 y
           1 A 2 x
           1 B 2 w
           1 B 3 w
           1 C 3 w
           1 C 4 w

     # Stop after 1 field
     $ uniq -c --check-fields 1 input.txt
           3 A 1 z
           2 B 2 w
           2 C 3 w

     # Stop after 2 fields
     $ uniq -c --check-fields 2 input.txt
           2 A 1 z
           1 A 2 x
           1 B 2 w
           1 B 3 w
           1 C 3 w
           1 C 4 w

     # Skip the first field and check 1 field (effectively, uniq on field 2)
     $ uniq -c  --skip-fields 1 --check-fields 1 input.txt
           2 A 1 z
           2 A 2 x
           2 B 3 w
           1 C 4 w

     # "--field" is a convenience shortcut for skip & check fields
     $ uniq -c --field 2 input.txt
           2 A 1 z
           2 A 2 x
           2 B 3 w
           1 C 4 w
     $ uniq -c --field 3 input.txt
           1 A 1 z
           1 A 1 y
           1 A 2 x
           4 B 2 w
===
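
The behaviour in the examples above can be modelled as grouping adjacent lines by a key built from the selected fields. Below is a minimal Python sketch of that idea, assuming whitespace-separated fields. The names `uniq_by_fields`, `skip` and `check` are illustrative only; this is not the patch's code and it does not reproduce uniq's exact handling of blanks.

```python
from itertools import groupby


def uniq_by_fields(lines, skip=0, check=None):
    """Yield (count, first_line) for each run of adjacent lines whose
    selected fields are equal. Hypothetical helper, not part of coreutils."""

    def key(line):
        # Assumption: fields are separated by runs of whitespace.
        fields = line.split()
        # Compare `check` fields after skipping `skip`; None means "to end of line".
        end = None if check is None else skip + check
        return fields[skip:end]

    # groupby only merges *adjacent* equal keys, like uniq on unsorted input.
    for _, run in groupby(lines, key=key):
        run = list(run)
        yield len(run), run[0]


lines = ["A 1 z", "A 1 y", "A 2 x", "B 2 w", "B 3 w", "C 3 w", "C 4 w"]
for count, line in uniq_by_fields(lines, skip=1, check=1):
    print(f"{count:7d} {line}")
```

With `skip=1, check=1` the sketch produces counts of 2, 2, 2 and 1, in line with the `--skip-fields 1 --check-fields 1` example.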

What do you think?

Useful, but only a partial solution as discussed here:

http://lists.gnu.org/archive/html/bug-coreutils/2006-06/msg00211.html
http://debbugs.gnu.org/cgi/bugreport.cgi?bug=5832

I.e., essentially this patch has been rejected before,
and being able to specify --key to uniq, just like sort,
would be much preferred.

To avoid redundant coding it's always good to
touch base with the list first on ideas,
or search the bug database.

cheers,
Pádraig
