Thanks for your reply. If I want to use the entire line as a key and sort by the third field, should I use sort -u -k3 -k1 -k2 a to do that?
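
To make the question concrete, here is a small sketch I would use to check it, assuming the five-line sample file from your reply below. The temporary directory and the variable names are only for illustration, and LC_ALL=C is set so the comparison does not depend on my locale. My guess is that -k1 without an end field already spans the whole line, so the extra -k2 is never reached as a tie-breaker and both commands should print the same thing.

```shell
#!/bin/sh
# Illustrative check only: tmpdir, two_keys and three_keys are made-up names.
LC_ALL=C
export LC_ALL

# Recreate the sample file "a" from the quoted reply.
tmpdir=$(mktemp -d)
printf '%s\n' '1 a q' '2 a q' '1 a q' '1 a w' '3 a w' > "$tmpdir/a"

# Primary key: field 3 to end of line. Secondary key: field 1 to end of line,
# which is the whole line, so -u only drops fully identical lines.
two_keys=$(sort -u -k3 -k1 "$tmpdir/a")

# Same command with the extra -k2 from my question.
three_keys=$(sort -u -k3 -k1 -k2 "$tmpdir/a")

printf '%s\n' "$three_keys"
if [ "$two_keys" = "$three_keys" ]; then
    echo "same output with and without -k2"
else
    echo "outputs differ"
fi

rm -rf "$tmpdir"
```

If that reasoning is right, the -k2 is harmless but redundant, and sort -u -k3 -k1 a is enough.
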

On Wed, Nov 9, 2011 at 03:45, Eric Blake <[email protected]> wrote:
> On 11/08/2011 11:54 AM, Eric Blake wrote:
>>>
>>> 22:41:39#tp#~> /usr/local/bin/sort -u -k1,3 a
>>> 1 a q
>>> 1 a w
>>> 3 a w
>>> 22:41:48#tp#~> /usr/local/bin/sort -u -k3 a
>>> 1 a q
>>> 1 a w
>
>> Since you didn't tell us what output you were hoping to get, I can't
>> tell you the proper command line that would match your expected output.
>> Feel free to reply, even while this bug is closed, if you need more help
>> in getting the output you want.
>
> I'll give a preemptive attempt at guessing what you meant, as well:
>
> If you wanted to sort on just the third and subsequent fields, but then
> strip duplicate lines only if the entire line is duplicate, then you have to
> use two processes:
>
>   sort [-s] -k3 a | uniq
>
> If you don't mind a two-key sort, where the primary key is the third and
> subsequent fields, but where the secondary key is the entire line so as to
> force sort -u to consider the entire line when determining uniqueness, then
> one process will do:
>
>   sort -u -k3 -k1 a
>
> To see the difference, and remembering that sort -u implies sort -s,
> consider these contents for a:
>
> $ cat a
> 1 a q
> 2 a q
> 1 a q
> 1 a w
> 3 a w
> $ sort -u -k3 -k1 a
> 1 a q
> 2 a q
> 1 a w
> 3 a w
> $ sort -s -k3 a | uniq
> 1 a q
> 2 a q
> 1 a q
> 1 a w
> 3 a w
> $ sort -k3 a | uniq
> 1 a q
> 2 a q
> 1 a w
> 3 a w
>
> That is, if the stable sort of just -k3 leaves identical lines that are not
> adjacent ("1 a q" in my example), then the separate uniq process won't
> filter them; while using sort -u with -k1 as the means to force the entire
> line as a secondary sort key loses the ability to leave identical lines
> separated by a distinct line. Likewise, omitting both -s and -u lets sort
> imply a last-resort -k1, at which point uniq sees the same line order as
> sort -u sees.
>
>>> i read
>>> http://www.gnu.org/s/coreutils/manual/html_node/sort-invocation.html,
>>> but got nothing about this.
>
> Actually, it does - under the option -u, I see:
>
>   The commands sort -u and sort | uniq are equivalent, but this equivalence
>   does not extend to arbitrary sort options. For example, sort -n -u inspects
>   only the value of the initial numeric string when checking for uniqueness,
>   whereas sort -n | uniq inspects the entire line. See uniq invocation.
>
> --
> Eric Blake [email protected] +1-801-349-2682
> Libvirt virtualization library http://libvirt.org
>

--
contact me:
MSN: [email protected]
GTALK: [email protected]
