Re: efficient version of 'sort | uniq -c | sort -n'?

Philip Rowlands Mon, 21 May 2007 13:48:20 -0700

On Mon, 21 May 2007, Matthew Woehlke wrote:

I thought about that, but /maximum/ efficiency is only achievable doing everything in one go. Anyway I think 'countitems' would still be a big improvement; I would do that as 'sort --unique-with-count' (preferably aliased 'sort -U') since IMO this is a missing feature of 'sort -u'.

You don't really want to do the first sort at all - it's just a convenient way of creating the buckets. The relative order of each bucket is unimportant, but that's what sort spends a long time calculating.


A fundamentally more efficient approach would be something like:

perl -lne '$bucket{$_}++; END { foreach $key (keys %bucket) { print "$bucket{$key} $key" } }' | \
  sort -n

The trailing "sort" could be done inside perl, but it doesn't help the (algorithmic) efficiency, and we're not playing perl golf...
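For anyone without perl to hand, the same bucketing idea can be sketched in awk. This is only an illustration: the function name count_items is made up here and is not an existing coreutils tool, and it assumes the set of distinct lines fits in memory (unlike sort, which can spill to temporary files).

```shell
#!/bin/sh
# count_items: hypothetical helper name, used only for this sketch.
# Reads lines on stdin and tallies each distinct line in an awk
# associative array (one hash update per input line), then sorts
# only the distinct lines by their count.
count_items() {
    awk '{ count[$0]++ }
         END { for (line in count) print count[line], line }' |
        sort -n
}

# Example: six input lines, three distinct items.
printf '%s\n' b a b c b a | count_items
```

With N input lines and U distinct lines, the counting pass is roughly O(N) hash updates, and only the final sort costs O(U log U), rather than the O(N log N) of sorting the whole input first. The order of lines with equal counts may differ from what 'sort | uniq -c | sort -n' gives, and uniq -c pads the count column, so the output is not byte-for-byte identical.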



Cheers,
Phil


_______________________________________________
Bug-coreutils mailing list
http://lists.gnu.org/mailman/listinfo/bug-coreutils
