On Mon, 21 May 2007, Matthew Woehlke wrote:

> I thought about that, but /maximum/ efficiency is only achievable doing everything in one go. Anyway I think 'countitems' would still be a big improvement; I would do that as 'sort --unique-with-count' (preferably aliased 'sort -U') since IMO this is a missing feature of 'sort -u'.

You don't really want to do the first sort at all - it's just a convenient way of creating the buckets. The relative order of the buckets is unimportant, but that's what sort spends a long time calculating.

A fundamentally more efficient approach would be something like:

perl -lne '$bucket{$_}++; END { foreach $key (keys %bucket) { print "$bucket{$key} $key" } }' | \
  sort -n

The trailing "sort" could be done inside perl, but it doesn't help the (algorithmic) efficiency, and we're not playing perl golf...


Cheers,
Phil


_______________________________________________
Bug-coreutils mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/bug-coreutils
