RE: Sorting Hashes of Hashes

King, Jason G Thu, 13 Mar 2003 16:35:19 -0800

Basil writes..

>Maybe everyone else understands how Randal's code works but I assume
>he won some award because the average guy wouldn't have though of the
>method. And I don't like relying on tricks... OK maybe its not a trick
>but when its magic to me its a trick.


It's really worth understanding, because yes - everything else is
slower. What better place to learn than a mailing list :)

I'll get rid of the nested hash stuff and do a simple example, because
once you "get it" you'll have an "Aha!" moment and then you'll swear by
it in a heap of different sorting situations.

Ok, so take this hash:

  my %dates = (third => '12-01-2003', first => '4-4-2001', second =>
'5-23-2002');

Obviously they're in m-d-yyyy format, where sometimes there are leading
zeroes. So, first I'll show the whole Schwartzian Transform for sorting
them:

  my @sorted_keys = map { $_->[1] }
                    sort { $a->[0] <=> $b->[0] }
                    map { my($m,$d,$y) = split /-/, $dates{$_};
                          [ sprintf(%04d%02d%02d,$y,$m,$d), $_ ]
                        }
                    keys %dates;

Ok, so we work from the right to left (bottom up), you know what keys
%dates does, just generates an unordered list of keys from the hash. So
that's fed into the bottom map:


  map { my($m,$d,$y) = split /-/, $dates{$_};
        [ sprintf(%04d%02d%02d,$y,$m,$d), $_ ]
      }

So, what's this doing? Well it receives each key from keys %dates in $_.
So it's first doing the split for each $dates{$_}, and then it's
creating an anonymous array which contains some sprintf value and $_
(the key). So that array will look like this for $dates{third}:

  [ 20031201, 'third' ]

And for the other two keys in %dates we get:

  [ 20010404, 'first' ]
  [ 20020523, 'second' ]

So, those are the three return values of the map, for each key in %dates
it creates an anonymous array with the first element as the munged date,
and the second element as the original key from %dates. We specifically
munged the date to make sorting as easy, and hence as quick, as
possible. So, those anonymous arrays (one array per hash key) are passed
into the sort algorithm, it will do it's sorting setting elements passed
into it as $a and $b. So we want the sort to be performed on the first
element of the @$a and @$b arrays, ie. our sorting becomes:

  sort { $a->[0] <=> $b->[0] }

The first element is our munged date, that when sorted in simple
numerical order will give accurate results because obviously 20,010,404
is a smaller number than 20,020,523 which in turn is smaller than
20,031,201.

So, the sort correctly outputs the elements passed in to it (the
anonymous arrays) in the order it worked out, so the sort outputs:

  [ 20010404, 'first' ]
  [ 20020523, 'second' ]
  [ 20031201, 'third' ]

Which are received by the last map, the topmost map. It receives each
anonymous array as $_, and all it does is dereference the second element
of whatever is passed to it:

  map { $_->[1] }

So, it just outputs 'first', 'second' and then 'third'. Which are stored
in @sorted_keys. We're able to throw away the $_->[0] because we only
wanted that for sorting.

Let's look at another example. Imagine that you have this hash of
arrays:

  my %vals = ( third => [4,5,23433], first => [3,18,322], second =>
[4,2,233] );

What we want is to do is sort first on the first field of the arrays,
then on the second field, and then on the third field. But we want to
maintain those three fields in our output. Without transforming the
arrays for sorting you're looking at nested for loops going over all the
elements and sorting on each one in a nested or recursive fashion. It
takes an enormous amount of time whenever the number of hash elements or
the number of array elements increases. With a transform it becomes:

  my @sorted_keys = map { $_->[1] }
                    sort { $a->[0] <=> $b->[0] }
                    map { [ sprintf('%02d%02d%08d',@{$vals{$_}}), $_ ] }
                    keys %vals;

So much quicker, so much less code. The concept is just to transform the
input so that it's easier to sort, then sort it with a simpler sort,
then transform it back to the original. It's worth working through this,
it can save you A LOT of time and code.

-- 
  Jason King
_______________________________________________
ActivePerl mailing list
[EMAIL PROTECTED]
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs

RE: Sorting Hashes of Hashes

Reply via email to