> ok, seriously, we'll also assume no two entries have
> the same number, and if
> they did you'd want to delete repeats. This makes
> things a lot easier.
> 
> #! perl
> 
> open FILE, 'file.txt';
> @list = <FILE>;   # get list into array, or by some other means;
>                   # keep the line breaks if you can
> for $i (0..$#list) {
>   $list[$i] =~ /(....)(\d*)/;
>   $sortedlist[$2] = $list[$i];  # create each element of the new list
> }
> print @sortedlist;
> 
> The nice thing is if you have gaps in the array
> (e.g. elements 2,3,4 exist
> but 5-83 don't) it really won't matter.

It does matter a little in that if you have big gaps
in the array (e.g., @list = ("exon1", "exon3908239")),
you end up creating a huge array that stores only a
few elements.
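
To make the cost concrete, here is a minimal sketch (my own
illustration, not part of the code quoted above) of how far a single
large index stretches the array. It assumes the exon number is the
trailing run of digits, so it matches with /(\d+)$/ rather than the
/(....)(\d*)/ pattern used above:

```perl
use strict;
use warnings;

# Hypothetical two-element input with one very large exon number.
my @list = ("exon1", "exon3908239");

my @sortedlist;
for my $item (@list) {
    # Use the trailing digits as the array index; skip non-matching lines.
    next unless $item =~ /(\d+)$/;
    $sortedlist[$1] = $item;
}

my $slots  = scalar @sortedlist;            # highest index + 1
my $filled = grep { defined } @sortedlist;  # slots that actually hold a value

print "slots: $slots, filled: $filled\n";   # slots: 3908240, filled: 2
```

Perl has to extend the array up to the highest index used, so almost
all of those slots are undef.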

A hash-style approach may be a little more memory-efficient
(no empty array slots are allocated) while still keeping
the property that repeated numbers collapse to a single entry:

sub sort_custom {
  my(%sorted);
  for(@_) {
    /(\d+)$/;
    $sorted{$1} = $_;
  }
  return map {$sorted{$_}} sort {$a<=>$b} keys %sorted;
}
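
As a quick usage sketch (the name sort_custom_checked and the sample
input are my own, for illustration only), here is the same idea with a
guard for lines that have no trailing digits, run on a list that
contains one repeated entry:

```perl
use strict;
use warnings;

# Hypothetical variant of the hash approach: same logic, but it skips
# any element that does not end in digits instead of reusing a stale $1.
sub sort_custom_checked {
    my %sorted;
    for my $item (@_) {
        next unless $item =~ /(\d+)$/;
        $sorted{$1} = $item;    # a later entry with the same number overwrites
    }
    return map { $sorted{$_} } sort { $a <=> $b } keys %sorted;
}

# "exon5" appears twice; the hash keeps only one copy.
my @sorted = sort_custom_checked(qw(exon1 exon5 exon12 exon30 exon2 exon5));

print "$_\n" for @sorted;   # exon1 exon2 exon5 exon12 exon30, one per line
```

The hash only ever holds as many keys as there are distinct numbers,
no matter how large those numbers are.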

However, using Benchmark on my DV iMac 400, it appears
that all our approaches are roughly equivalent (with
my hash-style approach sadly coming up dead last):

use Benchmark;

my @list = qw(
        exon1
        exon5
        exon12
        exon30
        exon2
);

timethese(100000, {
        'sorted_with_custom' => sub { @ary = sort_custom(@list) },
        'sorted_with_array'  => sub { @ary = sort_array(@list) },
        'sorted_by_exon_num' => sub { @ary = sort by_exon_num @list }
} );

sub sort_custom {
        my(%sorted);
        for(@_) {
                /(\d+)$/;
                $sorted{$1} = $_;
        }
        return map { $sorted{$_} } sort { $a <=> $b } keys %sorted;
}

sub by_exon_num {
        $a =~ /(\d+)/;
        my $a_dig = $1;
        $b =~ /(\d+)/;
        my $b_dig = $1;
        $a_dig <=> $b_dig;
}

sub sort_array {
        my(@sortedlist);
        # use each entry's number as its index in the new list
        for my $item (@_) {
                $item =~ /(....)(\d*)/;
                $sortedlist[$2] = $item;
        }
        return @sortedlist;
}

With results as follows:

Benchmark: timing 100000 iterations of sorted_with_array, sorted_by_exon_num, sorted_with_custom...
 sorted_with_array: 18 secs (17.45 usr  0.00 sys = 17.45 cpu)
sorted_by_exon_num: 19 secs (19.03 usr  0.00 sys = 19.03 cpu)
sorted_with_custom: 20 secs (19.33 usr  0.00 sys = 19.33 cpu)

> I'm sure it can get really complicated if you have
> many different combos of
> letters at the beginning.  But if you can separate
> those out into separate
> lists then run the subroutine over each of them,
> that'll do it.

Well, even if the number of leading characters is
variable, simply capturing the trailing digits (i.e.,
using /(\d+)$/) should eliminate any unnecessary
complexity that stems from that problem.
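
For instance, here is a rough sketch with prefixes of different
lengths. The input list is made up, and the decorate-sort-undecorate
pass (often called a Schwartzian transform) is a different technique
from the sort subs above; I use it here only so the regex runs once
per element:

```perl
use strict;
use warnings;

# Hypothetical input with prefixes of different lengths.
my @mixed = qw(exon12 intron3 utr1 gene200);

# Pair each string with its trailing number, sort numerically on the
# number, then unwrap the original strings.
my @by_number =
    map  { $_->[1] }
    sort { $a->[0] <=> $b->[0] }
    map  { my ($num) = /(\d+)$/; [ defined $num ? $num : 0, $_ ] }
    @mixed;

print "$_\n" for @by_number;   # utr1 intron3 exon12 gene200, one per line
```

Entries without trailing digits get a key of 0 here and sort first;
that default is an arbitrary choice on my part.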

I sure had a lot of time on my hands today :-)

Regards,
David

>----- Original Message -----
> 
> Hi,
> 
> I am trying to sort a list like this
> 
> exon1
> exon5
> exon12
> exon30
> exon2
> 
> Into ->
> 
> exon1
> exon2
> exon5
> exon12
> exon30
> 
> Any ideas on how to do this?
> 
> Thanks
> 
> adam
