> ok, seriously, we'll also assume no two entries have the same number,
> and if they did you'd want to delete repeats. This makes things a lot
> easier.
>
> #! perl
>
> open FILE, 'file.txt';
> @list = <FILE>;   # get list into array or by some other means
>                   # keep the line breaks if you can
> for $i (0..$#list) {
>     $list[$i] =~ /(....)(\d*)/;
>     $sortedlist[$2] = $list[$i];   # create each element of the new list
> }
> print @sortedlist;
>
> The nice thing is if you have gaps in the array (e.g. elements 2,3,4
> exist but 5-83 don't) it really won't matter.

It does matter a little: if you have big gaps in the array (e.g., @list = ("exon1", "exon3908239")), you end up creating a huge array that stores only a few elements. A hash-style approach may be a little more memory-efficient (no wasted slots) while still keeping only one entry per number:

    sub sort_custom {
        my(%sorted);
        for (@_) {
            /(\d+)$/;               # capture the trailing digits
            $sorted{$1} = $_;       # a later duplicate overwrites an earlier one
        }
        return map { $sorted{$_} } sort { $a <=> $b } keys %sorted;
    }

However, using Benchmark on my DV iMac 400, it appears that all our approaches are roughly equivalent (with my hash-style approach sadly coming up dead last):

    use Benchmark;

    my @list = qw( exon1 exon5 exon12 exon30 exon2 );

    timethese(100000, {
        'sorted_with_custom' => sub { @ary = sort_custom(@list) },
        'sorted_with_array'  => sub { @ary = sort_array(@list) },
        'sorted_by_exon_num' => sub { @ary = sort by_exon_num @list },
    });

    # hash keyed on the trailing number, then a numeric sort of the keys
    sub sort_custom {
        my(%sorted);
        for (@_) {
            /(\d+)$/;
            $sorted{$1} = $_;
        }
        return map { $sorted{$_} } sort { $a <=> $b } keys %sorted;
    }

    # comparison routine for sort: compare the first run of digits in each
    sub by_exon_num {
        $a =~ /(\d+)/;
        my $a_dig = $1;
        $b =~ /(\d+)/;
        my $b_dig = $1;
        $a_dig <=> $b_dig;
    }

    # place each entry at the array index given by its number
    sub sort_array {
        my(@sortedlist);
        for my $item (@_) {
            $item =~ /(....)(\d*)/;
            $sortedlist[$2] = $item;
        }
        return @sortedlist;
    }

With results as follows:

Benchmark: timing 100000 iterations of sorted_with_array, sorted_by_exon_num, sorted_with_custom...
sorted_with_array:  18 secs (17.45 usr  0.00 sys = 17.45 cpu)
sorted_by_exon_num: 19 secs (19.03 usr  0.00 sys = 19.03 cpu)
sorted_with_custom: 20 secs (19.33 usr  0.00 sys = 19.33 cpu)

> I'm sure it can get really complicated if you have many different
> combos of letters at the beginning. But if you can separate those out
> into separate lists then run the subroutine over each of them, that'll
> do it.

Well, even if the number of leading characters is variable, simply catching the trailing digits (i.e., using /(\d+)$/) should eliminate any unnecessary complexity that stems from that problem.

I sure had a lot of time on my hands today :-)

Regards,
David

>----- Original Message -----
>
> Hi,
>
> I am trying to sort a list like this
>
> exon1
> exon5
> exon12
> exon30
> exon2
>
> Into ->
>
> exon1
> exon2
> exon5
> exon12
> exon30
>
> Any ideas on how to do this?
>
> Thanks
>
> adam
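
P.S. For anyone who wants to keep repeated numbers instead of deleting them, the trailing-digits idea from this thread can also be written as a decorate/sort/undecorate pass (often called a Schwartzian transform). This is only a rough sketch and was not part of the benchmark above. The name sort_by_trailing_number is made up for illustration, and treating entries without a trailing number as 0 is my own assumption.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sort labels such as "exon12" by their trailing number.
# Hypothetical helper; entries with no trailing digits are treated as 0.
sub sort_by_trailing_number {
    return map  { $_->[1] }                       # 3. unwrap the original string
           sort { $a->[0] <=> $b->[0]             # 2. compare the numbers, and
                  || $a->[1] cmp $b->[1] }        #    break ties on the text
           map  { [ /(\d+)$/ ? $1 : 0, $_ ] }     # 1. pair each string with its number
           @_;
}

my @list   = qw( exon1 exon5 exon12 exon30 exon2 );
my @sorted = sort_by_trailing_number(@list);
print "$_\n" for @sorted;
```

Each regex runs once per element rather than once per comparison, and nothing is thrown away, so two entries with the same number both survive and come out in alphabetical order.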