Dear People,
I am studying some topics on Bioinformatics and since I read a lot about Perl being employed in the area, I tried programming a very basic algorithm with it, the pair-wise sequence alignment algorithm. The problem is that while the very same algorithm runs quite satisfactorily with the same input data, the algorithm that I coded in Perl is about 40 times slower (I didn't test it with bigger instances, unfortunately). Since I am a newbie Perl programmer, I may be doing something utterly wrong in my program, some big "no-nos" and I would appreciate if somebody could point me to problems with my code, which I am sending below. In my code, I tried using only integers to make computations (so that it could run a bit faster) and to disable warning, but while the former did have a slight impact in the running time of my program, the latter didn't do anything noticeable. I also tried to "pre-allocate" (i.e., pre-extend) the arrays that I am using, with the hope that it would avoid Perl doing lots of reallocs() (if Perl uses it at all -- I'm guessing here), but the time taken also didn't change (or the change was negligible). Here is the program. As the input, it takes two sequences that it tries to "align" and calculate their edit distance (i.e., the least amount of inserts, deletes, and character substitutions needed to change the first sequence into the second sequence). Please disregard the comments (in Portuguese, intended for my advisor). - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - #!/usr/bin/perl -w use strict; use integer; sub p($$) { return ($_[0] eq $_[1])?0:1; } sub min($$$) { my ($min); $min = ($_[0] <= $_[1])?$_[0]:$_[1]; $min = ($min <= $_[2])?$min :$_[2]; return $min; } my ($debug) = (0); my ($s, $t, @s, @t); # sequencias a alinhar; open(ENTRA, "seqs.txt"); chomp($s = <ENTRA>); #colocar verificacao de erros chomp($t = <ENTRA>); #colocar entrada de acordo com o padrao FASTA close(ENTRA); # truque para facilitar acesso rapido a caracteres de $s e $t # tambem elimina brancos indesejados -- posicao 0 inutilizada em $s e $t @s = ("", split(/ */, $s)); @t = ("", split(/ */, $t)); my ($m, $n) = (scalar(@s)-1, scalar(@t)-1); my ($i, $j); # "aloca" matriz @a para evitar que o Perl faca realocacoes my (@a); # matriz de prog. dinamica $#a = $m; # linhas de 0 a $m for ($i = 0; $i <= $m; $i++) { $a[$i]->[$n] = 0; # equivale a $a[$i] = malloc($n+1) } # inicializa matriz @a for ($i = 0; $i <= $m; $i++) { $a[$i][0] = $i; } for ($j = 0; $j <= $n; $j++) { $a[0][$j] = $j; } # preenche a matriz @a for ($i = 1; $i <= $m; $i++) { for ($j = 1; $j <= $n; $j++) { $a[$i][$j] = min($a[$i-1][$j] + p($s[$i], "-"), $a[$i-1][$j-1] + p($s[$i], $t[$j]), $a[$i][$j-1] + p("-", $t[$j])); } } if ($debug) { for ($i = 0; $i <= $m; $i++) { for ($j = 0; $j <= $n; $j++) { print "$a[$i][$j] "; } print "\n"; } } # imprime o alinhamento $i = $m; $j = $n; my (@alin_s, @alin_t, @alin_c); while ($i > 0 and $j > 0) { if ($a[$i][$j] == $a[$i-1][$j] + p($s[$i], "-")) { push(@alin_s, $s[$i]); push(@alin_t, "-"); push(@alin_c, " "); $i--; } elsif ($a[$i][$j] == $a[$i-1][$j-1] + p($s[$i], $t[$j])) { push(@alin_s, $s[$i]); push(@alin_t, $t[$j]); push(@alin_c, ($s[$i] eq $t[$j])?"*":" "); $i--; $j--; } else { push(@alin_s, "-"); push(@alin_t, $t[$j]); push(@alin_c, " "); $j--; } } while ($j > 0) { # seq. $s acabou push(@alin_s, "-"); push(@alin_t, $t[$j]); push(@alin_c, " "); $j--; } while ($i > 0) { # seq. $t acabou push(@alin_s, $s[$i]); push(@alin_t, "-"); push(@alin_c, " "); $i--; } print reverse(@alin_s), "\n"; print reverse(@alin_t), "\n"; print reverse(@alin_c), "\n"; print "dist(s, t) = $a[$m][$n]\n"; - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - So, any help will be more than welcome. Thanks in advance, Roger... -- =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Rogério Brito - [EMAIL PROTECTED] - http://www.ime.usp.br/~rbrito/ =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= -- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]