A quick background overview. I have a relatively simple tool for
generating six-frame translations of a genome (all possible protein
segments encoded in a genome). I read through the input one codon (3 DNA
bases) at a time, keeping the previous codon, and build the sequence
fragments in six separate scalars. At every codon I check whether any of
the fragments ends in a 'stop' codon, and if so call a function that
checks whether the fragment is >= the min length and, if it is, formats
the output and returns it. The scalar containing the fragment is passed
by reference and 'cleared' ($$sref = '') in the stop function. The
formatted output, if any, is accumulated in a temp variable and, if
anything was returned, the contents are printed. The original way I
handled this caused the output of fragments to be repeated, seemingly at
random, over a number of iterations. After a small change the problem
went away, although I would have thought the change to be equivalent.
Here's a stripped-down version of the first method:

my ($seq1,$cseq1,...) = '';
while($codon = &$read_code) {
        my $six = $last . $codon;
        $last = $codon;
        $pos+=3;
        
        $seq1 .= $AA{$codon} || "X";
        complement(\$codon);
        $cseq1 .= $AA{$codon} || "X";
        # etc for other two frames
        
        my $out = _stop(\$seq1,$pos,$minlen,$id) if $seq1 =~ /\.$/;
        $out .= _stop(\$cseq1,$pos,$minlen,$id,$dna_len)
                if $cseq1 =~ /\.$/;
        # etc for all frames

        print $fh_out $out if $out;
} # while reading codons from genome
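
For anyone who wants to run the loop above in isolation, the $read_code
callback can be approximated with a small closure. This is only a sketch:
make_codon_reader and the test string are hypothetical and are not part
of the real tool, which reads from a genome file.

```perl
use strict;
use warnings;

# Hypothetical codon reader (not the original code): returns a closure
# that yields the next 3 bases of $dna on each call.
sub make_codon_reader {
    my ($dna) = @_;
    my $offset = 0;
    return sub {
        # Stop when fewer than 3 bases remain (drops a trailing partial codon).
        return if $offset + 3 > length $dna;
        my $codon = substr($dna, $offset, 3);
        $offset += 3;
        return $codon;
    };
}

my $read_code = make_codon_reader('ATGAAATAGCC');
my @codons;
while (my $codon = $read_code->()) {
    push @codons, $codon;
}
print "@codons\n";    # prints "ATG AAA TAG"
```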

Key lines in _stop:
if ($len < $min) { undef($$sref_seq);return '' }
$ret = "$id\t$$sref_seq\t$start\t$end\n";
$$sref_seq = '';
return $ret;
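
To make those key lines runnable on their own, here is a rough
reconstruction. It is a sketch under assumptions: stop_sketch is a
hypothetical name, and how $len, $start and $end are computed is a guess
for illustration, not the arithmetic used in the real _stop.

```perl
use strict;
use warnings;

# Hypothetical reconstruction of a _stop-like routine. Only the three
# "key lines" come from the original; the rest is assumed.
sub stop_sketch {
    my ($sref_seq, $pos, $min, $id) = @_;
    my $len = length($$sref_seq) - 1;    # assumed: trailing '.' is not counted
    if ($len < $min) { undef($$sref_seq); return '' }
    my $end   = $pos;                                # assumed: last base read
    my $start = $end - 3 * length($$sref_seq) + 1;   # assumed: 3 bases per residue
    my $ret = "$id\t$$sref_seq\t$start\t$end\n";
    $$sref_seq = '';                     # clear the caller's fragment
    return $ret;
}

my $frag  = 'MKV.';
my $long  = stop_sketch(\$frag, 12, 3, 'SixFrame');   # long enough: formatted
my $short_frag = 'M.';
my $short = stop_sketch(\$short_frag, 6, 3, 'SixFrame');   # too short: ''
print $long;
```

In both branches the caller's scalar is reset through the reference, so
the next codon starts a fresh fragment.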


The above code causes the same fragment to be output multiple times.
Some debugging showed that _stop is only being called when expected and
that $seq1 etc. are being set to the empty string as intended. I also
determined that the repeat was being printed in successive iterations of
the while loop, and only for as long as a return from _stop didn't
assign a new value other than '' to $out. For example, if I output a
simple loop counter, I would see something like:
141:
SixFrame       <sequence>       278     141
142:
SixFrame       <sequence>       278     141
143:
SixFrame       <sequence>       278     141
144:
145:
146:
etc., where only the output on iteration 141 was expected.

The 'SixFrame' line is the content of $out. There doesn't seem to be
any obvious pattern to how many times the output is repeated, but I
didn't investigate that deeply. Changing the "my $out = _stop" line in
the while loop above to:
        my $out = '';
        $out .= _stop(\$seq1,$pos,$minlen,$id) if $seq1 =~ /\.$/;

This seems to have completely solved the problem. Is this some sort of
mistake on my part, some subtle/odd behavior that would cause this to be
expected in this usage (and if so please explain), or should I report
this as a bug? [No, I haven't trolled the known bugs/fixes yet, sorry.]
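
A stripped-down script of the same shape may help anyone who wants to
compare the two forms outside the full tool. It is only a sketch:
fake_stop and run_loop are hypothetical stand-ins, and the fragment is
made to end in '.' on iteration 2 only. What the first form reports
after iteration 2 is exactly the open question, so no output is claimed
for it here.

```perl
use strict;
use warnings;

# Hypothetical stand-in for _stop: format a record, clear the caller's scalar.
sub fake_stop {
    my ($sref) = @_;
    my $ret = "frag\t$$sref\n";
    $$sref = '';
    return $ret;
}

# Five iterations; the fragment ends in '.' only on iteration 2.
# A true $declare_first selects the second form (my $out = ''; $out .= ...).
sub run_loop {
    my ($declare_first) = @_;
    my @hits;    # iterations on which $out held something to print
    my $seq = '';
    for my $i (1 .. 5) {
        $seq .= ($i == 2) ? '.' : 'M';
        if ($declare_first) {
            my $out = '';                                 # unconditional declaration
            $out .= fake_stop(\$seq) if $seq =~ /\.$/;    # conditional append
            push @hits, $i if $out;
        }
        else {
            # Same shape as the original problem line.
            my $out = fake_stop(\$seq) if $seq =~ /\.$/;
            push @hits, $i if $out;
        }
    }
    return @hits;
}

my @first_form  = run_loop(0);
my @second_form = run_loop(1);
print "first form:  @first_form\n";
print "second form: @second_form\n";
```

With the second form, $out is non-empty on iteration 2 only, which is
what I expected from both.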

285: 11:45am % uname -a
Linux xxxx.xxx.xxx 2.4.20-43.9.legacysmp #1 SMP Tue Apr 26 08:08:36 EDT
2005 i686 athlon i386 GNU/Linux
286: 11:45am % perl -v

This is perl, v5.8.6 built for i686-linux-thread-multi
...

Thanks!
-- 
Sean Quinlan <[EMAIL PROTECTED]>

_______________________________________________
Boston-pm mailing list
[email protected]
http://mail.pm.org/mailman/listinfo/boston-pm
