Thanks Peter,

I made a mistake and used the repeat-masked contigs instead of the original 
contigs, and they did indeed contain Ns. Sorry for the mess (still, I am looking 
for an option where Ns are not included in the ORF).

Avi



----- Original Message ----
From: Peter Rice <[email protected]>
To: Fungazid <[email protected]>
Cc: [email protected]
Sent: Tue, January 12, 2010 4:15:28 PM
Subject: Re: [EMBOSS] getorf includes unspecified amino acids as part of the 
ORF sequence

Hi Avi,

> The input is a simple fasta file with only A,C,T,G letters and
> nothing else, so I wouldn't expect any Xs. In addition, even if there
> would be Ns (and there are no Ns) the program cannot know if such Ns
> do not include stopcodons so it should not consider them as part of an ORF.

>>> 00001_3 [803 - 1120]
>> LARLRFVVLGNSFIASAKGWSTPYGPTTFGPFRSCIYPRVFRSTRVRKAMATRIGSNRVN
>> ILIRCTXXXXXXXXXXXXXXXXXXXXXXXXXNPYLGWWCYIFCIFR

That suggests the Xs have all come from stop codons.

There are other possibilities, including a badly formatted input file
(perhaps two sequences and descriptions read as one).

We do need to see the input file to know where those Xs are from.
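
Before sending the file, a quick check for characters other than A, C, G
and T may already show where the Xs come from. The following is only a
minimal sketch in plain Python, not part of EMBOSS; the function name
non_acgt_counts is made up for illustration, and it assumes an ordinary
FASTA layout with ">" header lines.

```python
from collections import Counter


def non_acgt_counts(fasta_text):
    """Map each record ID to a Counter of its non-ACGT characters.

    Hypothetical helper (not an EMBOSS function). Records that contain
    only A, C, G and T are left out of the result.
    """
    counts = {}
    current = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The record ID is the first word after ">".
            parts = line[1:].split()
            current = parts[0] if parts else ""
            continue
        if current is None:
            # Sequence data before any header line is ignored.
            continue
        odd = Counter(ch for ch in line.upper() if ch not in "ACGT")
        if odd:
            counts.setdefault(current, Counter()).update(odd)
    return counts
```

To use it, pass in the file contents, for example
non_acgt_counts(open("contigs.fasta").read()), where contigs.fasta is a
placeholder name. Any contig listed with N in the result would be
expected to give X in the translation.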

Peter Rice

_______________________________________________
EMBOSS mailing list
[email protected]
http://lists.open-bio.org/mailman/listinfo/emboss
