Thanks Peter, I made a mistake and used the repeat-masked contigs instead of the original contigs, and they did indeed contain Ns. Sorry for the mess. (Still, I am looking for an option where Ns are not included in the ORF.)
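
In the absence of such an option, one possible workaround is to post-process the getorf protein output and cut each ORF at runs of X. The following is only a minimal sketch in Python: the function name `split_orf_on_unknowns` and the `min_len` threshold are hypothetical and not part of EMBOSS, and the resulting fragments are simply X-free stretches, not ORFs in getorf's strict sense.

```python
import re


def split_orf_on_unknowns(protein, min_len=10):
    """Split a translated ORF at runs of 'X' and keep the longer pieces.

    protein: amino-acid string as written by getorf (assumed one-letter codes).
    min_len: hypothetical cutoff; fragments shorter than this are dropped.
    """
    # One or more consecutive X residues act as a single cut point.
    fragments = re.split(r"X+", protein.upper())
    # Empty strings appear when the sequence starts or ends with X.
    return [frag for frag in fragments if len(frag) >= min_len]
```

For example, `split_orf_on_unknowns("ILIRCTXXXXNPYLGWWCYIFCIFR", min_len=5)` gives the two X-free pieces on either side of the masked region.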
Avi

----- Original Message ----
From: Peter Rice <[email protected]>
To: Fungazid <[email protected]>
Cc: [email protected]
Sent: Tue, January 12, 2010 4:15:28 PM
Subject: Re: [EMBOSS] getorf includes unspecified amino acids as part of the ORF sequence

Hi Avi,

> The input is a simple fasta file with only A,C,T,G letters and
> nothing else, so I wouldn't expect any Xs. In addition, even if there
> would be Ns (and there are no Ns) the program cannot know if such Ns
> do not include stopcodons so it should not consider them as part of an ORF.

>>> 00001_3 [803 - 1120]
>> LARLRFVVLGNSFIASAKGWSTPYGPTTFGPFRSCIYPRVFRSTRVRKAMATRIGSNRVN
>> ILIRCTXXXXXXXXXXXXXXXXXXXXXXXXXNPYLGWWCYIFCIFR

That suggests the Xs have all come from stop codons. There are other possibilities, including a badly formatted input file (perhaps two sequences and descriptions read as one).

We do need to see the input file to know where those Xs are from.

Peter Rice

_______________________________________________
EMBOSS mailing list
[email protected]
http://lists.open-bio.org/mailman/listinfo/emboss
