Hi Anette,

Your Seq1 is being incorrectly guessed to be a nucleotide sequence, although you state it is protein. EMBOSS provides a boolean qualifier to declare whether your sequence is nucleotide or protein; see the EMBOSS help:

   "-sequence" associated qualifiers
   -snucleotide1       boolean    Sequence is nucleotide
   -sprotein1          boolean    Sequence is protein

regards,
bernd

On 4/23/07, Becher, Anette <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> I believe I *may* have found a bug in compseq.
>
> I have been using compseq to calculate the frequency of amino acids in
> translated DNA sequences. I find that frequently compseq takes the amino
> acid sequence to be DNA (they are sequences with an unusual composition,
> but then I am looking for odd proteins). So instead of the expected
> output for all amino acids with most being zero, I often get output for
> A,C,G,T and 'other'. I cannot see an obvious pattern that would explain
> this behaviour, but maybe you can help.
>
> Command line:
>
> compseq -seq compseq_bug.in -word 1 -frame 1 -out compseq_bug.out
>
> An example input and output file are pasted in below - I can provide
> many more.
>
> It might help if the user could specify whether the input sequence is
> DNA or protein, rather than the program working it out somehow?
>
> Best wishes
>
> Anette
>
> Here is an example of the problem:
>
> >Seq1
> GSGGGGGSGGRGMGGWGGGRGSGVGGRGWGVG
>
> #
> # Output from 'compseq'
> #
> # Only words in frame 1 will be counted.
> # The Expected frequencies are calculated on the (false) assumption that every
> # word has equal frequency.
> #
> # The input sequences are:
> # Seq1
>
> Word size 1
> Total count 31
>
> #
> # Word  Obs Count  Obs Frequency  Exp Frequency  Obs/Exp Frequency
> #
> A       0          0.0000000      0.2500000      0.0000000
> C       0          0.0000000      0.2500000      0.0000000
> G       20         0.6451613      0.2500000      2.5806452
> T       0          0.0000000      0.2500000      0.0000000
>
> Other   11         0.3548387      0.0000000      10000000000.0000000
>
> Here is a similar sequence that works fine:
>
> >Seq2
> VGSEGGGGGRRGEGGGGGGRGGGGGRWEEGAG
>
> #
> # Output from 'compseq'
> #
> # Only words in frame 1 will be counted.
> # The Expected frequencies are calculated on the (false) assumption that every
> # word has equal frequency.
> #
> # The input sequences are:
> # Seq2
>
> Word size 1
> Total count 31
>
> #
> # Word  Obs Count  Obs Frequency  Exp Frequency  Obs/Exp Frequency
> #
> A       1          0.0322581      0.0476190      0.6774194
> C       0          0.0000000      0.0476190      0.0000000
> D       0          0.0000000      0.0476190      0.0000000
> E       4          0.1290323      0.0476190      2.7096774
> F       0          0.0000000      0.0476190      0.0000000
> G       20         0.6451613      0.0476190      13.5483871
> H       0          0.0000000      0.0476190      0.0000000
> I       0          0.0000000      0.0476190      0.0000000
> K       0          0.0000000      0.0476190      0.0000000
> L       0          0.0000000      0.0476190      0.0000000
> M       0          0.0000000      0.0476190      0.0000000
> N       0          0.0000000      0.0476190      0.0000000
> P       0          0.0000000      0.0476190      0.0000000
> Q       0          0.0000000      0.0476190      0.0000000
> R       4          0.1290323      0.0476190      2.7096774
> S       1          0.0322581      0.0476190      0.6774194
> T       0          0.0000000      0.0476190      0.0000000
> U       0          0.0000000      0.0476190      0.0000000
> V       0          0.0000000      0.0476190      0.0000000
> W       1          0.0322581      0.0476190      0.6774194
> Y       0          0.0000000      0.0476190      0.0000000
>
> =======================================================================
> Attention: The information contained in this message and/or attachments
> from AgResearch Limited is intended only for the persons or entities
> to which it is addressed and may contain confidential and/or privileged
> material. Any review, retransmission, dissemination or other use of, or
> taking of any action in reliance upon, this information by persons or
> entities other than the intended recipients is prohibited by AgResearch
> Limited. If you have received this message in error, please notify the
> sender immediately.
> =======================================================================
>
> _______________________________________________
> EMBOSS mailing list
> [email protected]
> http://lists.open-bio.org/mailman/listinfo/emboss
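
A rough sketch of the suggestion in this thread, not checked against any particular EMBOSS release: the command from the report with -sprotein1 added, so that compseq does not have to guess the sequence type. One plausible reason the guess goes wrong for Seq1 only, offered as an assumption and not as a description of EMBOSS internals, is that every letter in Seq1 (G, S, R, M, W, V) is also a valid IUPAC nucleotide code, whereas Seq2 contains E, which is not. The helper name only_iupac_nucleotide is hypothetical and exists only for this illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of EMBOSS): prints "yes" if every character
# of the argument is an IUPAC nucleotide code (A C G T U plus the ambiguity
# codes R Y S W K M B D H V N), otherwise prints "no".
only_iupac_nucleotide() {
    case "$1" in
        ''|*[!ACGTURYSWKMBDHVNacgturyswkmbdhvn]*) echo no ;;
        *) echo yes ;;
    esac
}

# The two sequences from the report.
only_iupac_nucleotide GSGGGGGSGGRGMGGWGGGRGSGVGGRGWGVG   # Seq1: prints yes
only_iupac_nucleotide VGSEGGGGGRRGEGGGGGGRGGGGGRWEEGAG   # Seq2: prints no (E)

# Same command as in the report, with the type stated explicitly through the
# -sprotein1 qualifier quoted from the EMBOSS help. Skipped when compseq is
# not on the PATH or the input file from the report is not present.
if command -v compseq >/dev/null 2>&1 && [ -f compseq_bug.in ]; then
    compseq -seq compseq_bug.in -sprotein1 -word 1 -frame 1 -out compseq_bug.out
fi
```

If the assumption holds, any translated sequence made up only of those letters could be misread in the same way, so passing -sprotein1 on every protein run is probably the safer habit.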
