Hi,

Regarding compseq, I wonder how to count words in reading frame 0 only. For words of length 2 the frame value can be 0, 1 or 2. With "AGAGAG" as the sequence and 1 as the frame, I get GA twice. Frame 2 gives AG twice. But how do I get a count of 3 for AG and nothing else? Frame 0 returns a count of 3 for AG, but also a count of 2 for GA.
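To make the question concrete, here is a minimal Python sketch of the counting I am after. It is only an illustration and not how compseq is implemented: `count_words`, `offset` and `step` are names made up for this example, and `offset` is a 0-based start position, which may not map directly onto compseq's -frame values.

```python
from collections import Counter


def count_words(seq, word=2, offset=0, step=None):
    """Count words of length `word` in `seq`.

    Hypothetical helper for illustration only (not compseq code).
    offset: 0-based position of the first word.
    step:   how far the window moves each time; defaults to `word`,
            so that only words in a single frame are counted.
    """
    if step is None:
        step = word
    last_start = len(seq) - word  # last position where a full word fits
    return Counter(seq[i:i + word] for i in range(offset, last_start + 1, step))


seq = "AGAGAG"

# What I would like: start at the first base, move by the word length.
wanted = count_words(seq, word=2, offset=0)             # AG: 3

# Window moved by one base each time, i.e. every overlapping word.
overlapping = count_words(seq, word=2, offset=0, step=1)  # AG: 3, GA: 2

# Start one base in, move by the word length.
shifted = count_words(seq, word=2, offset=1)            # GA: 2

print(dict(wanted), dict(overlapping), dict(shifted))
```

The first call is the result I want, the second matches what I get with frame 0, and the third matches what I get with frame 1.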
I used emboss version 4.1.0 over the web with EMBOSS explorer.

regards,
bernd

On 4/23/07, Bernd Web <[EMAIL PROTECTED]> wrote:
> Hi Annette,
>
> Your seq1 is incorrectly guessed to be a nucleotide sequence, since
> you state it's protein. EMBOSS provides a boolean to state nucleotide
> or protein nature of your sequence, see EMBOSS help:
>
>    "-sequence" associated qualifiers
>    -snucleotide1        boolean    Sequence is nucleotide
>    -sprotein1           boolean    Sequence is protein
>
> regards,
> bernd
>
> On 4/23/07, Becher, Anette <[EMAIL PROTECTED]> wrote:
> > Hi all,
> >
> > I believe I *may* have found a bug in compseq.
> >
> > I have been using compseq to calculate the frequency of amino acids in
> > translated DNA sequences. I find that frequently compseq takes the amino
> > acid sequence to be DNA (they are sequences with an unusual composition,
> > but then I am looking for odd proteins). So instead of the expected
> > output for all amino acids with most being zero, I often get output for
> > A,C,G,T and 'other'. I cannot see an obvious pattern that would explain
> > this behaviour, but maybe you can help.
> >
> > Command line:
> >
> > compseq -seq compseq_bug.in -word 1 -frame 1 -out compseq_bug.out
> >
> > An example input and output file are pasted in below - I can provide
> > many more.
> >
> > It might help if the user could specify whether the input sequence is
> > DNA or protein, rather than the program working it out somehow?
> >
> > Best wishes
> >
> > Anette
> >
> > Here is an example of the problem:
> >
> > >Seq1
> > GSGGGGGSGGRGMGGWGGGRGSGVGGRGWGVG
> >
> > #
> > # Output from 'compseq'
> > #
> > # Only words in frame 1 will be counted.
> > # The Expected frequencies are calculated on the (false) assumption that every
> > # word has equal frequency.
> > #
> > # The input sequences are:
> > #       Seq1
> >
> > Word size       1
> > Total count     31
> >
> > #
> > # Word   Obs Count   Obs Frequency   Exp Frequency   Obs/Exp Frequency
> > #
> > A        0           0.0000000       0.2500000       0.0000000
> > C        0           0.0000000       0.2500000       0.0000000
> > G        20          0.6451613       0.2500000       2.5806452
> > T        0           0.0000000       0.2500000       0.0000000
> >
> > Other    11          0.3548387       0.0000000       10000000000.0000000
> >
> > Here is a similar sequence that works fine:
> >
> > >Seq2
> > VGSEGGGGGRRGEGGGGGGRGGGGGRWEEGAG
> >
> > #
> > # Output from 'compseq'
> > #
> > # Only words in frame 1 will be counted.
> > # The Expected frequencies are calculated on the (false) assumption that every
> > # word has equal frequency.
> > #
> > # The input sequences are:
> > #       Seq2
> >
> > Word size       1
> > Total count     31
> >
> > #
> > # Word   Obs Count   Obs Frequency   Exp Frequency   Obs/Exp Frequency
> > #
> > A        1           0.0322581       0.0476190       0.6774194
> > C        0           0.0000000       0.0476190       0.0000000
> > D        0           0.0000000       0.0476190       0.0000000
> > E        4           0.1290323       0.0476190       2.7096774
> > F        0           0.0000000       0.0476190       0.0000000
> > G        20          0.6451613       0.0476190       13.5483871
> > H        0           0.0000000       0.0476190       0.0000000
> > I        0           0.0000000       0.0476190       0.0000000
> > K        0           0.0000000       0.0476190       0.0000000
> > L        0           0.0000000       0.0476190       0.0000000
> > M        0           0.0000000       0.0476190       0.0000000
> > N        0           0.0000000       0.0476190       0.0000000
> > P        0           0.0000000       0.0476190       0.0000000
> > Q        0           0.0000000       0.0476190       0.0000000
> > R        4           0.1290323       0.0476190       2.7096774
> > S        1           0.0322581       0.0476190       0.6774194
> > T        0           0.0000000       0.0476190       0.0000000
> > U        0           0.0000000       0.0476190       0.0000000
> > V        0           0.0000000       0.0476190       0.0000000
> > W        1           0.0322581       0.0476190       0.6774194
> > Y        0           0.0000000       0.0476190       0.0000000

_______________________________________________
EMBOSS mailing list
[email protected]
http://lists.open-bio.org/mailman/listinfo/emboss
