Re: Pattern matching problem

wolf blaum Thu, 26 Feb 2004 22:15:58 -0800

On Thursday 26 February 2004 12:28, Henry Todd generously enriched virtual 
reality by making up this one:


> On 2004-02-26 00:43:21 +0000, [EMAIL PROTECTED] (Wolf Blaum) said:
> > As I understand biology, there are 4 nucleotides, which gives 4**2
> > combinations for doublets. So you need 8 vars to count the occurrence of
> > all doublets. Worse for triplets. (24)
> > As I understand genetics, triplets are what matters, since the RNA
> > transcriptase reads triplets as code for amino acids. You might give me
> > updates on my biol. knowledge :-)
>
> Wolf -
>
> It's been a while since my A-Level biology days, but I believe you're
> correct. However, this particular coursework was to create two programs
> for a different purpose than I think you're imagining:

Hi, 

As you can tell from my mail, it has been a while since my basic math classes,
too: 4**2 = 8? 4**3 = 24? Uh-uh... (That should be 16 and 64.)
However, the real bug was
for (my $i=0;$i < length($sequence) - $wordsize;$i++){
which should be
for (my $i=0;$i <= length($sequence) - $wordsize;$i++){
because otherwise it misses the last doublet/triplet/...
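
A minimal sketch of the corrected loop in context, to show what the "<=" buys you. The sub name count_words and the test sequence are made up for illustration; only the for-loop itself comes from the script under discussion.

```perl
use strict;
use warnings;

# Count every overlapping word of length $wordsize in $sequence.
# count_words is a hypothetical name, not taken from the original script.
sub count_words {
    my ($sequence, $wordsize) = @_;
    my %count;
    # "<=" so the window starting at length - wordsize (the last one) is included
    for (my $i = 0; $i <= length($sequence) - $wordsize; $i++) {
        $count{ substr($sequence, $i, $wordsize) }++;
    }
    return \%count;
}

my $doublets = count_words('ATGAT', 2);
print "$_ => $doublets->{$_}\n" for sort keys %$doublets;
# prints:
# AT => 2
# GA => 1
# TG => 1
```

With "<" instead of "<=" the final AT is never looked at, and AT comes out as 1.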

> transition.pl: returns tables of transition probabilities for plus and
> minus models (exon and non-exon regions) as well as beta values
> (log-odds ratios) to compare the two models.
>
> The transition probability for AT for example (the probability that
> adenine will be followed by thymine) is calculated thus:
>
> tp(AT) = |AT| / |A_|
>
> The total number of occurrences of "AT" divided by the total number of
> "A" followed by anything.
>
> The program can also write the transition probabilities to a file to be
> used as input for the other program...

OK - but once you end up with a hash containing all the doublets as its keys
and their frequencies as values, that should be doable as long as you know the
members of your alphabet.
I don't know if there is such a thing as transition probabilities for codons
(i.e. triplets) as well - if there is, it should show up as transition
probabilities for amino acids. In that case, creating the hash of w-mers is
done by just feeding the script another sequence. The only thing to change
would be to add knowledge about the AA alphabet to your script.
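
Roughly, going from the doublet hash to tp(XY) = |XY| / |X_| could look like the sketch below. The sub name transition_probs and the sample counts are assumptions for illustration, not part of transition.pl; the alphabet is passed in, so an amino acid alphabet would work the same way.

```perl
use strict;
use warnings;

# Turn doublet counts into transition probabilities: tp(XY) = |XY| / |X_|.
# transition_probs is a hypothetical helper, not taken from transition.pl.
sub transition_probs {
    my ($count, @alphabet) = @_;
    my %tp;
    for my $x (@alphabet) {
        # |X_|: occurrences of X followed by any letter of the alphabet
        my $total = 0;
        $total += $count->{"$x$_"} || 0 for @alphabet;
        next unless $total;    # X never has a successor, so no row for it
        $tp{"$x$_"} = ($count->{"$x$_"} || 0) / $total for @alphabet;
    }
    return \%tp;
}

my %count = (AT => 2, TG => 1, GA => 1);    # the doublets of "ATGAT"
my $tp = transition_probs(\%count, qw(A C G T));
printf "tp(%s) = %.2f\n", $_, $tp->{$_} for sort keys %$tp;
```

Note that |X_| is summed from the doublet counts rather than from the single-letter counts, so an X at the very end of the sequence (followed by nothing) is not counted.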

> simulation.pl: which asks the user to specify the length of the
> sequence they want, then generates it according to the model file used
> as input (by simulating a Markov chain). So if you supply a file
> containing the transition probabilities of a typical exon (coding)
> region, the simulation will use them to generate a typical exon
> sequence.

This is getting really off topic:
Just out of interest: how do you choose which letter to start with, since there
is no tp for nothing followed by whatever?
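
One possible answer, purely as an assumption and not necessarily what simulation.pl does: draw the first letter from a separate start distribution (uniform, or the overall letter frequencies) and use the transition table from the second letter on. The names draw and simulate and the toy table below are invented for this sketch.

```perl
use strict;
use warnings;

# Draw one key from a list of key => probability pairs (assumed to sum to 1).
sub draw {
    my (%prob) = @_;
    my $r = rand();
    my $last;
    for my $k (sort keys %prob) {
        $last = $k;
        $r -= $prob{$k};
        return $k if $r < 0;
    }
    return $last;    # fallback for rounding error or an all-zero row
}

# $start: letter => probability for the first letter (an assumption, see above).
# $tp:    doublet => transition probability, e.g. AT => tp(AT).
sub simulate {
    my ($length, $start, $tp, @alphabet) = @_;
    return '' if $length < 1;
    my $seq = draw(%$start);
    while (length($seq) < $length) {
        my $prev = substr($seq, -1);
        # next letter depends only on the previous one (Markov chain)
        $seq .= draw(map { ($_ => $tp->{"$prev$_"} || 0) } @alphabet);
    }
    return $seq;
}

my @alphabet = qw(A C G T);
my %start    = (A => 0.25, C => 0.25, G => 0.25, T => 0.25);    # uniform start
my %tp       = (AT => 1, TG => 1, GA => 1, CA => 1);            # toy table
print simulate(12, \%start, \%tp, @alphabet), "\n";
```

With this toy table every step after the first is forced, so the output is always some rotation of ATG repeated, possibly with a leading C.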

Sounds like a fun problem:)

G'day, Wolf




