Hi
Apologies for the delay in replying.
The informational entropy algorithm is quite simple.
For each of the 64 possible triplets, count its occurrences fXXX
within your sliding window of size n. The frequencies are then:
pXXX = fXXX/(n/3)
where n/3 is of course the number of triplets in the window.
So now you have a table with 64 rows, something like:
AAA 0.03
AAC 0.01
AAG 0.02
and so on. Each of these is pXXX. Then calculate:
eXXX = -pXXX * log2(pXXX)
then sum over all 64 triplets: H = sigma(eXXX). This is the Shannon
entropy of the sequence
in that frame in that window. Now slide the window and plot how the
value changes.
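Put together, the whole procedure might look like the sketch below.
This is not the Artemis implementation - the class name, window size,
step, and sequence are illustrative choices of mine - but it follows
the steps above: count triplets in frame, convert counts to
frequencies, accumulate -p * log2(p), then slide the window.

```java
import java.util.HashMap;
import java.util.Map;

public class TripletEntropy {

    // Shannon entropy (base 2) of the triplet distribution in seq,
    // read in frame 0 (positions 0, 3, 6, ...).
    static double entropy(String seq) {
        int triplets = seq.length() / 3;  // n/3 triplets in the window
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + 3 <= seq.length(); i += 3) {
            counts.merge(seq.substring(i, i + 3), 1, Integer::sum);
        }
        double ent = 0.0;
        for (int c : counts.values()) {
            double freq = (double) c / triplets;  // pXXX = fXXX/(n/3)
            ent -= freq * Math.log(freq) / Math.log(2); // -p * log2(p)
        }
        return ent;
    }

    public static void main(String[] args) {
        // Slide the window along the sequence and print the profile.
        String seq = "ACGTACGTAAATTTCCCGGGACGTACGTACGT";
        int window = 12;  // window size n, a multiple of 3
        int step = 3;     // slide by one triplet to stay in frame
        for (int start = 0; start + window <= seq.length(); start += step) {
            double h = entropy(seq.substring(start, start + window));
            System.out.println(start + "\t" + h);
        }
    }
}
```

A homopolymer run scores 0 (one triplet, p = 1), and a window in
which every triplet is distinct scores the maximum for that window
size, as expected.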
In the Java code, it works as follows:
ent -= freq*Math.log(freq)/Math.log(2); // H = -sigma(p * log2(p))
Math.log in Java is the natural logarithm (base e), so you get a log
base 2 by dividing by the natural log of 2.
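A quick way to convince yourself of the base conversion - the class
and method names here are just for illustration:

```java
public class Log2Check {
    static double log2(double x) {
        // Math.log is the natural log; dividing by ln(2) converts
        // it to log base 2.
        return Math.log(x) / Math.log(2);
    }

    public static void main(String[] args) {
        // log2(64) is 6: the entropy of a sequence in which all
        // 64 triplets are equally likely.
        System.out.println(log2(64.0));
    }
}
```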
Entropy is a measure of the disorder of the sequence. Coding
sequences and repeat sequences score lower than random sequence,
which in this case will score 6 (log2 of 64 equally likely
triplets). "Typical" scores for chromosomal sequences - I'm looking
at the herpes simplex 1 genome - are in the region of 4.5 to 5.8,
depending on the window size. So it is a feature
detector of sorts. It is partly a gene-detector, although you are in
danger of confusing non-coding repeats with coding sequences, so if
used as a gene-detector, always back it up with another
gene-detecting algorithm.
cheers
Derek
_______________________________________________
Artemis-users mailing list
[email protected]
http://lists.sanger.ac.uk/mailman/listinfo/artemis-users