Hello, I'm wondering whether there is any interest in including in EMBOSS an application to calculate the relative abundance (bias) of words.
The measure I have in mind is the one used by Karlin and others (for example in Burge, C. et al., PNAS 1992). It is the observed frequency of a particular word divided by its expected frequency, where the expectation is based on the frequencies of all its subwords, including gapped subwords. This gives the bias at a particular word size with the effects at smaller word sizes removed. For small word sizes there are explicit formulas one can use, but at larger sizes these become unwieldy. I've been working on some code that is able to calculate this measure for words up to 10 or 11 bp in a reasonable amount of time. If there is interest, I would be happy to contribute it.

Eliot
_______________________________________________
EMBOSS mailing list
[email protected]
http://lists.open-bio.org/mailman/listinfo/emboss
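To make the measure concrete, here is a rough Python sketch. It is not the code mentioned above, just an illustration. It assumes the general form follows the alternating pattern of the published di-, tri- and tetranucleotide formulas: the frequencies of the word and of all its (possibly gapped) subwords are multiplied together, each raised to +1 or -1 depending on how many positions the subword leaves out. The function names `gapped_frequency` and `relative_abundance` are made up for this example, and no strand symmetrization is applied.

```python
from itertools import combinations


def gapped_frequency(seq, offsets, letters):
    """Fraction of windows in seq where letters[i] sits at relative offsets[i].

    offsets must be sorted; gaps between them match any character.
    """
    span = offsets[-1] - offsets[0] + 1
    windows = len(seq) - span + 1
    if windows <= 0:
        return 0.0
    base = offsets[0]
    hits = sum(
        all(seq[start + off - base] == ch for off, ch in zip(offsets, letters))
        for start in range(windows)
    )
    return hits / windows


def relative_abundance(seq, word):
    """Observed/expected ratio for word, using all of its gapped subwords.

    Assumed general form: a subword keeping `size` of the k positions gets
    exponent (-1) ** (k - size), so the full word is in the numerator.
    """
    k = len(word)
    # If the word never occurs the ratio is zero; if it does occur, every
    # subword also occurs, so no frequency below can be zero.
    if gapped_frequency(seq, tuple(range(k)), word) == 0.0:
        return 0.0
    ratio = 1.0
    for size in range(1, k + 1):
        exponent = 1 if (k - size) % 2 == 0 else -1
        for positions in combinations(range(k), size):
            letters = [word[p] for p in positions]
            ratio *= gapped_frequency(seq, positions, letters) ** exponent
    return ratio


if __name__ == "__main__":
    example = "ACGTACGGTCA"  # toy sequence, for illustration only
    print(relative_abundance(example, "CG"))
```

For a word of length k this naive version visits 2^k - 1 subwords and rescans the sequence for each one, so it is only meant to show the definition; anything practical for 10 or 11 bp words would need to count all words in one pass and reuse those counts.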
