On Wed, Jul 7, 2010 at 9:36 AM, Cei Abreu-Goodger <[email protected]> wrote:
> After short-read alignment, one post-processing step might be to normalize
> by the length (e.g. of an individual exon, of all genes, etc). This should
> actually be the mappable length of these portions of the genome, not the
> real length. Mappable length could be defined as the number of distinct
> k-mers that uniquely align in a given portion of the genome.

Of course you want to use mappable length, as we did a long time ago:
  http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1000299
You can also find the Broad "mappability" track on UCSC.

In my opinion, mappable length ought to be computed under the same
alignment strategy and aligner that you are using to map the reads. Of
course, one could claim that mappability under one strategy ought to be
pretty similar to mappability under another strategy, but I have never
seen any real investigation into these claims.

It is pretty easy to compute for small genomes, and it is computable for
larger genomes, although it does involve a lot of scripting and
postprocessing. You can cut down your time if you're only interested in,
say, mappability for all Ensembl genes (which is about 100x faster than
mappability for the entire human genome). I have always used custom
scripts for this.

Kasper

> In a previous thread, Simon Andrews mentioned a Bowtie perl wrapper:
>
> https://stat.ethz.ch/pipermail/bioc-sig-sequencing/2009-May/000315.html
>
> I seem to recall another post suggesting using the BSgenome packages for a
> similar purpose...
>
> Perhaps I'm missing something obvious and this functionality is already
> included in one of the many sequencing-related packages out there.
>
> Any thoughts?
>
> Cheers,
>
> Cei
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> [email protected]
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
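
For illustration only, here is a minimal Python sketch of the definition quoted in this thread: the mappable length of a region is taken as the number of k-mer start positions in that region whose k-mer occurs exactly once in the genome. It is not the custom scripts mentioned in the reply. It assumes exact matching on the forward strand only, which is a deliberate simplification; as argued above, a real computation should use the same aligner and settings as the read mapping. The function names and the toy genome are hypothetical.

```python
from collections import Counter


def kmer_counts(genome, k):
    """Count every k-mer on the forward strand of each sequence.

    `genome` is a dict mapping sequence name -> sequence string.
    Assumption: exact matches only, no reverse complement, no mismatches.
    """
    counts = Counter()
    for seq in genome.values():
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def mappable_length(genome, chrom, start, end, k, counts=None):
    """Number of k-mer start positions in [start, end) on `chrom`
    whose k-mer occurs exactly once in the whole genome."""
    if counts is None:
        counts = kmer_counts(genome, k)
    seq = genome[chrom]
    # A k-mer cannot start closer than k bases to the end of the sequence.
    last_start = min(end, len(seq) - k + 1)
    return sum(
        1 for i in range(start, last_start)
        if counts[seq[i:i + k]] == 1
    )


# Toy genome (made up for this example): "ACGT", "CGTA" and "GTAC"
# each occur more than once, so positions starting those k-mers
# are not counted as mappable.
genome = {"chr1": "ACGTACGTTTGACCA", "chr2": "GGGACGTACCC"}
counts = kmer_counts(genome, 4)

print(mappable_length(genome, "chr1", 0, 15, 4, counts))  # 8 of 12 positions
print(mappable_length(genome, "chr1", 0, 5, 4, counts))   # 1
print(mappable_length(genome, "chr2", 0, 11, 4, counts))  # 5 of 8 positions
```

Passing precomputed counts avoids recounting the genome for every region, which is the natural pattern when only a set of gene or exon intervals is of interest. For a real genome an in-memory counter like this would not scale; generating the k-mers and aligning them back with the actual aligner is the approach the reply describes.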
