Thanks, Kasper, for pointing out the mappability tracks on UCSC.

Fishing around a bit more, I found that the Bowtie distribution comes with a "mapability.pl" script. It uses an apparently undocumented -F flag that fragments the input sequences into windows of a given size, taken at a given step (-F win,step).
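To make the window/step idea concrete, here is a minimal Python sketch of the same kind of fragmentation: every window of length `win`, taken every `step` bases, written out as a FASTA record that could then be aligned back to the genome. This is not Bowtie's code. The names `fragment` and `to_fasta` and the `name_offset` read-ID scheme are hypothetical, and I have not checked how -F treats a trailing partial window (this sketch simply drops it).

```python
from typing import Iterable, Iterator, Tuple

def fragment(name: str, seq: str, win: int, step: int) -> Iterator[Tuple[str, str]]:
    """Yield (read_id, subsequence) for each full window of length `win`
    starting at 0-based offsets 0, step, 2*step, ... (hypothetical helper)."""
    for start in range(0, len(seq) - win + 1, step):
        # Encode the source sequence and offset in the ID so that alignments
        # can later be traced back to the position the fragment came from.
        yield f"{name}_{start}", seq[start:start + win]

def to_fasta(records: Iterable[Tuple[str, str]]) -> str:
    """Render (id, sequence) pairs as FASTA text."""
    return "".join(f">{rid}\n{s}\n" for rid, s in records)

# Usage: tile a toy chromosome into 5-mers every 3 bases.
print(to_fasta(fragment("chrToy", "ACGTACGTTTGACC", win=5, step=3)))
```

With step set to 1 this yields every k-mer of the sequence, which is what a per-base mappability calculation needs; larger steps trade resolution for fewer reads to align.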

For those using Bowtie, this would allow using the same alignment strategy both for computing mappable length and for the actual mapping, as Kasper suggests.

Cheers,

Cei

Kasper Daniel Hansen wrote:
On Wed, Jul 7, 2010 at 9:36 AM, Cei Abreu-Goodger <[email protected]> wrote:
After short-read alignment, one post-processing step might be to normalize
by length (e.g. of an individual exon, of all genes, etc.). This length should
actually be the mappable length of these portions of the genome, not the
real length. Mappable length could be defined as the number of distinct
k-mers that uniquely align in a given portion of the genome.
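As a rough illustration of that definition, the Python sketch below counts, for one region, the k-mer start positions whose k-mer occurs exactly once in the genome on either strand. It swaps alignment for exact string matching (no mismatches, no aligner), so it is only a stand-in for an aligner-based calculation. The function names, the rule that a k-mer must lie entirely inside the region, the 0-based half-open coordinates, and the choice to skip k-mers containing anything other than uppercase ACGT are all assumptions.

```python
from collections import Counter
from typing import Dict

VALID = frozenset("ACGT")
_COMP = str.maketrans("ACGT", "TGCA")

def canonical(kmer: str) -> str:
    """Smaller of a k-mer and its reverse complement, so both strands share a key."""
    return min(kmer, kmer.translate(_COMP)[::-1])

def kmer_counts(genome: Dict[str, str], k: int) -> Counter:
    """Genome-wide occurrence counts of canonical k-mers (uppercase ACGT only)."""
    counts = Counter()
    for seq in genome.values():
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= VALID:
                counts[canonical(kmer)] += 1
    return counts

def mappable_length(genome: Dict[str, str], counts: Counter,
                    chrom: str, start: int, end: int, k: int) -> int:
    """Number of start positions i in [start, end - k] whose k-mer seq[i:i+k]
    occurs exactly once in the genome, counting both strands.
    Assumes 0 <= start and end <= len(genome[chrom])."""
    seq = genome[chrom]
    n = 0
    for i in range(start, end - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= VALID and counts[canonical(kmer)] == 1:
            n += 1
    return n

# Usage on a toy two-chromosome genome; "CCT" on chr2 is the reverse
# complement of "AGG" on chr1, so that k-mer is not unique.
genome = {"chr1": "AAAAGGC", "chr2": "CCT"}
counts = kmer_counts(genome, 3)
print(mappable_length(genome, counts, "chr1", 0, 7, 3))
```

Read counts in a region could then be divided by this value instead of `end - start`. Building the full genome-wide table like this is fine for small genomes but memory-hungry for something the size of the human genome.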

Of course you want to use mappable length, as we did a long time ago:

http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1000299

You can also find the Broad "mappability" track on UCSC.

In my opinion, mappable length ought to be computed under the same
alignment strategy and aligner as you are using to map the reads.  Of
course, one could claim that mappability under one strategy ought to
be pretty similar to mappability under another strategy, but I have
never seen any real investigation into these claims.

It is pretty easy to compute for small genomes, and it is computable
for larger genomes, although it does involve a lot of scripting and
postprocessing. You can cut down your time if you're only interested
in, say, mappability for all Ensembl genes (which is about 100x faster
than mappability for the entire human genome).
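The gene-only shortcut can be sketched in Python as well: collect the k-mers that start inside the regions of interest, then scan the genome once and count only those k-mers, so the count table stays proportional to the regions rather than to the whole genome. In an aligner-based workflow the saving would come from aligning far fewer reads; this hypothetical exact-matching sketch still reads the whole genome but keeps memory bounded by the regions. The function names, the 0-based half-open region tuples, and the toy region names are assumptions.

```python
from collections import Counter
from typing import Dict, List, Tuple

VALID = frozenset("ACGT")
_COMP = str.maketrans("ACGT", "TGCA")

Region = Tuple[str, int, int]  # (chrom, start, end), 0-based, half-open

def canonical(kmer: str) -> str:
    """Strand-independent key: the smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(_COMP)[::-1])

def region_kmers(genome: Dict[str, str], regions: Dict[str, Region],
                 k: int) -> Dict[str, List[str]]:
    """Canonical k-mers lying entirely inside each region (uppercase ACGT only).
    Assumes every region lies within its chromosome."""
    out = {}
    for name, (chrom, start, end) in regions.items():
        seq = genome[chrom]
        kmers = []
        for i in range(start, end - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= VALID:
                kmers.append(canonical(kmer))
        out[name] = kmers
    return out

def mappable_lengths(genome: Dict[str, str], regions: Dict[str, Region],
                     k: int) -> Dict[str, int]:
    per_region = region_kmers(genome, regions, k)
    # Only k-mers seen in the regions of interest get a counter.
    wanted = set().union(*per_region.values())
    counts = Counter()
    for seq in genome.values():
        for i in range(len(seq) - k + 1):
            key = canonical(seq[i:i + k])
            if key in wanted:
                counts[key] += 1
    # A position is mappable if its k-mer occurs once genome-wide (either strand).
    return {name: sum(1 for km in kms if counts[km] == 1)
            for name, kms in per_region.items()}

# Usage with two hypothetical "genes" on a toy genome.
genome = {"chr1": "AAAAGGC", "chr2": "CCT"}
print(mappable_lengths(genome, {"geneX": ("chr1", 0, 7), "geneY": ("chr1", 0, 4)}, 3))
```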

I have always used custom scripts for this.

Kasper

In a previous thread, Simon Andrews mentioned a Bowtie perl wrapper:

https://stat.ethz.ch/pipermail/bioc-sig-sequencing/2009-May/000315.html

I seem to recall another post suggesting using the BSgenome packages for a
similar purpose...

Perhaps I'm missing something obvious and this functionality is already
included in one of the many sequencing-related packages out there.

Any thoughts?

Cheers,

Cei

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

