I want to intersect a few hundred genomic intervals (predicted translocation
sites from SVDetect) with low-mappability regions of the genome (we are
working with mm9 right now).
UCSC has an excellent mappability track that exactly matches our sequencing
data (50 bp k-mers), but it seems very difficult to get that data into Galaxy. I
want a BED file that summarizes intervals of low mappability (i.e. less than
0.5 on the scale used by UCSC). The UCSC Table Browser has a limit of 10M
lines, which seems to cover just part of chromosome 1. It would be very messy to
fetch the whole genome bit by bit this way and then concatenate the pieces
back together.
UCSC Help suggests downloading the mappability data for the whole genome as a
bigWig file, then converting it to BED. I gave this a try, but got a 4 GB file
with intervals of just one or two base pairs. Again, it is a lot of work to
get back to the nicer BED that I could make with the UCSC tools over smaller
genomic regions. It is also super-painful to upload this huge file to Galaxy,
and I am not keen on writing my own parsers to filter and smooth this file.
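For reference, the filter-and-smooth step I would otherwise have to write
could look roughly like the sketch below. It is only a sketch under several
assumptions: the bigWig has already been converted to four-column bedGraph
text (chrom, start, end, score), for example with UCSC's bigWigToBedGraph
utility; the lines are sorted by chromosome and start; and the function
names, the 0.5 threshold default and the max_gap parameter are my own
placeholders, not anything provided by UCSC or Galaxy.

```python
def merge_low_mappability(lines, threshold=0.5, max_gap=0):
    """Yield (chrom, start, end) for merged runs with score < threshold.

    Assumes bedGraph-style lines sorted by chromosome and start.
    max_gap (hypothetical parameter) bridges gaps of up to that many
    bases between low-scoring intervals, which acts as crude smoothing.
    """
    current = None  # [chrom, start, end] of the run being built
    for line in lines:
        line = line.strip()
        # Skip blank lines and bedGraph header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end, score = line.split()[:4]
        start, end, score = int(start), int(end), float(score)
        if score >= threshold:
            continue  # keep only low-mappability intervals
        if current and current[0] == chrom and start - current[2] <= max_gap:
            # Adjacent (or close enough) to the current run: extend it.
            current[2] = max(current[2], end)
        else:
            if current:
                yield tuple(current)
            current = [chrom, start, end]
    if current:
        yield tuple(current)


def write_low_mappability_bed(in_path, out_path, threshold=0.5, max_gap=0):
    """Stream a bedGraph file and write the merged intervals as 3-column BED."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for chrom, start, end in merge_low_mappability(src, threshold, max_gap):
            dst.write(f"{chrom}\t{start}\t{end}\n")
```

Because the input is streamed line by line, the 4 GB file never has to fit
in memory, and the merged BED should be far smaller to upload to Galaxy.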
Any other suggestions? Maybe someone else knows where to find a mappability
file (for mm9) that has nice intervals in a Galaxy-compatible format.