Dear UCSC Genome Browser team,

I am working with a large bedGraph file (800 MB), which I would like to convert to bigWig format for display in a local browser. The data is from a non-model organism with a large number of supercontigs as chromosomes (>40,000). Unfortunately, when I try to extract data for different chromosomes, I only get results for the first two chromosomes (v31.000000, v31.000001) but nothing for any of the other chromosomes (see below).
I have performed the following steps in an Ubuntu Linux environment, using binary files from the UCSC genome browser server <http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/> that carry the date 14-Dec-2010.

#1.) I start with a bedGraph file

> head reads.wig
>
> v31.000000 0 96 0
> v31.000000 96 132 2
> v31.000000 132 358 0
> v31.000000 358 394 1
> v31.000000 394 1176 0
> v31.000000 1176 1212 1
> v31.000000 1212 1350 0
> v31.000000 1350 1359 1
> v31.000000 1359 1386 2
> v31.000000 1386 1395 1

# the chromosomes / supercontigs are numbered, e.g. v31.000000, v31.000001, etc.
# there is data for other chromosomes / supercontigs, too:

> grep v31.000004 reads.wig |head
>
> v31.000004 0 566 0
> v31.000004 566 602 1
> v31.000004 602 871 0
> v31.000004 871 907 3
> v31.000004 907 1185 0
> v31.000004 1185 1191 1
> v31.000004 1191 1197 2
> v31.000004 1197 1221 3
> v31.000004 1221 1227 2
> v31.000004 1227 1233 1

#2.) I have a chrom.sizes file

> head chrom.sizes
> v31.000000 672495
> v31.000001 531507
> v31.000002 537584
> v31.000003 495203
> v31.000004 468990
> v31.000005 441662
> v31.000006 373162
> v31.000007 368850
> v31.000008 365163
> v31.000009 408020

#3.) I convert the bedGraph file to bigWig format

> bedGraphToBigWig reads.wig chrom.sizes reads.bw

# and check the resulting bigWig file (153 MB)

> bigWigInfo reads.bw
>
> version: 3
> isCompressed: yes
> isSwapped: 0
> primaryDataSize: 135,714,085
> primaryIndexSize: 2,046,848
> zoomLevels: 7
> chromCount: 42227
> basesCovered: 865,781,703
> mean: 9.542902
> min: 0.000000
> max: 65535.000000
> std: 326.732483

# as expected, there are >40,000 chromosomes in the bigWig file

#4.) Now, here's where the trouble starts:
# when I try to extract data for different chromosomes, I only get results for the first two chromosomes (v31.000000, v31.000001) but nothing for any of the other chromosomes I have tried (e.g. v31.000004).
e.g.,

> bigWigSummary reads.bw v31.000000 1 100000 1
> 5.60354
>
> bigWigSummary reads.bw v31.000001 1 100000 1
> 15.3698
>
> bigWigSummary reads.bw v31.000002 1 100000 1
> no data in region v31.000002:1-100000 in reads.bw
>
> bigWigSummary reads.bw v31.000003 1 100000 1
> no data in region v31.000003:1-100000 in reads.bw
>
> bigWigSummary reads.bw v31.000004 1 100000 1
> no data in region v31.000004:1-100000 in reads.bw

Weirdly, when I create a bigWig file from a bedGraph file that only contains data for the first 5 supercontigs (v31.000000 - v31.000004), bigWigToWig or bigWigSummary returns the expected data just fine.

> egrep -i "v31.000000|v31.000001|v31.000002|v31.000003|v31.000004" reads.wig > subset.wig
> bedGraphToBigWig subset.wig chrom.sizes subset.bw

Again, the resulting bigWig file seems to be fine:

> bigWigInfo subset.bw
> version: 3
> isCompressed: yes
> isSwapped: 0
> primaryDataSize: 554,177
> primaryIndexSize: 7,268
> zoomLevels: 7
> chromCount: 5
> basesCovered: 2,701,742
> mean: 12.688771
> min: 0.000000
> max: 36270.000000
> std: 290.058841

This time, I can extract data for the other chromosomes, too:

> bigWigSummary subset.bw v31.000000 1 100000 1
> 5.60354
> bigWigSummary subset.bw v31.000001 1 100000 1
> 15.3698
> bigWigSummary subset.bw v31.000002 1 100000 1
> 3.62483
> bigWigSummary subset.bw v31.000003 1 100000 1
> 0.686673
> bigWigSummary subset.bw v31.000004 1 100000 1
> 3.0281

I have tried generating both compressed and uncompressed bigWig files, but the result is the same.

Am I doing something wrong? Are my chromosome identifiers incompatible with the bigWig format? Might this be a bug somewhere?

Any advice is greatly appreciated.

Thanks a lot,
Thomas

_______________________________________________
Genome maillist - [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
