Dear UCSC genome browser team,

I am working with a large bedGraph file (800 Mb), which I would like to 
convert to bigWig format for display in a local browser. The data is 
from a non-model organism with a large number of supercontigs as 
chromosomes (>40,000). Unfortunately, when I try to extract data for 
different chromosomes, I only get results for the first two chromosomes 
(v31.000000, v31.000001) but nothing for any of the other chromosomes 
(see below).

I have performed the following steps in an Ubuntu Linux environment, 
using binary files from the UCSC genome browser server 
<http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/> that carry the 
date 14-Dec-2010.

#1.) I start with a bedGraph file
> head reads.wig
>
> v31.000000      0       96      0
> v31.000000      96      132     2
> v31.000000      132     358     0
> v31.000000      358     394     1
> v31.000000      394     1176    0
> v31.000000      1176    1212    1
> v31.000000      1212    1350    0
> v31.000000      1350    1359    1
> v31.000000      1359    1386    2
> v31.000000      1386    1395    1
# the chromosomes / supercontigs are numbered, e.g. v31.000000, v31.000001, etc.
# there is data for other chromosomes / supercontigs, too:
> grep v31.000004 reads.wig |head
>
> v31.000004      0       566     0
> v31.000004      566     602     1
> v31.000004      602     871     0
> v31.000004      871     907     3
> v31.000004      907     1185    0
> v31.000004      1185    1191    1
> v31.000004      1191    1197    2
> v31.000004      1197    1221    3
> v31.000004      1221    1227    2
> v31.000004      1227    1233    1
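
# One thing worth ruling out at this step is the sort order of the input. As
# far as I understand, bedGraphToBigWig expects the bedGraph to be sorted by
# chromosome name (byte order) and then numerically by start position. The
# following is only a sketch of such a check; check_bedgraph_sorted is a
# made-up helper name, not a UCSC tool.

```shell
# check_bedgraph_sorted BEDGRAPH
# Prints "sorted" if the file is ordered by column 1 (byte order) and then
# numerically by column 2, otherwise prints "not sorted".
check_bedgraph_sorted() {
    # sort -c only checks the order; it does not rewrite the file
    if LC_ALL=C sort -c -k1,1 -k2,2n "$1" 2>/dev/null; then
        echo "sorted"
    else
        echo "not sorted"
    fi
}
```

# Usage would be: check_bedgraph_sorted reads.wig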

#2.) I have a chrom.size file
> head chrom.sizes
> v31.000000      672495
> v31.000001      531507
> v31.000002      537584
> v31.000003      495203
> v31.000004      468990
> v31.000005      441662
> v31.000006      373162
> v31.000007      368850
> v31.000008      365163
> v31.000009      408020
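
# A possible consistency check between the two input files, sketched with awk
# under the assumption that both are whitespace-delimited as shown above. The
# helper names missing_chroms and overrun_intervals are hypothetical.

```shell
# missing_chroms BEDGRAPH CHROM_SIZES
# Prints each chromosome name used in BEDGRAPH that has no entry in CHROM_SIZES.
missing_chroms() {
    awk 'NR == FNR { size[$1] = $2; next }
         !($1 in size) && !seen[$1]++ { print $1 }' "$2" "$1"
}

# overrun_intervals BEDGRAPH CHROM_SIZES
# Prints every bedGraph line whose end coordinate is larger than the size
# listed for its chromosome in CHROM_SIZES.
overrun_intervals() {
    awk 'NR == FNR { size[$1] = $2; next }
         ($1 in size) && ($3 + 0 > size[$1] + 0)' "$2" "$1"
}
```

# Usage would be: missing_chroms reads.wig chrom.sizes
#                 overrun_intervals reads.wig chrom.sizes
# Empty output from both would mean the names and coordinates are consistent.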

#3.) I convert the bedGraph file to bigWig format
> bedGraphToBigWig reads.wig chrom.sizes reads.bw
# and check the resulting bigWig file (153 Mb)
> bigWigInfo reads.bw
>
> version: 3
> isCompressed: yes
> isSwapped: 0
> primaryDataSize: 135,714,085
> primaryIndexSize: 2,046,848
> zoomLevels: 7
> chromCount: 42227
> basesCovered: 865,781,703
> mean: 9.542902
> min: 0.000000
> max: 65535.000000
> std: 326.732483


# as expected, there are >40,000 chromosomes in the bigWig file

#4.) Now, here's where the trouble starts:
# when I try to extract data for different chromosomes, I only get 
# results for the first two chromosomes (v31.000000, v31.000001) but 
# nothing for any of the other chromosomes I have tried (e.g. v31.000004).
# For example:

> bigWigSummary reads.bw v31.000000 1 100000 1
> 5.60354
>
> bigWigSummary reads.bw v31.000001 1 100000 1
> 15.3698
>
> bigWigSummary reads.bw v31.000002 1 100000 1
> no data in region v31.000002:1-100000 in reads.bw
>
> bigWigSummary reads.bw v31.000003 1 100000 1
> no data in region v31.000003:1-100000 in reads.bw
>
> bigWigSummary reads.bw v31.000004 1 100000 1
> no data in region v31.000004:1-100000 in reads.bw

Weirdly, when I create a bigWig file from a bedGraph file that only 
contains data for the first 5 supercontigs (v31.000000 - v31.000004), 
bigWigToWig or bigWigSummary returns the expected data just fine.
> egrep -i "v31.000000|v31.000001|v31.000002|v31.000003|v31.000004" reads.wig > subset.wig
> bedGraphToBigWig subset.wig chrom.sizes subset.bw
Again, the resulting bigWig file seems to be fine:
> bigWigInfo subset.bw
> version: 3
> isCompressed: yes
> isSwapped: 0
> primaryDataSize: 554,177
> primaryIndexSize: 7,268
> zoomLevels: 7
> chromCount: 5
> basesCovered: 2,701,742
> mean: 12.688771
> min: 0.000000
> max: 36270.000000
> std: 290.058841
This time, I can extract data for the other chromosomes, too:
> bigWigSummary subset.bw v31.000000 1 100000 1
> 5.60354
> bigWigSummary subset.bw v31.000001 1 100000 1
> 15.3698
> bigWigSummary subset.bw v31.000002 1 100000 1
> 3.62483
> bigWigSummary subset.bw v31.000003 1 100000 1
> 0.686673
> bigWigSummary subset.bw v31.000004 1 100000 1
> 3.0281
I have tried generating compressed or uncompressed bigWig files, but the 
result is the same.
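
# A note on the subset extraction: in the egrep pattern the dots match any
# character and the names are matched anywhere on the line. That is probably
# harmless for these identifiers, but an exact match on the first column
# avoids the question. The sketch below uses awk; subset_bedgraph is a
# hypothetical helper name.

```shell
# subset_bedgraph BEDGRAPH NAME...
# Prints only the lines of BEDGRAPH whose first column is exactly equal to
# one of the given chromosome names.
subset_bedgraph() {
    file=$1
    shift
    awk -v names="$*" '
        BEGIN {
            n = split(names, list, " ")
            for (i = 1; i <= n; i++) keep[list[i]] = 1
        }
        $1 in keep' "$file"
}
```

# Usage would be:
#   subset_bedgraph reads.wig v31.000000 v31.000001 v31.000002 v31.000003 v31.000004 > subset.wig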

Am I doing something wrong? Are my chromosome identifiers incompatible 
with the bigWig format? Might this be a bug somewhere?

Any advice is greatly appreciated.

Thanks a lot,
Thomas