Michael,
Before people start developing their own bed file parsers, it looks pretty straight-forward to incorporate this operation in rtracklayer now. You or I could add a scoreRanges argument, which takes either a logical or perhaps a character string if we envision there being more than one way to score a set of ranges, to the import* functions and when a appropriate value is supplied, it will internally block process the data file and output a RangedData object with a score column.


Patrick


Michael Lawrence wrote:
On Thu, Sep 17, 2009 at 2:17 PM, Ivan Gregoretti <[email protected]> wrote:

Hi Michael,

True; there are no scores in this BED file. That brings us to Sean's
remark.

You may have noticed that all features in the BED file are 36 bases
long. Each feature is the position and orientation of a Solexa read.

I want the value in each bin in the WIG file to represent the number
of tags contained in that bin.

So, coverage() is close to what I am looking for but I think that
pileup() is perhaps better. Then comes the problem of how to place
those values in an object exportable as WIG. Lets not talk about
efficiency for now.


Ignoring efficiency, it's pretty straight-forward. In the development
version of IRanges, one can convert an Rle (or RleList for multiple
chromosomes) as output to coverage() to a RangedData. In the development
version of rtracklayer, this is done implicitly:

export(cov, "coverage.wig")


I think that I can solve this problem myself but given that:

1) there is a package that can read BED and write WIG files
and
2) BED to WIG conversion is such a common task

I thought that there must have been an existing tool or perhaps an
established efficient way of doing this with BioC tools. Why is this
important? Well, this myRegions.bed contains 4.3 million records.
Loading them with import() used about 1GB of RAM. My real world BED
files are never less than 50 million reads, 150 more like.

Conclusion: I DO care about an expert's opinion of BED-> WIG
conversion within R.


Maybe nobody does this with R. That is also acceptable. What tool do
you use then?


The bottleneck seems to be loading and representing large BED files, since
after the reduction to coverage the data size is usually manageable.
rtracklayer was never designed to load BED files with millions of records.
In my experience, short reads are more commonly represented in more
efficient formats like those output by MAQ and bowtie, which are imported
with ShortRead.

There is much room for optimization in rtracklayer, but for now one idea is
to calculate coverage on subsets of the reads, and aggregate the results.
The BED file can be loaded with simple low-level calls like read.table and
scan.

Michael



Thank you,

Ivan


Ivan Gregoretti, PhD
National Institute of Diabetes and Digestive and Kidney Diseases
National Institutes of Health
5 Memorial Dr, Building 5, Room 205.
Bethesda, MD 20892. USA.
Phone: 1-301-496-1592
Fax: 1-301-496-9878



On Thu, Sep 17, 2009 at 4:45 PM, Sean Davis <[email protected]> wrote:
On Thu, Sep 17, 2009 at 4:43 PM, Michael Lawrence <[email protected]>
wrote:
On Thu, Sep 17, 2009 at 8:14 AM, Ivan Gregoretti <[email protected]>
wrote:

Hello everybody

How do you convert BED formatted files to WIG files?


I tried to do that with rtracklayer but it didn't quite succeed.

This is the session's transcript:

First, you can download
http://dl.getdropbox.com/u/2051155/myRegions.bed, which looks like
this

chr1    3002444 3002479                 +
chr1    3002989 3003024                 -
chr1    3017603 3017638                 +
chr1    3017879 3017914                 -
chr1    3018173 3018208                 +
chr1    3018183 3018218                 -
chr1    3018183 3018218                 -
chr1    3019065 3019100                 +
chr1    3019761 3019796                 -
chr1    3020044 3020079                 -
...

Now to R

suppressMessages(library(rtracklayer))
myRegions <- import('myRegions.bed')

So far so good. Now I try:
export(myRegions, 'myRegions.wig', format = 'wig')

but I get:
Error in export.ucsc(object, con, subformat, ...) :
 Track not compatible with WIG format: Overlapping features must be
on separate strands and every feature width must be positive

I seems that the error message is a feature rather than a bug. My
interpretation is that export() does not like records like lines 6 and
7:
chr1    3018183 3018218                 -
chr1    3018183 3018218                 -

So, how do you convert BED to WIG in your everyday work?


Well, as rtracklayer said, this track is not representable by WIG, which
is
meant to communicate a single data value for a given genomic region (to
generate the bar plots in the UCSC browser, for instance). There aren't
any
scores in your file, so why do you want to use WIG?

Just guessing, here, but you may want to calculate coverage() first,
Ivan?
Sean

Thank you,

Ivan

sessionInfo()
R version 2.9.2 (2009-08-24)
x86_64-redhat-linux-gnu

locale:



LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] rtracklayer_1.4.0 RCurl_0.94-1

loaded via a namespace (and not attached):
[1] Biobase_2.4.0     Biostrings_2.12.0 BSgenome_1.12.0
IRanges_1.2.0
[5] tools_2.9.2       XML_2.3-0

Ivan Gregoretti, PhD
National Institute of Diabetes and Digestive and Kidney Diseases
National Institutes of Health
5 Memorial Dr, Building 5, Room 205.
Bethesda, MD 20892. USA.
Phone: 1-301-496-1592
Fax: 1-301-496-9878

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

       [[alternative HTML version deleted]]

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing


        [[alternative HTML version deleted]]

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing


_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

Reply via email to