Hello Janet,

It is a rare pleasure to have the opportunity to enlighten somebody from the Fred Hutchinson Cancer Research Center about R functionality.

The bottom line is this: GenomicRanges (that is, the GRanges class) is much more biology-aware than the generic RangedData class.

GenomicRanges natively stores a strand value per feature. RangedData does not, unless you create it. GenomicRanges' strand values are very intuitive: +, -, and *.

GenomicRanges "rows" can be ordered by any "column", even if that ends up disordering the chromosomes. RangedData can only order features within each space.

GenomicRanges can store the complete list of chromosomes and their corresponding sizes for your particular organism.

You can create a GenomicRanges instance out of a RangedData without explicitly providing the list of chromosomes and their sizes. Just do

    library(GenomicRanges)
    my_gr <- as(my_rd, "GRanges")

The list of chromosomes is gathered on the fly from the features. The list of chromosome lengths still has to be assigned manually, which is fine.

Nowadays you can rtracklayer::import() BED directly as GenomicRanges.

Importing large BED files into either GenomicRanges or RangedData is, in my experience, equally slow. There is no difference there.

Why not forget RangedData, then? Its advantage over GenomicRanges is, also in my experience, that it accepts features mapped beyond the limits of chromosomes. The most unforgiving example is mitochondrial DNA. Because it is circular, it naturally gets sequencing reads with "starts" that are numerically larger than their "ends".

In high throughput sequencing I still use RangedData when
1) I do not care about relatively few misbehaving reads, and
2) I need my script to run without errors from the GenomicRanges sanity check.

For everyday high throughput sequencing I use GenomicRanges, keeping the chromosome lengths unassigned. It could be called a hybrid.

I hope this helps.

Ivan

Ivan Gregoretti, PhD
National Institute of Diabetes and Digestive and Kidney Diseases
National Institutes of Health
5 Memorial Dr, Building 5, Room 205.
Bethesda, MD 20892. USA.
Phone: 1-301-496-1016 and 1-301-496-1592
Fax: 1-301-496-9878

On Thu, Oct 28, 2010 at 9:25 PM, Janet Young <[email protected]> wrote:
> Hi,
>
> I've been on a long long vacation, so I'm a bit more out of the loop than I
> usually am.
>
> I've been using RangedData a lot in my code until now to represent sets of
> genomic regions spread over multiple chromosomes, and I've just realized
> that GenomicRanges has a lot of the same characteristics.
>
> I wanted to ask you all
> - whether RangedData and GenomicRanges are pretty much equivalent, or if
> there are functions that exist for one but not the other?
> - whether I can use pretty much the same code and functions if I switch
> everything over to use GenomicRanges?
> - are there subtle differences I should be careful of if I make the switch?
>
> thanks very much,
>
> Janet Young
>
>
> -------------------------------------------------------------------
>
> Dr. Janet Young (Trask lab)
>
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Avenue N., C3-168,
> P.O. Box 19024, Seattle, WA 98109-1024, USA.
>
> tel: (206) 667 1471 fax: (206) 667 6524
> email: jayoung ...at... fhcrc.org
>
> http://www.fhcrc.org/labs/trask/
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> [email protected]
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>
