Yes, I am generally in favor of the GViews, and I've been a fan of Views since they were invented. Was just pointing out that we can do a lot with Lists these days.
On Tue, Jun 28, 2016 at 4:56 PM, Hervé Pagès <hpa...@fredhutch.org> wrote: > On 06/28/2016 05:43 AM, Michael Lawrence wrote: >> >> Thanks for these use cases. >> >> I've been wondering about the usefulness of Views given how far along >> Lists have come since the invention of Views. Instead of >> viewSums(Views(cov, gr)) we could just do sum(cov[gr]). The latter >> works today. The difference between the Views and List approach is >> that the Views data structure defers the extraction until >> summarization. > > > I think this is an important feature. > >> So we can always retrieve the entire underlying vector, >> and the ranges of interest. But for summarize-and-forget use cases, >> Lists should work fine. > > > They work, but they're not super convenient. The user needs to know > how to import data for his/her regions of interest only. The way to > do this can vary across the type of file (e.g. 2bit vs BigWig). > Having GViews objects hides these details and provides a unified > mode of operating on a set of genomic regions of interest. > > Also, unlike a List, a GViews object bundles together the genomic > regions and the data associated with them. This makes the object > self-descriptive and thus reduces the risk of errors. > >> >> I like the idea of pushing the aggregation down to the data. BigWig >> files are particularly well suited for this, because they have >> precomputed summary statistics. See summary,BigWigFile. It would take >> some hacking of the Kent library to expose everything we want, like >> the sums. > > > This sounds like an argument in favor of using GViews objects to me. > > Thanks, > H. > > >> >> Michael >> >> On Tue, Jun 28, 2016 at 3:54 AM, Johnston, Jeffrey <j...@stowers.org> >> wrote: >>> >>> During the BioC developer day, Hervé brought up the idea of possibly >>> extending the concept of GenomicViews to other use cases. I'd like to share >>> how we currently use GenomicRanges, Views and RleLists to perform certain >>> analyses, and hopefully demonstrate that a way to directly use Views with >>> GRanges would be quite useful. >>> >>> As an example, let's say we have 2 ChIP-seq samples (as BigWig files) and >>> a set of genomic ranges we are interested in (as a BED file). We want to >>> find the sum of the ChIP-seq signal found in our regions of interest for >>> each of the two samples. We'd first import the BED file as a GRanges object. >>> Annotating the GRanges with two metadata columns representing the ChIP-seq >>> signal for the two samples seems like a straightforward use for Views (in >>> particular, viewSums), but it is a bit convoluted. >>> >>> The main problem is that Views work with RangesLists, not GRanges. >>> Coercing a GRanges to a RangesList possibly disrupts the order, so we have >>> to first reorder the GRanges to match the order it will be given after >>> coercion (keeping track so we can later revert back to the original order): >>> >>> regions <- import("regions_of_interest.bed") >>> sample1_cov <- import("sample1.bw", as="RleList") >>> sample2_cov <- import("sample2.bw", as="RleList") >>> oo <- order(regions) >>> regions <- regions[oo] >>> >>> Now we can construct a View and use viewSums to obtain our values >>> (unlisting them as they are grouped by seqnames) and add them as a metadata >>> column in our GRanges, restoring the original order of the GRanges when we >>> are done: >>> >>> v <- Views(sample1_cov, as(regions, "RangesList")) >>> mcols(regions)$sample1_signal <- unlist(viewSums(v)) >>> regions[oo] <- regions >>> >>> We then repeat the process to add another metadata column for sample2. >>> >>> To me, having the ability to use a GRanges as a view into an RleList >>> makes a lot more sense. That would allow us to reduce all the above >>> complexity to something like: >>> >>> regions$sample1_signal <- viewSums(Views(sample1_cov, regions)) >>> regions$sample2_signal <- viewSums(Views(sample2_cov, regions)) >>> >>> That alone would be great! But, there's a way to make it even better. >>> Storing these RleLists in memory for each of our samples is quite >>> inefficient, especially since our regions of interest are only a small >>> portion of them. The rtracklayer package already has some very useful >>> functionality for instantiating an RleList with only the data from specific >>> ranges of a BigWig file. Taking advantage of that, we can dramatically >>> reduce our memory usage and increase our performance like so: >>> >>> regions <- import("regions_of_interest.bed") >>> sample1_cov <- import("sample1.bw", as="RleList", which=regions) >>> sample2_cov <- import("sample2.bw", as="RleList", which=regions) >>> regions$sample1_signal <- viewSums(Views(sample1_cov, regions)) >>> regions$sample2_signal <- viewSums(Views(sample2_cov, regions)) >>> >>> But can't this functionality be included in Views? Why not have it accept >>> a BigWig file in place of an RleList and have it selectively load the >>> portions of the BigWig it needs based on the provided GRanges? That would >>> allow this: >>> >>> regions <- import("regions_of_interest.bed") >>> regions$sample1_signal <- viewSums(Views("sample1.bw", regions)) >>> regions$sample2_signal <- viewSums(Views("sample2.bw", regions)) >>> >>> The above is quite close to how I use GRanges and BigWigs now, except I >>> have to write and maintain all the hackish code to link BigWig files, >>> GRanges, Views, RangesLists and RleLists together into something that >>> behaves as one would intuitively expect. >>> >>> I’d welcome any thoughts on how people perform similar analyses that >>> involve GRanges and data stored in BigWig files or RleLists, and whether >>> this would also be useful to them. >>> >>> Thanks, >>> Jeff Johnston >>> Zeitlinger Lab >>> Stowers Institute >>> >>> _______________________________________________ >>> Bioc-devel@r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/bioc-devel >> >> >> _______________________________________________ >> Bioc-devel@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/bioc-devel >> > > -- > Hervé Pagès > > Program in Computational Biology > Division of Public Health Sciences > Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N, M1-B514 > P.O. Box 19024 > Seattle, WA 98109-1024 > > E-mail: hpa...@fredhutch.org > Phone: (206) 667-5791 > Fax: (206) 667-1319 > _______________________________________________ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel