May I advocate for 'IndexedDataFrame' or 'IndexedFrame'? 'rowIndices' can return whatever makes sense (GRanges or some other data structure; I'm thinking of a taxonomy for metagenomics, for example). GRangesFrame can then inherit from this.
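To make the proposal concrete, here is a minimal sketch of the idea. It is written in Python rather than S4 only to keep it short and self-contained, and every name in it (IndexedFrame, row_indices, take_rows, Range, GRangesFrame, row_ranges) is hypothetical, not an existing Bioconductor API. The only contract assumed for the row index is that it has a length and supports integer indexing, so a list of taxa works as well as a list of ranges.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Sequence


@dataclass
class IndexedFrame:
    """Hypothetical container: equal-length columns plus one row index of any type."""
    columns: Dict[str, List[Any]]
    row_index: Sequence[Any]  # assumed contract: len() and integer indexing only

    def __post_init__(self) -> None:
        # Every column must have exactly one value per element of the row index.
        n = len(self.row_index)
        for name, values in self.columns.items():
            if len(values) != n:
                raise ValueError(f"column {name!r} has {len(values)} rows, expected {n}")

    def row_indices(self) -> Sequence[Any]:
        """Return the row index, whatever its type."""
        return self.row_index

    def nrow(self) -> int:
        return len(self.row_index)

    def __len__(self) -> int:
        # Frame convention discussed in the thread: length is the number of columns.
        return len(self.columns)

    def take_rows(self, rows: Sequence[int]) -> "IndexedFrame":
        """Subset rows explicitly, keeping columns and index in sync."""
        return type(self)(
            {name: [values[i] for i in rows] for name, values in self.columns.items()},
            [self.row_index[i] for i in rows],
        )


@dataclass(frozen=True)
class Range:
    """Toy stand-in for a single genomic range."""
    seqname: str
    start: int
    end: int


class GRangesFrame(IndexedFrame):
    """Specialization whose row index must consist of ranges."""

    def __post_init__(self) -> None:
        super().__post_init__()
        if not all(isinstance(r, Range) for r in self.row_index):
            raise TypeError("GRangesFrame requires a Range for every row")

    def row_ranges(self) -> Sequence[Range]:
        return self.row_indices()


# A taxonomy-indexed frame and a range-indexed frame share the same frame API.
taxa = IndexedFrame({"count": [10, 3, 7]}, ["Bacteroides", "Prevotella", "Roseburia"])
peaks = GRangesFrame(
    {"score": [0.9, 0.4]},
    [Range("chr1", 100, 200), Range("chr2", 50, 80)],
)
first_peak = peaks.take_rows([0])
```

In this sketch the frame API stays primary (len() counts columns) and row-wise operations go through an explicitly named method, which is one way to sidestep the ambiguity about length and 1D subsetting raised in the quoted message below.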

On Wed, Mar 4, 2015 at 3:28 AM, Hervé Pagès <hpa...@fredhutch.org> wrote:

> GRangesFrame is an interesting idea and I gave it some thoughts.
>
> There is this nice symmetry between GRanges and GRangesFrame:
>
> - GRanges = a naked GRanges + a DataFrame accessible via mcols()
>
> - GRangesFrame = a DataFrame + a naked GRanges accessible via
>   some accessor (e.g. rowRanges())
>
> So GRanges and GRangesFrame are equivalent in terms of what they
> can hold, but different in terms of API: the former has the ranges
> API as primary API and the DataFrame API on its mcols() component,
> and the latter has the DataFrame API as primary API and the ranges
> API on its rowRanges() component. Nice switch!
>
> What does this API switch bring us? A GRangesFrame object is now
> an object that fully behaves like a DataFrame and people can also
> perform range-based operations on its rowRanges() component.
> Here is what I'm afraid is going to happen: people will also want
> to be able to perform range-based operations *directly* on
> these objects, i.e. without having to call rowRanges() first.
> So for example when they do subsetByOverlaps(), subsetting
> happens vertically. Also the Hits object returned by findOverlaps()
> would contain row indices. Problem with this is that these objects
> now start to suffer from the "dual personality syndrome". For
> example, it's not clear anymore what their length should be.
> Strictly speaking it should be their number of columns (that's
> what the length of a DataFrame is), but the ranges API that
> we're trying to put on them also makes them feel like vectors
> along the vertical dimension so it also feels that their length
> should be their number of rows. Same thing with 1D subsetting.
> Why does it subset the columns and not the rows? Most people
> are now confused.
>
> It's interesting to note that the same thing happens with GRanges
> objects, but in the opposite direction: people wish they could
> do DataFrame operations directly on them without calling mcols()
> first. But in order to preserve the good health of GRanges objects,
> we've not done that (except for $, a shortcut for mcols(x)$,
> the pressure was just too strong).
>
> H.
>
>
> On 03/03/2015 04:35 PM, Michael Lawrence wrote:
>
>> Should be possible for the annotations to be of any type, as long as they
>> satisfy a simple contract of NROW() and 2D "[". Then, you could have a
>> DataFrame, GRanges, or whatever in there. But it would be nice to have a
>> special class for the container with range information. The contract for
>> the range annotation would be to have a granges() method.
>>
>> I agree it would be nice if there was a way with the methods package to
>> easily assert such contracts. For example, one could define an interface
>> with a set of generics (and optionally the relevant position in the
>> generic signature). Then, once all of the methods have been assigned for a
>> particular class, it is made to inherit from that contract class. There
>> are lots of gotchas though. Not sure how useful it would be in practice.
>>
>>
>> On Tue, Mar 3, 2015 at 4:07 PM, Peter Haverty <haverty.pe...@gene.com>
>> wrote:
>>
>>> There are some nice similarities in these new imaginary types. A
>>> "GRangesFrame" is a list of dimensionally identical things (columns) and
>>> some row meta-data (the GRanges). The SE-like object is similarly a list
>>> of dimensionally like things (matrices, RleDataFrames, BigMatrix objects,
>>> HDF5-backed things) with some row meta-data (a DataFrame or
>>> GRangesFrame).
>>> Elegant? Maybe they would actually be relatives in the class tree.
>>>
>>> I wonder if this kind of thing would be easier if we had Java-style
>>> Interfaces or duck-typing. The "x" slot of "y" holds something that
>>> implements this set of methods ...
>>>
>>> Oh, and kinda apropos, the genoset class will probably go away or become
>>> an extension to this new SE-like thing. The extra stuff that comes along
>>> with genoset will still be available.
>>>
>>> Pete
>>>
>>> ____________________
>>> Peter M. Haverty, Ph.D.
>>> Genentech, Inc.
>>> phave...@gene.com
>>>
>>> On Tue, Mar 3, 2015 at 3:42 PM, Tim Triche, Jr. <tim.tri...@gmail.com>
>>> wrote:
>>>
>>>> This.
>>>>
>>>> It would be damned near perfect as a return value for assays coming out
>>>> of an object that held several such assays at several time points in a
>>>> population, where there are both assay-wise and covariate-wise "holes"
>>>> that could nonetheless be usefully imputed across assays.
>>>>
>>>>
>>>> Statistics is the grammar of science.
>>>> Karl Pearson <http://en.wikipedia.org/wiki/The_Grammar_of_Science>
>>>>
>>>> On Tue, Mar 3, 2015 at 3:25 PM, Peter Haverty <haverty.pe...@gene.com>
>>>> wrote:
>>>>
>>>>>>> I still think GRanges should be a subclass of DataFrame,
>>>>>>> which would make this easy, but I don't seem to be winning that
>>>>>>> argument.
>>>>>>
>>>>>> Just impossible. As Michael mentioned back in November, they have
>>>>>> conflicting APIs.
>>>>>
>>>>> Maybe a new "GRangesFrame" that is a DataFrame and holds a GRanges
>>>>> (without mcols) as an index?
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpa...@fredhutch.org
> Phone: (206) 667-5791
> Fax: (206) 667-1319

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel