Some guidance for developers on how to avoid duplicating the matrix would be greatly appreciated.
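
As a rough illustration of the kind of duplication discussed in this thread, here is a minimal sketch in base R only. It does not use SummarizedExperiment; the names `m`, `extracted`, and the "sample" labels are made up for the example. The assumption is that an assay stored without dimnames behaves like the plain matrix `m`, and that adding colnames on extraction behaves like the assignment below.

```r
# Base R only; no Bioconductor packages are assumed here.
# `m` stands in for an assay stored without dimnames (hypothetical name).
m <- matrix(1:1e6, ncol = 10)

# Plain assignment does not copy: both names refer to the same memory.
extracted <- m

# tracemem() prints a message when the marked object is duplicated,
# but only if this R build was compiled with memory profiling.
can_trace <- capabilities("profmem")
if (can_trace) invisible(tracemem(m))

# Adding colnames to the shared matrix forces a full copy,
# because `m` itself has to stay unchanged.
colnames(extracted) <- paste0("sample", 1:10)

if (can_trace) untracemem(m)

# The original still has no colnames; only the copy carries them.
is.null(colnames(m))
colnames(extracted)
```

Running gc() before and after the colnames assignment, as in the example further down, is another way to watch the memory grow.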

Another trouble point: if I am given an SE with an unnamed assay and I need to give the assay a name, this can also expand the memory used. I had found a solution (which works with GenomicRanges 1.18 / current release) with:

    names(assays(se, withDimnames=FALSE))[1] <- "foo"

But now I'm looking in devel and this appears to no longer work. The memory used expands, equivalent to:

    names(assays(se))[1] <- "foo"

Here's some code to try this:

    m <- matrix(1:1e7,ncol=10,dimnames=list(1:1e6,1:10))
    se <- SummarizedExperiment(m)
    names(assays(se, withDimnames=FALSE))[1] <- "foo"
    names(assays(se))[1] <- "foo"

while running gc() in between steps.

On Mon, Mar 9, 2015 at 10:36 AM, Kasper Daniel Hansen <kasperdanielhan...@gmail.com> wrote:

> On Mon, Mar 9, 2015 at 10:30 AM, Vincent Carey <st...@channing.harvard.edu>
> wrote:
>
>> I am glad you are keeping this discussion alive Kasper.
>>
>> On Mon, Mar 9, 2015 at 10:06 AM, Kasper Daniel Hansen
>> <kasperdanielhan...@gmail.com> wrote:
>>
>>> It sounds like the proposed changes are already made. However (like
>>> others) I am still a bit mystified why this was necessary. The old
>>> version did allow for a GRanges inside the DataFrame of the rowData,
>>> as far as I recall. So I assume this is for efficiency. But why?
>>> What kind of data/use cases is this for?
>>>
>>> I am happy to hear that SummarizedExperiment is going to be spun out
>>> into its own package. When that happens, I have some comments, which
>>> I'll include here in anticipation.
>>>
>>> 1) I now very strongly believe it was a design mistake to not have
>>> colnames on the assays. The advantage of this choice is that
>>> sampleNames are only stored in one place. The extreme disadvantage is
>>> the high inefficiency when you want colnames on an extracted assay.
>>
>> after example(SummarizedExperiment)
>>
>> > colnames(assays(se1)[[1]])
>> [1] "A" "B" "C" "D" "E" "F"
>>
>> so this seems to be optional. But attempts to set rownames will fail
>> silently
>>
>> > rownames(assays(se1)[[1]]) = as.character(1:200)
>> > rownames(assays(se1)[[1]])
>> NULL
>>
>> seems we could issue a warning there
>
> Vince, you need to be careful here.
>
> The assays are stored without colnames (unless something has recently
> changed). The default is to - upon extraction - set the colnames of the
> matrix. This however requires a copy of the entire matrix. So
> essentially, upon extraction, each assay is needlessly duplicated to add
> the colnames. This is what I mean by inefficient. I would prefer to store
> the assays with colnames. This means that changing sampleNames of the
> object will be inefficient (as it is for eSets) since it would require a
> complete copy of everything. But I would rather - much rather - copy when
> setting sampleNames than copy when extracting an assay.
>
> Best,
> Kasper

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel