Yes, you're right! Sorry for the noise. I forgot this was how it always behaved. All I had to do was change the argument name.
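For anyone finding this in the archive, here is a minimal sketch of the two-line approach quoted below, fleshed out with placeholder data. It assumes a current installation in which the class is provided by the SummarizedExperiment package (at the time of this thread it lived in GenomicRanges); `counts`, `colData` and `myDataFrame` are made-up objects used only for illustration.

```r
library(SummarizedExperiment)  # assumption: current home of the class

# Placeholder inputs: a 10 x 4 count matrix, per-sample and per-row annotation
counts <- matrix(1:40, nrow = 10,
                 dimnames = list(letters[1:10], LETTERS[1:4]))
colData <- DataFrame(condition = c("a", "a", "b", "b"),
                     row.names = LETTERS[1:4])
myDataFrame <- DataFrame(foo = 1:10)

# Build the object without any ranges, then attach the row annotation
se <- SummarizedExperiment(assays = list(counts = counts), colData = colData)
mcols(se) <- myDataFrame

mcols(se)$foo  # the per-row column that was just attached
```

No empty GRanges has to be constructed by hand for this; the row annotation is set after construction.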
On Wed, Apr 1, 2015 at 3:51 PM, Hervé Pagès <hpa...@fredhutch.org> wrote:
> Hi Michael,
>
> On 04/01/2015 07:17 AM, Michael Love wrote:
>>
>> I'll retract those last two emails about empty GRanges. That's simply:
>>
>> se <- SummarizedExperiment(assays, colData=colData)
>> mcols(se) <- myDataFrame
>
> Glad you found a simple way to do what you wanted.
>
> More below...
>
>>
>> On Tue, Mar 31, 2015 at 4:40 PM, Michael Love
>> <michaelisaiahl...@gmail.com> wrote:
>>>
>>> Would this code inspired by the release version of GenomicRanges work?
>>> e.g. if I want to add a DataFrame with 10 rows:
>>>
>>> names <- letters[1:10]
>>> x <- relist(GRanges(), PartitioningByEnd(integer(10), names=names))
>>> mcols(x) <- DataFrame(foo=1:10)
>>>
>>> Then give x to the rowRanges argument of SummarizedExperiment?
>>>
>>> On Tue, Mar 31, 2015 at 3:49 PM, Michael Love
>>> <michaelisaiahl...@gmail.com> wrote:
>>>>
>>>> I forgot to ask my other question. I had gone in early March and fixed
>>>> my code to eliminate rowData<-, but the argument to SummarizedExperiment
>>>> was still called rowData, and a DataFrame could be provided. Then I
>>>> didn't check for a few weeks, but the argument for the rowData slot is
>>>> now called rowRanges. What's the trick to putting a DataFrame on an
>>>> empty GRanges, so I can get the old behavior but now using the rowRanges
>>>> argument?
>
> I'm not sure what you meant by "so I can get the old behavior but
> now using the rowRanges argument".
>
> Just to clarify: the renaming of rowData to rowRanges is a change
> of name only, not a change of behavior. More precisely the new
> rowRanges() accessor should behave exactly as the old rowData()
> accessor. The same applies to the 'rowRanges' argument of the
> SummarizedExperiment() constructor. So whatever you were passing
> before to the 'rowData' argument, you should still be able to pass
> it to the new 'rowRanges' argument.
> Please let us know if it's not
> the case as this is certainly not intended.
>
> Thanks,
> H.
>
>>>>
>>>> On Tue, Mar 31, 2015 at 3:40 PM, Michael Love
>>>> <michaelisaiahl...@gmail.com> wrote:
>>>>>
>>>>> With GenomicRanges 1.19.48, I'm still having issues with re-naming the
>>>>> first assay and duplication of memory from my March 9 email. I tried
>>>>> assayNames<- as well. My use case is if I am given a
>>>>> SummarizedExperiment where the first element is not named "counts"
>>>>> (albeit the SE is most likely coming from summarizeOverlaps() and
>>>>> already named "counts"...).
>>>>>
>>>>> > sessionInfo()
>>>>>
>>>>> R Under development (unstable) (2015-03-31 r68129)
>>>>> Platform: x86_64-apple-darwin12.5.0 (64-bit)
>>>>> Running under: OS X 10.8.5 (Mountain Lion)
>>>>>
>>>>> locale:
>>>>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>>>>
>>>>> attached base packages:
>>>>> [1] stats4 parallel stats graphics grDevices datasets utils methods base
>>>>>
>>>>> other attached packages:
>>>>> [1] GenomicRanges_1.19.48 GenomeInfoDb_1.3.16 IRanges_2.1.43 S4Vectors_0.5.22
>>>>> [5] BiocGenerics_0.13.10 testthat_0.9.1 devtools_1.7.0 knitr_1.9
>>>>> [9] BiocInstaller_1.17.6
>>>>>
>>>>> loaded via a namespace (and not attached):
>>>>> [1] formatR_1.1 XVector_0.7.4 tools_3.3.0 stringr_0.6.2 evaluate_0.5.5
>>>>>
>>>>> On Mon, Mar 9, 2015 at 1:21 PM, Michael Love
>>>>> <michaelisaiahl...@gmail.com> wrote:
>>>>>>
>>>>>> On Mar 9, 2015 12:36 PM, "Martin Morgan" <mtmor...@fredhutch.org>
>>>>>> wrote:
>>>>>>>
>>>>>>> On 03/09/2015 08:07 AM, Michael Love wrote:
>>>>>>>>
>>>>>>>> Some guidance on how to avoid duplication of the matrix for
>>>>>>>> developers would be greatly appreciated.
>>>>>>>
>>>>>>> It's unsatisfactory, but using withDimnames=FALSE avoids duplication
>>>>>>> on extraction of assays (but obviously you don't have dimnames on the
>>>>>>> matrix).
>>>>>>> Row or column subsetting necessarily causes the subsetted assay
>>>>>>> data to be duplicated. There should not be any duplication when
>>>>>>> rowRanges() or colData() are changed without changing their
>>>>>>> dimension / ordering.
>>>>>>>
>>>>>>
>>>>>> Thanks Martin for checking into the regression.
>>>>>>
>>>>>> Sorry, I should have been more specific earlier; I meant more
>>>>>> guidance/documentation in the man page for SE. I scanned the 'Extension'
>>>>>> section but didn't find a note on withDimnames for extracting the matrix
>>>>>> or this example of renaming the assays (it seems like this could easily
>>>>>> be relevant for other package authors).
>>>>>>
>>>>>> A prominent note there might help devs write more memory-efficient
>>>>>> packages.
>>>>>>
>>>>>> The argument section mentions speed but I'd explicitly mention memory
>>>>>> given that we're often storing big matrices:
>>>>>>
>>>>>> "Setting withDimnames=FALSE increases the speed with which assays are
>>>>>> extracted."
>>>>>>
>>>>>> (it's entirely possible the info is there but I missed it)
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> Mike
>>>>>>
>>>>>>>
>>>>>>>> Another example of a trouble point is that if I am given an SE with
>>>>>>>> an unnamed assay and I need to give the assay a name, this also can
>>>>>>>> expand the memory used. I had found a solution (which works with
>>>>>>>> GenomicRanges 1.18 / current release) with:
>>>>>>>>
>>>>>>>> names(assays(se, withDimnames=FALSE))[1] <- "foo"
>>>>>>>>
>>>>>>>> But now I'm looking in devel and this appears to no longer work. The
>>>>>>>> memory used expands, equivalent to:
>>>>>>>>
>>>>>>>> names(assays(se))[1] <- "foo"
>>>>>>>>
>>>>>>>> Here's some code to try this:
>>>>>>>>
>>>>>>>> m <- matrix(1:1e7,ncol=10,dimnames=list(1:1e6,1:10))
>>>>>>>> se <- SummarizedExperiment(m)
>>>>>>>> names(assays(se, withDimnames=FALSE))[1] <- "foo"
>>>>>>>> names(assays(se))[1] <- "foo"
>>>>>>>>
>>>>>>>> while running gc() in between steps.
>>>>>>>
>>>>>>> I think this is a regression of some sort, and I'll look into it.
>>>>>>> Thanks for the heads-up.
>>>>>>>
>>>>>>> Martin
>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Mar 9, 2015 at 10:36 AM, Kasper Daniel Hansen
>>>>>>>> <kasperdanielhan...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> On Mon, Mar 9, 2015 at 10:30 AM, Vincent Carey
>>>>>>>>> <st...@channing.harvard.edu> wrote:
>>>>>>>>>
>>>>>>>>>> I am glad you are keeping this discussion alive, Kasper.
>>>>>>>>>>
>>>>>>>>>> On Mon, Mar 9, 2015 at 10:06 AM, Kasper Daniel Hansen
>>>>>>>>>> <kasperdanielhan...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> It sounds like the proposed changes are already made. However (like
>>>>>>>>>>> others) I am still a bit mystified why this was necessary. The old
>>>>>>>>>>> version did allow for a GRanges inside the DataFrame of the rowData,
>>>>>>>>>>> as far as I recall. So I assume this is for efficiency. But why?
>>>>>>>>>>> What kind of data/use cases is this for?
>>>>>>>>>>>
>>>>>>>>>>> I am happy to hear that SummarizedExperiment is going to be spun out
>>>>>>>>>>> into its own package. When that happens, I have some comments, which
>>>>>>>>>>> I'll include here in anticipation:
>>>>>>>>>>> 1) I now very strongly believe it was a design mistake to not have
>>>>>>>>>>> colnames on the assays. The advantage of this choice is that
>>>>>>>>>>> sampleNames are only stored in one place. The extreme disadvantage
>>>>>>>>>>> is the high inefficiency when you want colnames on an extracted
>>>>>>>>>>> assay.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> after example(SummarizedExperiment)
>>>>>>>>>>
>>>>>>>>>> > colnames(assays(se1)[[1]])
>>>>>>>>>>
>>>>>>>>>> [1] "A" "B" "C" "D" "E" "F"
>>>>>>>>>>
>>>>>>>>>> so this seems to be optional.
>>>>>>>>>> But attempts to set rownames will fail silently
>>>>>>>>>>
>>>>>>>>>> > rownames(assays(se1)[[1]]) = as.character(1:200)
>>>>>>>>>>
>>>>>>>>>> > rownames(assays(se1)[[1]])
>>>>>>>>>>
>>>>>>>>>> NULL
>>>>>>>>>> seems we could issue a warning there
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Vince, you need to be careful here.
>>>>>>>>>
>>>>>>>>> The assays are stored without colnames (unless something has recently
>>>>>>>>> changed). The default is to - upon extraction - set the colnames of
>>>>>>>>> the matrix. This however requires a copy of the entire matrix. So
>>>>>>>>> essentially, upon extraction, each assay is needlessly duplicated to
>>>>>>>>> add the colnames. This is what I mean by inefficient. I would prefer
>>>>>>>>> to store the assays with colnames. This means that changing
>>>>>>>>> sampleNames of the object will be inefficient (as it is for eSets)
>>>>>>>>> since it would require a complete copy of everything. But I would
>>>>>>>>> rather - much rather - copy when setting sampleNames than copy when
>>>>>>>>> extracting an assay.
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Kasper
>>>>>>>>>
>>>>>>>>> [[alternative HTML version deleted]]
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> Bioc-devel@r-project.org mailing list
>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Computational Biology / Fred Hutchinson Cancer Research Center
>>>>>>> 1100 Fairview Ave. N.
>>>>>>> PO Box 19024 Seattle, WA 98109
>>>>>>>
>>>>>>> Location: Arnold Building M1 B861
>>>>>>> Phone: (206) 667-2793
>>
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpa...@fredhutch.org
> Phone: (206) 667-5791
> Fax: (206) 667-1319
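
Kasper's point in the quoted discussion (names stored once, then attached to the matrix on extraction at the cost of a copy) can be sketched in base R. This is only a toy illustration under stated assumptions: `toy` and `get_assay()` are hypothetical stand-ins, not the SummarizedExperiment class or its accessors, and the sketch relies on nothing more than R's ordinary copy-on-modify behaviour.

```r
# Toy stand-in for an SE-like container (hypothetical, not the real class):
# the assay matrix is stored bare and the names are kept in one place.
toy <- list(
  assay    = matrix(1:20, ncol = 4),
  rownames = paste0("r", 1:5),
  colnames = LETTERS[1:4]
)

# Hypothetical accessor mimicking the withDimnames switch.
get_assay <- function(x, withDimnames = TRUE) {
  a <- x$assay
  if (withDimnames) {
    # `a` still shares memory with x$assay, so this assignment makes R
    # duplicate the whole matrix before attaching the names.
    dimnames(a) <- list(x$rownames, x$colnames)
  }
  a
}

plain <- get_assay(toy, withDimnames = FALSE)  # stored matrix, no copy forced
named <- get_assay(toy)                        # full copy carrying dimnames
```

On an R build with memory profiling enabled, calling tracemem() on the stored matrix beforehand should report a duplication for the second call only. With a matrix of the size used in the thread (1e7 entries), that extra copy is what shows up in gc().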