Interesting detective work. This is nasty.

Best,
Kasper

On Thu, May 16, 2019 at 2:19 AM Pages, Herve <[email protected]> wrote:

> Let's try to get to the bottom of this. But let's leave
> SummarizedExperiment objects out of the picture for now and focus on what
> happens with a very simple reference object.
>
> When you create 2 instances of a reference class with the same content:
>
>     A <- setRefClass("A", fields=c(stuff="ANY"))
>     a0 <- A(stuff=letters)
>     a1 <- A(stuff=letters)
>
> the .xData slot (which is an environment) is "different" between the 2
> instances in the sense that the 2 environments live at different addresses
> in memory:
>
>     a0@.xData  # <environment: 0x3812150>
>     a1@.xData  # <environment: 0x381c7e0>
>     identical(a0@.xData, a1@.xData)  # FALSE
>
> However their **content** is the same:
>
>     all.equal(a0@.xData, a1@.xData)  # TRUE
>
> and the 2 objects are considered equal:
>
>     all.equal(a0, a1)  # TRUE
>
> When the **content** of the 2 objects differs, all.equal() sees 2
> environments with different contents:
>
>     b <- A(stuff=LETTERS)
>     isTRUE(all.equal(a0@.xData, b@.xData))  # FALSE
>
> and no longer considers the 2 objects equal:
>
>     all.equal(a0, b)  # "Component “stuff”: 26 string mismatches"
>
> So far so good.
>
> When an object goes thru a serialization/deserialization cycle:
>
>     saveRDS(a0, "a0.rds")
>     a2 <- readRDS("a0.rds")
>
> the .xData slot of the restored object also lives at a different address:
>
>     a2@.xData  # <environment: 0x3944668>
>     identical(a0@.xData, a2@.xData)  # FALSE
>
> (This is what serialization/deserialization does on environments so is
> expected.)
>
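
The address-versus-content distinction described above can be reproduced with
plain environments, without any reference class involved. What follows is a
minimal sketch under stated assumptions: the names e0, e1 and e2 are
illustrative and not from the thread, and an in-memory
serialize()/unserialize() round trip is used as a stand-in for
saveRDS()/readRDS().

```r
# Two environments created separately but holding the same content.
# (e0, e1, e2 are illustrative names, not taken from the thread.)
e0 <- new.env(parent = emptyenv())
e1 <- new.env(parent = emptyenv())
assign("stuff", letters, envir = e0)
assign("stuff", letters, envir = e1)

# identical() compares the environments themselves,
# all.equal() compares what they contain.
same_address <- identical(e0, e1)
same_content <- isTRUE(all.equal(e0, e1))

# In-memory round trip, assumed here to behave like saveRDS()/readRDS().
e2 <- unserialize(serialize(e0, NULL))
roundtrip_same_address <- identical(e0, e2)
roundtrip_same_content <- isTRUE(all.equal(e0, e2))

print(c(same_address = same_address,
        same_content = same_content,
        roundtrip_same_address = roundtrip_same_address,
        roundtrip_same_content = roundtrip_same_content))
```

Only the content-based comparison is expected to survive the round trip, which
is the behaviour the message relies on in the rest of its argument.
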
> So in that respect 'a2' is no different from 'a1'. However for 'a2' now we
> have:
>
>     all.equal(a0, a2)  # "Class definitions are not identical"
>
> So why is 'all.equal(a0, a2)' doing this? This cannot be explained only by
> the fact that 'a0@.xData' and 'a2@.xData' are non-identical environments.
>
> Looking at the source code for all.equal.envRefClass(), we see something
> like this (slightly simplified here):
>
>     ...
>     if (!identical(target$getClass(), current$getClass())) {
>         ...
>         return(sprintf("Class definitions are not identical%s", ...))
>     }
>     ...
>
> So let's try this:
>
>     identical(a0$getClass(), a1$getClass())  # TRUE
>     identical(a0$getClass(), a2$getClass())  # FALSE
>
> Note that 'x$getClass()' is not the same as 'class(x)'. The latter returns
> the **class name** while the former returns the **class definition** (which
> is represented by a complicated object of class refClassRepresentation).
>
> 'a0' and 'a2' have identical class names:
>
>     class(a0)
>     # [1] "A"
>     # attr(,"package")
>     # [1] ".GlobalEnv"
>
>     class(a2)
>     # [1] "A"
>     # attr(,"package")
>     # [1] ".GlobalEnv"
>
>     identical(class(a0), class(a2))
>     # [1] TRUE
>
> So now the question is: even though 'a0' and 'a2' have identical **class
> names**, how come they do NOT have identical **class definitions**?
>
> The big surprise (at least to me) is that reference objects, unlike
> traditional S4 objects, CARRY THEIR OWN COPY OF THE CLASS DEFINITION!
> This copy is stored in the '.refClassDef' variable stored in the .xData
> environment of the object:
>
>     ls(a0@.xData, all=TRUE)
>     # [1] ".refClassDef" ".self"        "getClass"     "stuff"
>
>     ls(a2@.xData, all=TRUE)
>     # [1] ".refClassDef" ".self"        "getClass"     "stuff"
>
> This private copy of the class definition is actually what 'x$getClass()'
> returns:
>
>     identical(a0$getClass(), get(".refClassDef", a0@.xData))  # TRUE
>     identical(a2$getClass(), get(".refClassDef", a2@.xData))  # TRUE
>
> Problem is that for 'a2' this copy of the class definition is not
> identical to the **original class** definition:
>
>     identical(getClass("A"), a0$getClass())  # TRUE
>     identical(getClass("A"), a2$getClass())  # FALSE
>
> And this in turn is because the complicated object that represents the
> class definition also contains environments (e.g.
> 'getClass("A")@refMethods' is an environment) so going thru a
> serialization/deserialization cycle is not a **strict no-op** on it (from
> an identical() perspective).
>
> Replacing the copy of the class definition stored in 'a2' with the
> original class definition makes the problem go away:
>
>     rm(".refClassDef", envir=a2@.xData)
>     assign(".refClassDef", getClass("A"), envir=a2@.xData)
>     all.equal(a0, a2)  # TRUE
>
> Bottom line: the test 'identical(target$getClass(), current$getClass())'
> performed by all.equal.envRefClass() seems too stringent. It should
> probably be replaced with something a little bit more tolerant i.e.
> something that considers environments that live at different addresses but
> have the same content to be equal. Looks like
> 'isTRUE(all.equal(target$getClass(), current$getClass()))' could do the job.
>
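
Until all.equal.envRefClass() is relaxed, one way to compare reference objects
across a serialization round trip is to check the class name with identical()
and the fields by content, leaving the per-object copy of the class definition
out of the comparison. The sketch below is only an illustration of that idea:
ref_equal_by_content() is a hypothetical helper, not part of R or of any
package mentioned in this thread, the caller has to name the fields to
compare, and serialize()/unserialize() stands in for saveRDS()/readRDS().

```r
library(methods)

# Same toy class as in the message above.
A <- setRefClass("A", fields = c(stuff = "ANY"))
a0 <- A(stuff = letters)
b  <- A(stuff = LETTERS)

# In-memory round trip, used here instead of saveRDS()/readRDS().
a2 <- unserialize(serialize(a0, NULL))

# Hypothetical helper (not an existing R function): compare the class
# names with identical() and the named fields by content, ignoring the
# '.refClassDef' copy stored in each object's .xData environment.
ref_equal_by_content <- function(x, y, fields) {
  identical(class(x), class(y)) &&
    isTRUE(all.equal(mget(fields, envir = x@.xData),
                     mget(fields, envir = y@.xData)))
}

restored_matches  <- ref_equal_by_content(a0, a2, "stuff")
different_matches <- ref_equal_by_content(a0, b, "stuff")
print(c(restored_matches = restored_matches,
        different_matches = different_matches))
```

This deliberately does less than all.equal(): it says nothing about methods or
about fields that are not listed, so it is a workaround for tests rather than
a replacement for the fix proposed above.
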
> Finally note that, in addition to the above test, all.equal.envRefClass()
> also does this test (slightly simplified here):
>
>     if (!isTRUE(all.equal(class(target), class(current))))
>         return(sprintf("Classes differ: %s", ...))
>
> Maybe that's all that it needs to do to compare the classes of the 2
> objects? (Ironically this test uses all.equal() when it could use
> identical().)
>
> Michael?
>
> H.
>
>
> On 5/11/19 15:09, Aaron Lun wrote:
> I would say it's much worse than mismatching class definitions.
>
> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_Bioconductor_SummarizedExperiment_issues_16&d=DwIDaQ&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=lrsb_VI7kmukb5oGfUe0HGsWu0pqT16WOnTOI4Y0JQc&s=TFNYF_XZCKo4J36DWs2BY1-6PVS18gW3iFTMRNQNDT4&e=
>
> -A
>
> On 5/11/19 5:07 AM, Martin Morgan wrote:
> I think it has to do with the use of reference classes in the assay slot,
> which have different environments
>
>     se = SummarizedExperiment()
>     saveRDS(se, fl <- tempfile())
>     se1 = readRDS(fl)
>
> and then
>
>     all.equal(se@assays, se1@assays)
>     [1] "Class definitions are not identical"
>     all.equal(se@assays@.xData, se1@assays@.xData)
>     [1] "Component \".self\": Class definitions are not identical"
>     se@assays@.xData
>     <environment: 0x7fb1de1ede90>
>     se1@assays@.xData
>     <environment: 0x7fb1fc2bca78>
>
> Martin
>
> On 5/11/19, 6:38 AM, "Bioc-devel on behalf of Laurent Gatto"
> <[email protected] on behalf of [email protected]> wrote:
>
> I would appreciate some
> background about the following:
>
>     > suppressPackageStartupMessages(library("SummarizedExperiment"))
>     > set.seed(1L)
>     > m <- matrix(rnorm(16), ncol = 4, dimnames = list(letters[1:4], LETTERS[1:4]))
>     > rowdata <- DataFrame(X = 1:4, row.names = letters[1:4])
>     > se1 <- SummarizedExperiment(m, rowData = rowdata)
>     > se2 <- SummarizedExperiment(m, rowData = rowdata)
>     > all.equal(se1, se2)
>     [1] TRUE
>
> But after serialising and reading se2, the two instances aren't
> equal any more:
>
>     > saveRDS(se2, file = "se2.rds")
>     > rm(se2)
>     > se2 <- readRDS("se2.rds")
>     > all.equal(se1, se2)
>     [1] "Attributes: < Component “assays”: Class definitions are not identical >"
>
> Session information provided below.
>
> Thank you in advance,
>
> Laurent
>
> R version 3.6.0 RC (2019-04-21 r76417)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Ubuntu 18.04.2 LTS
>
> Matrix products: default
> BLAS:   /usr/lib/x86_64-linux-gnu/libf77blas.so.3.10.3
> LAPACK: /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3.10.3
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=fr_FR.UTF-8        LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=fr_FR.UTF-8    LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=fr_FR.UTF-8       LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel  stats4    stats     graphics  grDevices utils     datasets
> [8] methods   base
>
> other attached packages:
>  [1] SummarizedExperiment_1.14.0 DelayedArray_0.10.0
>  [3] BiocParallel_1.18.0         matrixStats_0.54.0
>  [5] Biobase_2.44.0              GenomicRanges_1.36.0
>  [7] GenomeInfoDb_1.20.0         IRanges_2.18.0
>  [9] S4Vectors_0.22.0            BiocGenerics_0.30.0
>
> loaded via a namespace (and not attached):
>  [1] lattice_0.20-38        bitops_1.0-6           grid_3.6.0
>  [4] zlibbioc_1.30.0        XVector_0.24.0         Matrix_1.2-17
>  [7] tools_3.6.0            RCurl_1.95-4.12        compiler_3.6.0
> [10] GenomeInfoDbData_1.2.1
>
> _______________________________________________
> [email protected] mailing list
> https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_bioc-2Ddevel&d=DwIDaQ&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=lrsb_VI7kmukb5oGfUe0HGsWu0pqT16WOnTOI4Y0JQc&s=5H5vUx8twlV__0HeBhCWd3Fv30MbKQshwjvr8p3zSbs&e=
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: [email protected]
> Phone:  (206) 667-5791
> Fax:    (206) 667-1319
