Let's try to get to the bottom of this. But let's leave SummarizedExperiment
objects out of the picture for now and focus on what happens with a very simple
reference object.
When you create 2 instances of a reference class with the same content:
A <- setRefClass("A", fields=c(stuff="ANY"))
a0 <- A(stuff=letters)
a1 <- A(stuff=letters)
the .xData slot (which is an environment) is "different" between the 2
instances in the sense that the 2 environments live at different addresses in
memory:
a0@.xData # <environment: 0x3812150>
a1@.xData # <environment: 0x381c7e0>
identical(a0@.xData, a1@.xData) # FALSE
However their **content** is the same:
all.equal(a0@.xData, a1@.xData) # TRUE
and the 2 objects are considered equal:
all.equal(a0, a1) # TRUE
When the **content** of the 2 objects differs, all.equal() sees 2 environments
with different contents:
b <- A(stuff=LETTERS)
isTRUE(all.equal(a0@.xData, b@.xData)) # FALSE
and no longer considers the 2 objects equal:
all.equal(a0, b) # "Component “stuff”: 26 string mismatches"
So far so good.
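To see the address-versus-content distinction in isolation, here is a minimal sketch that uses 2 plain environments instead of reference objects. The names 'e1' and 'e2' are made up for this illustration, and it assumes a version of R where all.equal() has a method for environments (as the output above suggests):

```r
# Two separately created environments holding the same value.
e1 <- new.env()
e2 <- new.env()
assign("stuff", letters, envir = e1)
assign("stuff", letters, envir = e2)

# identical() compares environments by address, so it tells them apart.
same_address <- identical(e1, e2)           # FALSE

# all.equal() compares what the environments contain.
same_content <- isTRUE(all.equal(e1, e2))   # TRUE
```

This is the same behavior as seen on the .xData slots of 'a0' and 'a1'.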
When an object goes thru a serialization/deserialization cycle:
saveRDS(a0, "a0.rds")
a2 <- readRDS("a0.rds")
the .xData slot of the restored object also lives at a different address:
a2@.xData # <environment: 0x3944668>
identical(a0@.xData, a2@.xData) # FALSE
(This is what serialization/deserialization does to environments, so it is
expected.)
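A minimal sketch of that effect on a plain environment, without any reference class involved. It uses an in-memory serialize()/unserialize() round trip as a stand-in for saveRDS()/readRDS(), and the names 'e' and 'e_restored' are made up for this illustration:

```r
# An ordinary environment with some content.
e <- new.env()
assign("stuff", letters, envir = e)

# In-memory round trip (assumed here to behave like saveRDS()/readRDS()
# as far as environments are concerned).
e_restored <- unserialize(serialize(e, connection = NULL))

# The restored environment is a new object at a new address...
same_address <- identical(e, e_restored)           # FALSE

# ...but it holds the same content as the original.
same_content <- isTRUE(all.equal(e, e_restored))   # TRUE
```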
So in that respect 'a2' is no different from 'a1'. However for 'a2' now we have:
all.equal(a0, a2) # "Class definitions are not identical"
So why is 'all.equal(a0, a2)' doing this? This cannot be explained only by the
fact that 'a0@.xData' and 'a2@.xData' are
non-identical environments.
Looking at the source code for all.equal.envRefClass(), we see something like
this (slightly simplified here):
...
if (!identical(target$getClass(), current$getClass())) {
...
return(sprintf("Class definitions are not identical%s", ...))
}
...
So let's try this:
identical(a0$getClass(), a1$getClass()) # TRUE
identical(a0$getClass(), a2$getClass()) # FALSE
Note that 'x$getClass()' is not the same as 'class(x)'. The latter returns the
**class name** while the former returns the **class definition** (which is
represented by a complicated object of class refClassRepresentation).
'a0' and 'a2' have identical class names:
class(a0)
# [1] "A"
# attr(,"package")
# [1] ".GlobalEnv"
class(a2)
# [1] "A"
# attr(,"package")
# [1] ".GlobalEnv"
identical(class(a0), class(a2))
# [1] TRUE
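To make the name-versus-definition distinction concrete, here is a small self-contained sketch that recreates the toy class and looks at what each call returns. The variable names 'cls_name' and 'cls_def' are made up for this illustration:

```r
library(methods)

# Same toy reference class as above.
A <- setRefClass("A", fields = c(stuff = "ANY"))
a0 <- A(stuff = letters)

# class() returns the class name: a character vector
# (with a "package" attribute attached to it).
cls_name <- class(a0)
name_is_character <- is.character(cls_name)

# $getClass() returns the class definition: an S4 object of class
# refClassRepresentation, not a character string.
cls_def <- a0$getClass()
def_is_representation <- is(cls_def, "refClassRepresentation")
```

So two objects can agree on 'cls_name' while holding 'cls_def' objects that identical() tells apart.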
So now the question is: even though 'a0' and 'a2' have identical **class
names**, how come they do NOT have identical **class definitions**?
The big surprise (at least to me) is that reference objects, unlike traditional
S4 objects, CARRY THEIR OWN COPY OF THE CLASS DEFINITION! This copy is stored
in the '.refClassDef' variable stored in the .xData environment of the object:
ls(a0@.xData, all=TRUE)
# [1] ".refClassDef" ".self" "getClass" "stuff"
ls(a2@.xData, all=TRUE)
# [1] ".refClassDef" ".self" "getClass" "stuff"
This private copy of the class definition is actually what 'x$getClass()'
returns:
identical(a0$getClass(), get(".refClassDef", a0@.xData)) # TRUE
identical(a2$getClass(), get(".refClassDef", a2@.xData)) # TRUE
Problem is that for 'a2' this copy of the class definition is not identical to
the **original** class definition:
identical(getClass("A"), a0$getClass()) # TRUE
identical(getClass("A"), a2$getClass()) # FALSE
And this in turn is because the complicated object that represents the class
definition also contains environments (e.g. 'getClass("A")@refMethods' is an
environment) so going thru a serialization/deserialization cycle is not a
**strict no-op** on it (from an identical() perspective).
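A hedged sketch of this point, applied directly to the class definition rather than to an instance. It again uses an in-memory serialize()/unserialize() round trip as a stand-in for saveRDS()/readRDS(), and the names 'def' and 'def_restored' are made up for this illustration:

```r
library(methods)

# Same toy reference class as above.
A <- setRefClass("A", fields = c(stuff = "ANY"))

# The class definition has a slot that is an environment.
def <- getClass("A")
has_env_slot <- is.environment(def@refMethods)

# Round-trip the class definition itself (assumption: this mimics what
# happens to the copy carried by an object that goes thru saveRDS()/readRDS()).
def_restored <- unserialize(serialize(def, connection = NULL))

# The environment in the restored definition is a different object,
# which is enough for identical() to see the 2 definitions as different.
same_methods_env <- identical(def@refMethods, def_restored@refMethods)
```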
Replacing the copy of the class definition stored in 'a2' with the original
class definition makes the problem go away:
rm(".refClassDef", envir=a2@.xData)
assign(".refClassDef", getClass("A"), envir=a2@.xData)
all.equal(a0, a2) # TRUE
Bottom line: the test 'identical(target$getClass(), current$getClass())'
performed by all.equal.envRefClass() seems too stringent. It should probably be
replaced with something a little bit more tolerant i.e. something that
considers environments that live at different addresses but have the same
content to be equal. Looks like 'isTRUE(all.equal(target$getClass(),
current$getClass()))' could do the job.
Finally note that, in addition to the above test, all.equal.envRefClass() also
does this test (slightly simplified here):
if (!isTRUE(all.equal(class(target), class(current))))
return(sprintf("Classes differ: %s", ...))
Maybe that's all it needs to do to compare the classes of the 2 objects?
(Ironically this test uses all.equal() when it could use identical().)
Michael?
H.
On 5/11/19 15:09, Aaron Lun wrote:
I would say it's much worse than mismatching class definitions.
https://github.com/Bioconductor/SummarizedExperiment/issues/16
-A
On 5/11/19 5:07 AM, Martin Morgan wrote:
I think it has to do with the use of reference classes in the assay slot, which
have different environments
se = SummarizedExperiment()
saveRDS(se, fl <- tempfile())
se1 = readRDS(fl)
and then
all.equal(se@assays, se1@assays)
[1] "Class definitions are not identical"
all.equal(se@assays@.xData, se1@assays@.xData)
[1] "Component \".self\": Class definitions are not identical"
se@assays@.xData
<environment: 0x7fb1de1ede90>
se1@assays@.xData
<environment: 0x7fb1fc2bca78>
Martin
On 5/11/19, 6:38 AM, "Bioc-devel on behalf of Laurent Gatto" wrote:
I would appreciate some background about the following:
> suppressPackageStartupMessages(library("SummarizedExperiment"))
> set.seed(1L)
> m <- matrix(rnorm(16), ncol = 4, dimnames = list(letters[1:4], LETTERS[1:4]))
> rowdata <- DataFrame(X = 1:4, row.names = letters[1:4])
> se1 <- SummarizedExperiment(m, rowData = rowdata)
> se2 <- SummarizedExperiment(m, rowData = rowdata)
> all.equal(se1, se2)
[1] TRUE
But after serialising and reading se2, the two instances aren't equal
any more:
> saveRDS(se2, file = "se2.rds")
> rm(se2)
> se2 <- readRDS("se2.rds")
> all.equal(se1, se2)
[1] "Attributes: < Component “assays”: Class definitions are not identical >"
Session information provided below.
Thank you in advance,
Laurent
R version 3.6.0 RC (2019-04-21 r76417)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.2 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/libf77blas.so.3.10.3
LAPACK: /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3.10.3
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=fr_FR.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=fr_FR.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=fr_FR.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] SummarizedExperiment_1.14.0 DelayedArray_0.10.0
[3] BiocParallel_1.18.0 matrixStats_0.54.0
[5] Biobase_2.44.0 GenomicRanges_1.36.0
[7] GenomeInfoDb_1.20.0 IRanges_2.18.0
[9] S4Vectors_0.22.0 BiocGenerics_0.30.0
loaded via a namespace (and not attached):
[1] lattice_0.20-38 bitops_1.0-6 grid_3.6.0
[4] zlibbioc_1.30.0 XVector_0.24.0 Matrix_1.2-17
[7] tools_3.6.0 RCurl_1.95-4.12 compiler_3.6.0
[10] GenomeInfoDbData_1.2.1
_______________________________________________
Bioc-devel mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
Phone: (206) 667-5791
Fax: (206) 667-1319