On 07/05/2017 05:12 PM, Robert Castelo wrote:
On 05/07/2017 20:39, Martin Morgan wrote:
On 07/05/2017 12:59 PM, Robert Castelo wrote:
dear developers,
in the framework of a package i maintain, VariantFiltering, i'm using
the 'FilterRules' class defined in the S4Vectors package and i'm
interested in serializing (e.g., saving to disk via 'saveRDS()')
'FilterRules' objects where some rules may be defined using functions.
my problem is that the resulting RDS files take much more space than
expected because apparently the environment of the functions is also
serialized.
a toy example reproducing the situation could be the following:
library(S4Vectors)
## define a function that creates a ~7Mb numerical vector
## and returns a FilterRules object on a function that has
## nothing to do with this vector, except for sharing its
## environment. this tries to reproduce the situation in which
## a 'FilterRules' object is defined within the package
## 'VariantFiltering' where the environment is full of stuff
## unrelated to the 'FilterRules' object being created.
f <- function() {
z <- rnorm(1000000)
g <- function(x) 2*x
I guess
g <- function(x) 2 * x > 10
or similar would satisfy the requirement of FilterRules that a rule
return a logical vector of the same length as its input
oops, yes of course.
fr <- FilterRules(list(g=g))
fr
}
## call the previous function to get the FilterRules object
fr <- f()
## while the 'FilterRules' object takes 3.3 Kb ...
print(object.size(fr), units="Kb")
3.3 Kb
## ... serializing it takes ~7Mb
print(object.size(serialize(fr, NULL)), units="Mb")
7.6 Mb
I added the test case
testthat::expect_equal(eval(fr, 1:10), rep(c(FALSE, TRUE), each=5))
but then
g <- function(x) x > 5
which is good for simplicity
i guess this is the expected behavior behind functions and
environments, but after reading about this subject (e.g.,
http://adv-r.had.co.nz/Environments.html) i still haven't been able
to figure out how to serialize the 'FilterRules' object without the
associated environment or with a minimal one without unnecessary
objects around.
i'm sure many of you will have an easy workaround for this. any help
will be highly appreciated.
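
The behaviour described above can be reproduced without S4Vectors. The
following is a minimal base-R sketch, not code from VariantFiltering;
make_rule() is a hypothetical stand-in for f(). It shows that the
closure keeps its enclosing environment (and therefore 'z') alive, and
that object.size() ignores that environment while serialize() writes it
out.

```r
## base R only; make_rule() is a hypothetical stand-in for f() above
make_rule <- function() {
    z <- rnorm(1000000)      # large object unrelated to the rule
    function(x) x > 5        # closure whose enclosure is this call frame
}
g <- make_rule()

## the enclosing environment of g still holds 'z'
ls(environment(g))

## object.size() does not include the function's environment ...
print(object.size(g), units = "Kb")
## ... but serialize() has to write it out, including 'z'
print(object.size(serialize(g, NULL)), units = "Mb")
```

Listing environment(g) this way is a quick check for what would travel
along with a function-based rule when it is saved.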
One possibility is to set the environment of g() to something that
resolves appropriate symbols, e.g.,
f <- function() {
z <- rnorm(1000000)
g <- function(x) 2 * x > 10
environment(g) <- baseenv()
FilterRules(list(g=g))
}
the serialized size is then 11 kb and the test continues to pass. The
environment needs to be baseenv to resolve `*` and `>`; emptyenv() is
too restrictive. A package name space might often be appropriate
(though maybe large).
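
To illustrate why emptyenv() is too restrictive, here is a small base-R
sketch (the names res_empty and res_base are only for illustration).
With emptyenv() as the enclosure even `>` cannot be looked up, so the
call fails; with baseenv() it works.

```r
g <- function(x) x > 5

## with emptyenv() the operator `>` cannot be resolved when g is called
environment(g) <- emptyenv()
res_empty <- tryCatch(g(1:10), error = function(e) conditionMessage(e))
res_empty   # a character string holding the error message

## with baseenv() base functions and operators are visible again
environment(g) <- baseenv()
res_base <- g(1:10)
res_base
```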
Maybe that's a hack, and Michael or others will chime in with
something better...
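
A possible variant, for rules that need a little state of their own
(this goes beyond what was tested above, so treat it as a sketch): give
the function a fresh environment whose parent is baseenv() and copy in
only the objects the rule uses. make_rule() and 'threshold' are
hypothetical names.

```r
## hypothetical helper: builds a rule with a minimal enclosing environment
make_rule <- function(threshold) {
    z <- rnorm(1000000)                 # unrelated large object, as in f()
    env <- new.env(parent = baseenv())  # small environment on top of base
    env$threshold <- threshold          # copy in only what the rule needs
    g <- function(x) x > threshold
    environment(g) <- env               # drop the call frame holding 'z'
    g
}

g <- make_rule(5)
g(1:10)
ls(environment(g))                      # only "threshold"
print(object.size(serialize(g, NULL)), units = "Kb")
```

The resulting function could then be passed to FilterRules(list(g=g))
as in the examples above.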
thanks!! indeed this reduces the size down to 1 kb:
f <- function() {
z <- rnorm(1000000)
g <- function(x) x > 5
environment(g) <- baseenv()
fr <- FilterRules(list(g=g))
fr
}
fr <- f()
testthat::expect_equal(eval(fr, 1:10), rep(c(FALSE, TRUE), each=5))
print(object.size(fr), units="Kb")
1Kb
print(object.size(serialize(fr, NULL)), units="Kb")
1Kb
how would one set the environment of the function to a package
namespace? wouldn't it make more sense to leave it with baseenv() and
call 'require(pkg)' within the function to load whatever the function
needs from package 'pkg'?
environment(g) = getNamespace("S4Vectors")
but yes, maybe via setting to baseenv() and fully resolving symbols
foo::bar() rather than require / etc.
Martin
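
A minimal sketch of the baseenv() plus fully qualified names idea,
using stats::median() purely as an illustrative stand-in for whatever a
real rule would need from another package:

```r
## hypothetical rule that needs a function from outside base;
## `::` itself lives in base, so it is found from baseenv()
g <- function(x) x > stats::median(x)
environment(g) <- baseenv()

g(1:10)
print(object.size(serialize(g, NULL)), units = "Kb")
```

Because the package is named explicitly in the function body, the rule
does not depend on the search path at the time it is evaluated, which
require() inside the function would alter as a side effect.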
robert.
Martin
thanks!!
robert.
_______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel