On 07/05/2017 12:59 PM, Robert Castelo wrote:
dear developers,
in the framework of a package i maintain, VariantFiltering, i'm using
the 'FilterRules' class defined in the S4Vectors package and i'm
interested in serializing (e.g., saving to disk via 'saveRDS()')
'FilterRules' objects where some rules may be defined using functions.
my problem is that the resulting RDS files take much more space than
expected because apparently the environment of the functions is also
serialized.
a toy example reproducing the situation could be the following:
library(S4Vectors)
## define a function that creates a ~7Mb numerical vector
## and returns a FilterRules object on a function that has
## nothing to do with this vector, except for sharing its
## environment. this tries to reproduce the situation in which
## a 'FilterRules' object is defined within the package
## 'VariantFiltering' where the environment is full of stuff
## unrelated to the 'FilterRules' object being created.
f <- function() {
z <- rnorm(1000000)
g <- function(x) 2*x
I guess
g <- function(x) 2 * x > 10
or similar would satisfy the requirement that a FilterRules function
return a logical vector of the same length as its input
fr <- FilterRules(list(g=g))
fr
}
## call the previous function to get the FilterRules object
fr <- f()
## while the 'FilterRules' object takes 3.3 Kb ...
print(object.size(fr), units="Kb")
3.3 Kb
## ... serializing it takes ~7Mb
print(object.size(serialize(fr, NULL)), units="Mb")
7.6 Mb
I added the test case
testthat::expect_equal(eval(fr, 1:10), rep(c(FALSE, TRUE), each=5))
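The effect can be reproduced without S4Vectors, which suggests it comes from how R serializes closures rather than from 'FilterRules' itself. The following is a minimal base-R sketch; the names make_closure, g_small, size_full and size_small are illustrative and not part of the original example.

```r
## base R only; a closure carries a reference to the environment it was
## created in, and serialize() writes that environment out with it
make_closure <- function() {
  z <- rnorm(1000000)            # ~7.6 Mb of doubles, unrelated to g
  g <- function(x) 2 * x > 10
  g
}
g <- make_closure()

## the closure's enclosing environment still holds 'z'
ls(environment(g))

## serialized size in bytes with the original environment ...
size_full <- length(serialize(g, NULL))

## ... and after pointing a copy of the closure at baseenv()
g_small <- g
environment(g_small) <- baseenv()
size_small <- length(serialize(g_small, NULL))

c(full = size_full, small = size_small)
```

Inspecting ls(environment(g)) is a quick way to check what will travel along with a function before calling saveRDS().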
i guess this is the expected behavior behind functions and environments,
but after reading about this subject (e.g.,
http://adv-r.had.co.nz/Environments.html) i still haven't been able to
figure out how to serialize the 'FilterRules' object without the
associated environment or with a minimal one without unnecessary objects
around.
i'm sure many of you will have an easy workaround for this. any help
will be highly appreciated.
One possibility is to set the environment of g() to something that
resolves appropriate symbols, e.g.,
f <- function() {
z <- rnorm(1000000)
g <- function(x) 2 * x > 10
environment(g) <- baseenv()
FilterRules(list(g=g))
}
The serialized size is then 11 Kb and the test continues to pass. The
environment needs to be baseenv() to resolve `*` and `>`; emptyenv() is
too restrictive. A package namespace might often be appropriate (though
maybe large).
Maybe that's a hack, and Michael or others will chime in with something
better...
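If the rule needs a value of its own, baseenv() alone will not resolve it. One possible variation, sketched here in base R under that assumption, is a fresh environment whose parent is baseenv() and which holds only the objects the function uses. The helper make_rule and the variable threshold are hypothetical names for illustration.

```r
## hypothetical helper: build a rule whose environment holds only what it needs
make_rule <- function(threshold) {
  z <- rnorm(1000000)                    # large object the rule never uses
  rule <- function(x) 2 * x > threshold
  env <- new.env(parent = baseenv())     # baseenv() resolves `*` and `>`
  assign("threshold", threshold, envir = env)
  environment(rule) <- env               # drop the reference to 'z'
  rule
}

rule <- make_rule(10)
ls(environment(rule))            # only "threshold"
rule(1:10)                       # logical vector, same length as the input
length(serialize(rule, NULL))    # bytes; far below the ~8e6 that 'z' would add
```

The resulting function could then be passed to FilterRules(list(g=rule)) in the same way as g above.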
Martin
thanks!!
robert.
_______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel