Ryan (et al), FYI:
> f <- function() {
    x = rnorm(x)
    x
  }
> findGlobals(f)
[1] "="     "{"     "rnorm"

"x" should be in the list of globals but it isn't.

~G

> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] codetools_0.2-8


On Sun, Nov 3, 2013 at 5:37 PM, Ryan <r...@thompsonclan.org> wrote:

> Looking at the codetools package, I think "findGlobals" is basically
> exactly what we want here, right? As you say, there are necessarily
> limitations due to R being a dynamic language, but the goal is to catch
> common errors, not stop people from tricking the check.
>
> I think I'll try to code something up soon.
>
> -Ryan
>
>
> On 11/3/13, 5:10 PM, Gabriel Becker wrote:
>
> Henrik,
>
> See https://github.com/duncantl/CodeDepends (as used by
> https://github.com/gmbecker/RCacheSuite). It will identify necessarily
> defined symbols (input variables) for code that is not doing certain tricks
> (e.g. get(), mixing data.frame columns and global variables in formulas, etc.).
>
> Tierney's codetools package also does things along these lines but there
> are some situations where it has trouble. I can give more detail if desired.
>
> ~G
>
>
> On Sun, Nov 3, 2013 at 3:04 PM, Ryan <r...@thompsonclan.org> wrote:
>
>> Another potential easy step we can do is that if FUN is a function in the
>> user's workspace, we automatically export that function under the same name
>> in the children. This would make recursive functions just work, but it
>> might be a bit too magical.
>>
>>
>> On 11/3/13, 2:38 PM, Ryan wrote:
>>
>>> Here's an easy thing we can add to BiocParallel in the short term.
>>> The following code defines a wrapper function "withBPExtraErrorText" that
>>> simply appends an additional message to the end of any error that looks
>>> like it is about a missing variable. We could wrap every evaluation in a
>>> similar tryCatch to at least provide a more informative error message when
>>> a subprocess has a missing variable.
>>>
>>> -Ryan
>>>
>>> withBPExtraErrorText <- function(expr) {
>>>     tryCatch({
>>>         expr
>>>     }, simpleError = function(err) {
>>>         if (grepl("^object '(.*)' not found$", err$message, perl=TRUE)) {
>>>             ## It is an error due to a variable not found.
>>>             err$message <- paste0(err$message, ". Maybe you forgot to export this variable from the main R session using \"bpexport\"?")
>>>         }
>>>         stop(err)
>>>     })
>>> }
>>>
>>> x <- 5
>>>
>>> ## Succeeds
>>> withBPExtraErrorText(x)
>>>
>>> ## Fails with more informative error message
>>> withBPExtraErrorText(y)
>>>
>>>
>>>
>>> On Sun Nov 3 14:01:48 2013, Henrik Bengtsson wrote:
>>>
>>>> On Sun, Nov 3, 2013 at 1:29 PM, Michael Lawrence
>>>> <lawrence.mich...@gene.com> wrote:
>>>>
>>>>> An analog to clusterExport is a good idea. To make it even easier, we
>>>>> could have a dynamic environment based on object tables that would
>>>>> catch missing symbols and download them from the parent thread. But
>>>>> maybe there's some benefit to being explicit?
>>>>>
>>>>
>>>> A first step to fully automate this would be to provide some (opt
>>>> in/out) mechanism for code inspection and warn about non-defined
>>>> objects (cf. 'R CMD check'). That is of course major work, but will
>>>> certainly spare the community/users 1000's of hours in troubleshooting
>>>> and the mailing lists from "why doesn't my parallel code work"
>>>> messages. Such protection may be better suited for the 'parallel'
>>>> package though. Unfortunately, it's beyond my skills/time to pull
>>>> such a thing together.
>>>>
>>>> /Henrik
>>>>
>>>>
>>>>> Michael
>>>>>
>>>>>
>>>>> On Sun, Nov 3, 2013 at 12:39 PM, Henrik Bengtsson <h...@biostat.ucsf.edu>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> in BiocParallel, is there a suggested (or planned) best practice for
>>>>>> making *locally* assigned variables (e.g. functions) available to the
>>>>>> applied function when it runs in a separate R process (which will be
>>>>>> the most common use case)? I understand that local variables
>>>>>> should be avoided and it's preferred to put as much as possible in
>>>>>> packages, but that's not always possible or very convenient.
>>>>>>
>>>>>> EXAMPLE:
>>>>>>
>>>>>> library('BiocParallel')
>>>>>> library('BatchJobs')
>>>>>>
>>>>>> # Here I pick a recursive function to make the problem a bit harder,
>>>>>> # i.e. the function needs to call itself ("itself" = see below)
>>>>>> fib <- function(n=0) {
>>>>>>   if (n < 0) stop("Invalid 'n': ", n)
>>>>>>   if (n == 0 || n == 1) return(1)
>>>>>>   fib(n-2) + fib(n-1)
>>>>>> }
>>>>>>
>>>>>> # Executing in the current R session
>>>>>> cluster.functions <- makeClusterFunctionsInteractive()
>>>>>> bpParams <- BatchJobsParam(cluster.functions=cluster.functions)
>>>>>> register(bpParams)
>>>>>> values <- bplapply(0:9, FUN=fib)
>>>>>> ## SubmitJobs |++++++++++++++++++++++++++++++++++| 100% (00:00:00)
>>>>>> ## Waiting [S:0 R:0 D:10 E:0] |+++++++++++++++++++| 100% (00:00:00)
>>>>>>
>>>>>>
>>>>>> # Executing in a separate R process, where fib() is not defined
>>>>>> # (not specific to BiocParallel)
>>>>>> cluster.functions <- makeClusterFunctionsLocal()
>>>>>> bpParams <- BatchJobsParam(cluster.functions=cluster.functions)
>>>>>> register(bpParams)
>>>>>> values <- bplapply(0:9, FUN=fib)
>>>>>> ## SubmitJobs |++++++++++++++++++++++++++++++++++| 100% (00:00:00)
>>>>>> ## Waiting [S:0 R:0 D:10 E:0] |+++++++++++++++++++| 100% (00:00:00)
>>>>>> Error in LastError$store(results = results, is.error = !ok, throw.error = TRUE) :
>>>>>>   Errors occurred during execution. First error message:
>>>>>> Error in FUN(...): could not find function "fib"
>>>>>> [...]
>>>>>>
>>>>>>
>>>>>> # The following illustrates that the solution is not always
>>>>>> # straightforward.
>>>>>> # (not specific to BiocParallel; must have been discussed previously)
>>>>>> values <- bplapply(0:9, FUN=function(n, fib) {
>>>>>>   fib(n)
>>>>>> }, fib=fib)
>>>>>> Error in LastError$store(results = results, is.error = !ok, throw.error = TRUE) :
>>>>>>   Errors occurred during execution. First error message:
>>>>>> Error in fib(n): could not find function "fib"
>>>>>> [...]
>>>>>>
>>>>>> # Workaround; make fib() aware of itself
>>>>>> # (this is something the user needs to do, and would be very
>>>>>> # hard for BiocParallel et al. to automate. BTW, should all
>>>>>> # recursive functions be implemented this way?).
>>>>>> fib <- function(n=0) {
>>>>>>   if (n < 0) stop("Invalid 'n': ", n)
>>>>>>   if (n == 0 || n == 1) return(1)
>>>>>>   fib <- sys.function() # Make function aware of itself
>>>>>>   fib(n-2) + fib(n-1)
>>>>>> }
>>>>>> values <- bplapply(0:9, FUN=function(n, fib) {
>>>>>>   fib(n)
>>>>>> }, fib=fib)
>>>>>>
>>>>>>
>>>>>> WISHLIST:
>>>>>> Considering the above recursive issue solved, a slightly more explicit
>>>>>> and standardized solution is then:
>>>>>>
>>>>>> values <- bplapply(0:9, FUN=function(n, BPGLOBALS=NULL) {
>>>>>>   for (name in names(BPGLOBALS)) assign(name, BPGLOBALS[[name]])
>>>>>>   fib(n)
>>>>>> }, BPGLOBALS=list(fib=fib))
>>>>>>
>>>>>> Could the above be generalized into something as neat as:
>>>>>>
>>>>>> bpExport("fib")
>>>>>> values <- bplapply(0:9, FUN=function(n) {
>>>>>>   BiocParallel::bpImport("fib")
>>>>>>   fib(n)
>>>>>> })
>>>>>>
>>>>>> or ideally just (analogously to parallel::clusterExport()):
>>>>>>
>>>>>> bpExport("fib")
>>>>>> values <- bplapply(0:9, FUN=fib)
>>>>>>
>>>>>> /Henrik
>>>>>>
>>>>>> _______________________________________________
>>>>>> Bioc-devel@r-project.org mailing list
>>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>>>>
>
> --
> Gabriel Becker
> Graduate Student
> Statistics Department
> University of California, Davis
>

--
Gabriel Becker
Graduate Student
Statistics Department
University of California, Davis
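
For anyone who wants to reproduce the findGlobals() report at the top of this message, here is a minimal sketch. It assumes the codetools package is installed. The function f is the one from the report; g is a hypothetical contrast case that does not appear in the thread. In f, x is assigned inside the body, so findGlobals() appears to classify it as a local even though rnorm(x) reads it before that assignment has happened. In g, x is only read, so it is reported as a global.

```r
library(codetools)

# f is the function from the report: rnorm(x) reads x, then x is assigned.
f <- function() {
  x = rnorm(x)
  x
}

# g is a hypothetical contrast case (not from the thread): x is only read.
g <- function() {
  rnorm(x)
}

# findGlobals() only inspects the code; neither function is actually called.
globals_f <- findGlobals(f)
globals_g <- findGlobals(g)

# Is "x" reported as a global in each case?
print("x" %in% globals_f)
print("x" %in% globals_g)
```

A check built on findGlobals() alone would therefore miss a variable that is both read from the enclosing scope and reassigned under the same name, which is the kind of case the report describes.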