Ryan (et al.),

FYI:

> f
function() {
x = rnorm(x)
x
}
> findGlobals(f)
[1] "="     "{"     "rnorm"

"x" is read (as the argument to rnorm) before it is assigned locally, so it
should be in the list of globals, but it isn't.

~G
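
For comparison, a minimal sketch of the same check, assuming codetools is
installed. The function g is hypothetical and not part of the session shown
here; it differs from f only in that it never assigns x. The merge = FALSE
argument asks findGlobals to report functions and variables separately. It
looks as though codetools classifies any symbol assigned anywhere in the
function body as local, regardless of whether it is read first.

```r
library(codetools)

# f reads x (as the argument to rnorm) before assigning it locally.
f <- function() {
    x = rnorm(x)
    x
}

# g (hypothetical, for comparison) reads x but never assigns it.
g <- function() {
    y = rnorm(x)
    y
}

# merge = FALSE returns a list with $functions and $variables.
f_globals <- findGlobals(f, merge = FALSE)
g_globals <- findGlobals(g, merge = FALSE)

print(f_globals$variables)  # x is absent: it is treated as a local of f
print(g_globals$variables)  # x is reported as a global variable of g
```

So a check built on findGlobals alone would flag a missing x for g but not
for f, even though both need x to exist before the call.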

> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] codetools_0.2-8



On Sun, Nov 3, 2013 at 5:37 PM, Ryan <r...@thompsonclan.org> wrote:

>  Looking at the codetools package, I think "findGlobals" is basically
> exactly what we want here, right? As you say, there are necessarily
> limitations due to R being a dynamic language, but the goal is to catch
> common errors, not stop people from tricking the check.
>
> I think I'll try to code something up soon.
>
> -Ryan
>
>
> On 11/3/13, 5:10 PM, Gabriel Becker wrote:
>
>  Henrik,
>
> See https://github.com/duncantl/CodeDepends (as used by
> https://github.com/gmbecker/RCacheSuite). It will identify necessarily
> defined symbols (input variables) for code that is not doing certain tricks
> (e.g. get(), mixing data.frame columns and global variables in formulas, etc.).
>
>  Tierney's codetools package also does things along these lines but there
> are some situations where it has trouble. I can give more detail if desired.
>
>  ~G
>
>
> On Sun, Nov 3, 2013 at 3:04 PM, Ryan <r...@thompsonclan.org> wrote:
>
>> Another potential easy step: if FUN is a function in the
>> user's workspace, we automatically export that function under the same name
>> in the children. This would make recursive functions just work, but it
>> might be a bit too magical.
>>
>>
>> On 11/3/13, 2:38 PM, Ryan wrote:
>>
>>> Here's an easy thing we can add to BiocParallel in the short term. The
>>> following code defines a wrapper function "withBPExtraErrorText" that
>>> simply appends an additional message to the end of any error that looks
>>> like it is about a missing variable. We could wrap every evaluation in a
>>> similar tryCatch to at least provide a more informative error message when
>>> a subprocess has a missing variable.
>>>
>>> -Ryan
>>>
>>> withBPExtraErrorText <- function(expr) {
>>>    tryCatch({
>>>        expr
>>>    }, simpleError = function(err) {
>>>        if (grepl("^object '(.*)' not found$", err$message, perl=TRUE)) {
>>>            ## It is an error due to a variable not found.
>>>            err$message <- paste0(err$message, ". Maybe you forgot to ",
>>>                "export this variable from the main R session using \"bpexport\"?")
>>>        }
>>>        stop(err)
>>>    })
>>> }
>>>
>>> x <- 5
>>>
>>> ## Succeeds
>>> withBPExtraErrorText(x)
>>>
>>> ## Fails with more informative error message
>>> withBPExtraErrorText(y)
>>>
>>>
>>>
>>> On Sun Nov  3 14:01:48 2013, Henrik Bengtsson wrote:
>>>
>>>> On Sun, Nov 3, 2013 at 1:29 PM, Michael Lawrence
>>>> <lawrence.mich...@gene.com> wrote:
>>>>
>>>>> An analog to clusterExport is a good idea. To make it even easier, we could
>>>>> have a dynamic environment based on object tables that would catch missing
>>>>> symbols and download them from the parent thread. But maybe there's some
>>>>> benefit to being explicit?
>>>>>
>>>>
>>>> A first step to fully automate this would be to provide some (opt
>>>> in/out) mechanism for code inspection and warn about non-defined
>>>> objects (cf. 'R CMD check').  That is of course major work, but will
>>>> certainly spare the community/users 1000's of hours in troubleshooting
>>>> and the mailing lists from "why doesn't my parallel code work"
>>>> messages.  Such protection may be better suited for the 'parallel'
>>>> package though.  Unfortunately, it's beyond my skills/time to pull
>>>> such a thing together.
>>>>
>>>> /Henrik
>>>>
>>>>
>>>>> Michael
>>>>>
>>>>>
>>>>> On Sun, Nov 3, 2013 at 12:39 PM, Henrik Bengtsson <h...@biostat.ucsf.edu> wrote:
>>>>>
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> in BiocParallel, is there a suggested (or planned) best practice for
>>>>>> making *locally* assigned variables (e.g. functions) available to the
>>>>>> applied function when it runs in a separate R process (which will be
>>>>>> the most common use case)?  I understand that local variables
>>>>>> should be avoided and that it's preferred to put as much as possible in
>>>>>> packages, but that's not always possible or very convenient.
>>>>>>
>>>>>> EXAMPLE:
>>>>>>
>>>>>> library('BiocParallel')
>>>>>> library('BatchJobs')
>>>>>>
>>>>>> # Here I pick a recursive function to make the problem a bit harder,
>>>>>> # i.e. the function needs to call itself ("itself" = see below)
>>>>>> fib <- function(n=0) {
>>>>>>    if (n < 0) stop("Invalid 'n': ", n)
>>>>>>    if (n == 0 || n == 1) return(1)
>>>>>>    fib(n-2) + fib(n-1)
>>>>>> }
>>>>>>
>>>>>> # Executing in the current R session
>>>>>> cluster.functions <- makeClusterFunctionsInteractive()
>>>>>> bpParams <- BatchJobsParam(cluster.functions=cluster.functions)
>>>>>> register(bpParams)
>>>>>> values <- bplapply(0:9, FUN=fib)
>>>>>> ## SubmitJobs |++++++++++++++++++++++++++++++++++| 100% (00:00:00)
>>>>>> ## Waiting [S:0 R:0 D:10 E:0] |+++++++++++++++++++| 100% (00:00:00)
>>>>>>
>>>>>>
>>>>>> # Executing in a separate R process, where fib() is not defined
>>>>>> # (not specific to BiocParallel)
>>>>>> cluster.functions <- makeClusterFunctionsLocal()
>>>>>> bpParams <- BatchJobsParam(cluster.functions=cluster.functions)
>>>>>> register(bpParams)
>>>>>> values <- bplapply(0:9, FUN=fib)
>>>>>> ## SubmitJobs |++++++++++++++++++++++++++++++++++| 100% (00:00:00)
>>>>>> ## Waiting [S:0 R:0 D:10 E:0] |+++++++++++++++++++| 100% (00:00:00)
>>>>>> Error in LastError$store(results = results, is.error = !ok,
>>>>>> throw.error = TRUE) :
>>>>>>    Errors occurred during execution. First error message:
>>>>>> Error in FUN(...): could not find function "fib"
>>>>>> [...]
>>>>>>
>>>>>>
>>>>>> # The following illustrates that the solution is not always
>>>>>> # straightforward.
>>>>>> # (not specific to BiocParallel; must have been discussed previously)
>>>>>> values <- bplapply(0:9, FUN=function(n, fib) {
>>>>>>    fib(n)
>>>>>> }, fib=fib)
>>>>>> Error in LastError$store(results = results, is.error = !ok,
>>>>>> throw.error = TRUE) :
>>>>>>    Errors occurred during execution. First error message:
>>>>>> Error in fib(n): could not find function "fib"
>>>>>> [...]
>>>>>>
>>>>>> # Workaround; make fib() aware of itself
>>>>>> # (this is something the user needs to do, and would be very
>>>>>> #  hard for BiocParallel et al. to automate.  BTW, should all
>>>>>> #  recursive functions be implemented this way?).
>>>>>> fib <- function(n=0) {
>>>>>>    if (n < 0) stop("Invalid 'n': ", n)
>>>>>>    if (n == 0 || n == 1) return(1)
>>>>>>    fib <- sys.function() # Make function aware of itself
>>>>>>    fib(n-2) + fib(n-1)
>>>>>> }
>>>>>> values <- bplapply(0:9, FUN=function(n, fib) {
>>>>>>    fib(n)
>>>>>> }, fib=fib)
>>>>>>
>>>>>>
>>>>>> WISHLIST:
>>>>>> Considering the above recursive issue solved, a slightly more explicit
>>>>>> and standardized solution is then:
>>>>>>
>>>>>> values <- bplapply(0:9, FUN=function(n, BPGLOBALS=NULL) {
>>>>>>    for (name in names(BPGLOBALS)) assign(name, BPGLOBALS[[name]])
>>>>>>    fib(n)
>>>>>> }, BPGLOBALS=list(fib=fib))
>>>>>>
>>>>>> Could the above be generalized into something as neat as:
>>>>>>
>>>>>> bpExport("fib")
>>>>>> values <- bplapply(0:9, FUN=function(n) {
>>>>>>    BiocParallel::bpImport("fib")
>>>>>>    fib(n)
>>>>>> })
>>>>>>
>>>>>> or ideally just (analogously to parallel::clusterExport()):
>>>>>>
>>>>>> bpExport("fib")
>>>>>> values <- bplapply(0:9, FUN=fib)
>>>>>>
>>>>>> /Henrik
>>>>>>
>>>>>> _______________________________________________
>>>>>> Bioc-devel@r-project.org mailing list
>>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
>
>
> --
> Gabriel Becker
> Graduate Student
> Statistics Department
> University of California, Davis
>
>
>


-- 
Gabriel Becker
Graduate Student
Statistics Department
University of California, Davis


