Ok, here is my attempt at a function to get the list of user-defined free variables that a function refers to:

https://gist.github.com/DarwinAwardWinner/7298557

It uses codetools, so it is subject to the limitations of that package, but for simple examples it successfully detects when a function refers to something in the global env.
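
For readers who cannot open the gist, here is a minimal sketch of the general idea. It is not the code from the gist: the helper name findUserGlobals and its envir argument are made up for illustration, and it assumes that "user-defined" means "bound directly in the given environment".

```r
library(codetools)

## Hypothetical helper (NOT the code in the gist): list the free variables
## of `fun` that are bound directly in `envir` (the global env by default).
findUserGlobals <- function(fun, envir = globalenv()) {
    stopifnot(is.function(fun))
    ## merge = FALSE returns names used as functions and names used as
    ## variables in two separate character vectors
    globals <- findGlobals(fun, merge = FALSE)
    candidates <- unique(c(globals$functions, globals$variables))
    ## inherits = FALSE skips base and package functions such as "+" or "{"
    Filter(function(name) exists(name, envir = envir, inherits = FALSE),
           candidates)
}

## Usage: `offset` is defined in a stand-in for the user's workspace
workspace <- new.env()
assign("offset", 10, envir = workspace)
addOffset <- function(v) v + offset
findUserGlobals(addOffset, envir = workspace)
```

Filtering by what exists in the workspace keeps the result short, but it inherits every blind spot of findGlobals, so it should be read as a check for common mistakes rather than a guarantee.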

On Sun Nov  3 21:14:29 2013, Gabriel Becker wrote:
Ryan (et al),

FYI:

> f
function() {
x = rnorm(x)
x
}
> findGlobals(f)
[1] "="     "{"     "rnorm"

"x" should be in the list of globals but it isn't.

~G

> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] codetools_0.2-8
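
The report above can be reproduced with the short script below. As far as I understand codetools, any name that is assigned anywhere in the function body is treated as local, so a use of that name before the assignment is not reported. The second function g is an addition for contrast and is not part of the original report.

```r
library(codetools)

## Function from the report: `x` is read (as the argument to rnorm) before
## it is assigned, so at run time that first read refers to a global `x`.
f <- function() {
    x = rnorm(x)
    x
}

## Contrast case (added for illustration): the result goes into a different
## name, so `x` is never assigned inside the function.
g <- function() {
    y = rnorm(x)
    y
}

"x" %in% findGlobals(f)  # the name assigned in f is classified as local
"x" %in% findGlobals(g)  # here x is reported as a global
```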



On Sun, Nov 3, 2013 at 5:37 PM, Ryan <r...@thompsonclan.org> wrote:

    Looking at the codetools package, I think "findGlobals" is
    basically exactly what we want here, right? As you say, there are
    necessarily limitations due to R being a dynamic language, but the
    goal is to catch common errors, not stop people from tricking the
    check.

    I think I'll try to code something up soon.

    -Ryan


    On 11/3/13, 5:10 PM, Gabriel Becker wrote:
    Henrik,

    See https://github.com/duncantl/CodeDepends (as used by
    https://github.com/gmbecker/RCacheSuite). It will identify
    necessarily defined symbols (input variables) for code that is
    not doing certain tricks (e.g. get(), mixing data.frame columns and
    global variables in formulas, etc.).

    Tierney's codetools package also does things along these lines
    but there are some situations where it has trouble. I can give
    more detail if desired.

    ~G


    On Sun, Nov 3, 2013 at 3:04 PM, Ryan <r...@thompsonclan.org> wrote:

        Another potential easy step: if FUN is a function in the
        user's workspace, we automatically export that function
        under the same name in the children. This would make
        recursive functions just work, but it might be a bit too
        magical.


        On 11/3/13, 2:38 PM, Ryan wrote:

            Here's an easy thing we can add to BiocParallel in the
            short term. The following code defines a wrapper function
            "withBPExtraErrorText" that simply appends an additional
            message to the end of any error that looks like it is
            about a missing variable. We could wrap every evaluation
            in a similar tryCatch to at least provide a more
            informative error message when a subprocess has a missing
            variable.

            -Ryan

            withBPExtraErrorText <- function(expr) {
               tryCatch({
                   expr
               }, simpleError = function(err) {
                   if (grepl("^object '(.*)' not found$", err$message, perl=TRUE)) {
                       ## It is an error due to a variable not found.
                       err$message <- paste0(err$message, ". Maybe you forgot to export this variable from the main R session using \"bpexport\"?")
                   }
                   stop(err)
               })
            }

            x <- 5

            ## Succeeds
            withBPExtraErrorText(x)

            ## Fails with more informative error message
            withBPExtraErrorText(y)



            On Sun Nov  3 14:01:48 2013, Henrik Bengtsson wrote:

                On Sun, Nov 3, 2013 at 1:29 PM, Michael Lawrence
                <lawrence.mich...@gene.com> wrote:

                    An analog to clusterExport is a good idea. To make it
                    even easier, we could have a dynamic environment based
                    on object tables that would catch missing symbols and
                    download them from the parent thread. But maybe there's
                    some benefit to being explicit?


                A first step to fully automate this would be to provide
                some (opt in/out) mechanism for code inspection and warn
                about non-defined objects (cf. 'R CMD check').  That is of
                course major work, but it will certainly spare the
                community/users 1000's of hours in troubleshooting and the
                mailing lists from "why doesn't my parallel code work"
                messages.  Such protection may be better suited for the
                'parallel' package though.  Unfortunately, it's beyond my
                skills/time to pull such a thing together.

                /Henrik


                    Michael


                    On Sun, Nov 3, 2013 at 12:39 PM, Henrik Bengtsson
                    <h...@biostat.ucsf.edu> wrote:


                        Hi,

                        in BiocParallel, are there any suggested (or planned)
                        standards for making *locally* assigned variables
                        (e.g. functions) available to the applied function
                        when it runs in a separate R process (which will be
                        the most common use case)?  I understand that local
                        variables should be avoided and it's preferred to
                        put as much as possible in packages, but that's not
                        always possible or very convenient.

                        EXAMPLE:

                        library('BiocParallel')
                        library('BatchJobs')

                        # Here I pick a recursive function to make the
                        # problem a bit harder, i.e. the function needs to
                        # call itself ("itself" = see below)
                        fib <- function(n=0) {
                           if (n < 0) stop("Invalid 'n': ", n)
                           if (n == 0 || n == 1) return(1)
                           fib(n-2) + fib(n-1)
                        }

                        # Executing in the current R session
                        cluster.functions <- makeClusterFunctionsInteractive()
                        bpParams <- BatchJobsParam(cluster.functions=cluster.functions)
                        register(bpParams)
                        values <- bplapply(0:9, FUN=fib)
                        ## SubmitJobs |++++++++++++++++++++++++++++++++++| 100% (00:00:00)
                        ## Waiting [S:0 R:0 D:10 E:0] |+++++++++++++++++++| 100% (00:00:00)


                        # Executing in a separate R process, where fib() is not defined
                        # (not specific to BiocParallel)
                        cluster.functions <- makeClusterFunctionsLocal()
                        bpParams <- BatchJobsParam(cluster.functions=cluster.functions)
                        register(bpParams)
                        values <- bplapply(0:9, FUN=fib)
                        ## SubmitJobs |++++++++++++++++++++++++++++++++++| 100% (00:00:00)
                        ## Waiting [S:0 R:0 D:10 E:0] |+++++++++++++++++++| 100% (00:00:00)
                        Error in LastError$store(results = results, is.error = !ok, throw.error = TRUE) :
                           Errors occurred during execution. First error message:
                        Error in FUN(...): could not find function "fib"
                        [...]


                        # The following illustrates that the solution is not always straightforward.
                        # (not specific to BiocParallel; must have been discussed previously)
                        values <- bplapply(0:9, FUN=function(n, fib) {
                           fib(n)
                        }, fib=fib)
                        Error in LastError$store(results = results, is.error = !ok, throw.error = TRUE) :
                           Errors occurred during execution. First error message:
                        Error in fib(n): could not find function "fib"
                        [...]

                        # Workaround; make fib() aware of itself
                        # (this is something the user needs to do, and would be very
                        #  hard for BiocParallel et al. to automate.  BTW, should all
                        #  recursive functions be implemented this way?).
                        fib <- function(n=0) {
                           if (n < 0) stop("Invalid 'n': ", n)
                           if (n == 0 || n == 1) return(1)
                           fib <- sys.function() # Make function aware of itself
                           fib(n-2) + fib(n-1)
                        }
                        values <- bplapply(0:9, FUN=function(n, fib) {
                           fib(n)
                        }, fib=fib)


                        WISHLIST:
                        Considering the above recursive issue solved,
                        a slightly more explicit
                        and standardized solution is then:

                        values <- bplapply(0:9, FUN=function(n, BPGLOBALS=NULL) {
                           for (name in names(BPGLOBALS)) assign(name, BPGLOBALS[[name]])
                           fib(n)
                        }, BPGLOBALS=list(fib=fib))

                        Could the above be generalized into something
                        as neat as:

                        bpExport("fib")
                        values <- bplapply(0:9, FUN=function(n) {
                           BiocParallel::bpImport("fib")
                           fib(n)
                        })

                        or ideally just (analogously to
                        parallel::clusterExport()):

                        bpExport("fib")
                        values <- bplapply(0:9, FUN=fib)

                        /Henrik
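
For comparison, the parallel::clusterExport() pattern referred to above looks roughly like this with the base 'parallel' package. It is only a sketch of the existing 'parallel' behaviour, not a proposal for how bpExport() would be implemented; the cluster size of 2 is an arbitrary choice.

```r
library(parallel)

fib <- function(n = 0) {
    if (n < 0) stop("Invalid 'n': ", n)
    if (n == 0 || n == 1) return(1)
    fib(n - 2) + fib(n - 1)
}

## Start two worker processes; fib() is not defined in them yet
cl <- makeCluster(2)

## Copy fib from the calling environment into each worker's global
## environment, so the recursive calls inside fib() can find it there
clusterExport(cl, "fib", envir = environment())

values <- parLapply(cl, 0:9, function(n) fib(n))
stopCluster(cl)

unlist(values)
```

Because the export places fib in each worker's global environment under the same name, the recursive lookup succeeds without the sys.function() workaround.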

                        _______________________________________________
                        Bioc-devel@r-project.org mailing list
                        https://stat.ethz.ch/mailman/listinfo/bioc-devel










    --
    Gabriel Becker
    Graduate Student
    Statistics Department
    University of California, Davis




--
Gabriel Becker
Graduate Student
Statistics Department
University of California, Davis

