I'll back-track on my advice a little, and say that the right way to enable the 
user to get reproducible results is to respect the setting the user makes 
outside your function. So for

your = function()
     unlist(bplapply(1:4, rnorm))

The user will

register(MulticoreParam(2, RNGseed=123))
your()

to always produces the identical result.

Following Aaron's strategy, the R-level approach to reproducibility might be 
along the lines of 

- tell the user to set parallel::RNGkind("L'Ecuyer-CMRG") and set.seed()
- In your function, generate seeds for each job

    n = 5; seeds <- vector("list", n)
    seeds[[1]] = .Random.seed  # FIXME fails if set.seed or random nos. have 
not been generated...
    for (i in tail(seq_len(n), -1)) seeds[[i]] = nextRNGStream(seeds[[i - 1]])

- send these, along with the job, to the workers, setting .Random.seed on each 
worker

    bpmapply(function(i, seed, ...) {
        oseed <- get(".Random.seed", envir = .GlobalEnv)
        on.exit(assign(".Random.seed", oseed, envir = .GlobalEnv))
        assign(".Random.seed", seed, envir = .GlobalEnv)
        ...
    }, seq_len(n), seeds, ...)

The use of L'Ecuyer-CMRG and `nextRNGStream()` means that the streams on each 
worker are independent. Using on.exit means that, even on the worker, the state 
of the random number generator is not changed by the evaluation. This means 
that even with SerialParam() the generator is well-behaved. I don’t know how 
BiocCheck responds to use of .Random.seed, which in general would be a bad 
thing to do but in this case with the use of on.exit() the usage seems ok.

Martin


On 12/31/18, 3:17 PM, "Lulu Chen" <luluc...@vt.edu> wrote:

    Hi Martin,
    
    
    Thanks for your help. But setting different number of workers will generate 
different results:
    
    
    > unlist(bplapply(1:4, rnorm, BPPARAM=SnowParam(1, RNGseed=123)))
     [1]  1.0654274 -1.2421454  1.0523311 -0.7744536  1.3081934 -1.5305223  
1.1525356  0.9287607 -0.4355877  1.5055436
    > unlist(bplapply(1:4, rnorm, BPPARAM=SnowParam(2, RNGseed=123)))
     [1] -0.9685927  0.7061091  1.4890213 -0.4094454  0.8909694 -0.8653704  
1.4642711  1.2674845 -0.2220491  2.4505322
    > unlist(bplapply(1:4, rnorm, BPPARAM=SnowParam(3, RNGseed=123)))
     [1] -0.96859273 -0.40944544  0.89096942 -0.86537045  1.46427111  
1.26748453 -0.48906078  0.43304237 -0.03195349
    [10]  0.14670372
    > unlist(bplapply(1:4, rnorm, BPPARAM=SnowParam(4, RNGseed=123)))
     [1] -0.96859273 -0.40944544  0.89096942 -0.48906078  0.43304237 
-0.03195349 -1.03886641  1.57451249  0.74708204
    [10]  0.67187201
    
    
    
    Best,
    Lulu
    
    
    
    On Mon, Dec 31, 2018 at 1:12 PM Martin Morgan <mtmorgan.b...@gmail.com> 
wrote:
    
    
    The major BiocParallel objects (SnowParam(), MulticoreParam()) and use of 
bplapply() allow fully repeatable randomizations, e.g.,
    
    > library(BiocParallel)
    > unlist(bplapply(1:4, rnorm, BPPARAM=MulticoreParam(RNGseed=123)))
     [1] -0.96859273 -0.40944544  0.89096942 -0.48906078  0.43304237 -0.03195349
     [7] -1.03886641  1.57451249  0.74708204  0.67187201
    > unlist(bplapply(1:4, rnorm, BPPARAM=MulticoreParam(RNGseed=123)))
     [1] -0.96859273 -0.40944544  0.89096942 -0.48906078  0.43304237 -0.03195349
     [7] -1.03886641  1.57451249  0.74708204  0.67187201
    > unlist(bplapply(1:4, rnorm, BPPARAM=SnowParam(RNGseed=123)))
    [1] -0.96859273 -0.40944544  0.89096942 -0.48906078  0.43304237 -0.03195349
     [7] -1.03886641  1.57451249  0.74708204  0.67187201
    
    The idea then would be to tell the user to register() such a param, or to 
write your function to accept an argument rngSeed along the lines of
    
    f = function(..., rngSeed = NULL) {
        if (!is.null(rngSeed)) {
            param = bpparam()  # user's preferred back-end
            oseed = bpRNGseed(param)
            on.exit(bpRNGseed(param) <- oseed)
            bpRNGseed(param) = rngSeed
        }
        bplapply(1:4, rnorm)
    }
    
    (actually, this exercise illustrates a problem with bpRNGseed<-() when the 
original seed is NULL; this will be fixed in the next day or so...)
    
    Is that sufficient for your use case?
    
    On 12/31/18, 11:24 AM, "Bioc-devel on behalf of Lulu Chen" 
<bioc-devel-boun...@r-project.org on behalf of
    luluc...@vt.edu> wrote:
    
        Dear all,
    
        I posted the question in the Bioconductor support site (
        
    https://support.bioconductor.org/p/116381/ 
<https://support.bioconductor.org/p/116381/>) and was suggested to direct
        future correspondence there.
    
        I plan to generate a vector of seeds (provided by users through 
argument of
        my R function) and use them by set.seed() in each parallel computation.
        However, set.seed() will cause warning in BiocCheck().
    
        Someone suggested to re-write code using c++, which is a good idea. But 
it
        will take me much more extra time to re-write some functions from other
        packages, e.g. eBayes() in limma.
    
        Hope to get more suggestions from you. Thanks a lot!
    
        Best,
        Lulu
    
            [[alternative HTML version deleted]]
    
        _______________________________________________
        Bioc-devel@r-project.org mailing list
        
    https://stat.ethz.ch/mailman/listinfo/bioc-devel 
<https://stat.ethz.ch/mailman/listinfo/bioc-devel>
    
    
    
    
    
_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Reply via email to