Hello,

My package `clusterExperiment` has not changed, but it is hitting errors on the 
devel branch. I’ve traced the failures to a small dataset used in the tests, 
which is randomly subsampled from a larger dataset: the subsampling is no 
longer choosing the same observations. In a previous version I had already 
corrected the tests for the change in random number generation in R 4.0.x. I am 
wondering whether the new failures are related to the changes in BiocParallel 
(https://community-bioc.slack.com/archives/CEQ04GKEC/p1631903391030800?thread_ts=1631881095.027600&cid=CEQ04GKEC).

I did not expect this to affect these results. My package doesn’t use 
BiocParallel or depend on it. It turns out, however, that the code in question 
calls BiocSingular to run a PCA, and BiocSingular in turn calls BiocParallel. 
What is strange to me is that even if I don’t use the results of runPCA, and 
simply call runPCA before running the code in question, the output of that 
code changes. This suggests to me that the sequence of random numbers is being 
affected globally by the change, and not just within the results of calls to 
BiocParallel. I didn’t realize this from the discussion linked above (I thought 
the change would only affect output that relied directly on calls to 
BiocParallel), and I was hoping someone could confirm that this is what is 
happening and/or give me an explicit way to check that this is the source of my 
errors. 
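
To make the question concrete, here is a minimal sketch of one possible check, 
using base R only. The helper name `rng_state_changed` is made up for 
illustration and is not part of any package, and `runif(1)` is only a stand-in 
for the real code that ends up calling BiocSingular::runPCA. The idea is to 
compare the global `.Random.seed` before and after evaluating an expression:

```r
# Hypothetical helper (not from any package): returns TRUE if evaluating
# `expr` leaves the global RNG state (.Random.seed) different from before.
rng_state_changed <- function(expr) {
  genv <- globalenv()
  # .Random.seed only exists once the RNG has been used or seeded
  if (!exists(".Random.seed", envir = genv, inherits = FALSE)) {
    runif(1)
  }
  before <- get(".Random.seed", envir = genv, inherits = FALSE)
  force(expr)  # evaluate the expression passed by the caller
  after <- get(".Random.seed", envir = genv, inherits = FALSE)
  !identical(before, after)
}

set.seed(23)
rng_state_changed(sum(1:10))  # FALSE: no random numbers are drawn
rng_state_changed(runif(1))   # TRUE: stand-in for a call that uses the RNG
```

If wrapping the actual code around runPCA in such a helper returned TRUE on 
devel and FALSE on release, that would be consistent with the global stream 
being advanced on devel only.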

Here’s the basic setup. I have a setup file that sets up a lot of objects for 
my tests (setup_create_objects.R). The relevant parts look something like this 
(I’ve simplified it from what’s actually in the file so it more clearly shows 
the progression):

data(simData)
suppressWarnings(RNGversion("3.5.0"))
set.seed(23)

… # bunch of code

clusterIds <- … # code that internally calls BiocSingular::runPCA

… # bunch of code

### sample 3 observations from each cluster:
whSamp <- unlist(tapply(1:ncol(simData), clusterIds,
                        function(x) { sample(x = x, size = 3) }))
smSimData <- simData[1:20, whSamp]

This results in different values of clusterIds, and thus a different whSamp, on 
the release and devel versions.

The unexpected part is that even if I add a line that manually overwrites 
clusterIds with the values of the vector `clusterIds` from the release version 
(copied by hand from a run on a different computer that is not on the devel 
version), I don’t get the same whSamp. (I still run the code that computes 
`clusterIds`, so BiocSingular::runPCA is still being called.) If, however, in 
addition to manually supplying the correct clusterIds on the devel version, I 
ALSO add a new call to `set.seed` on the line before computing whSamp, then the 
devel and release versions give the same result, as I would expect. This makes 
me think that the random seed has been affected globally. Further, the second 
entry of .Random.seed after running setup_create_objects.R on the devel version 
is not the same as on the release version. 
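
As a further diagnostic, the suspect code could be run while saving and 
restoring the global RNG state around it. The sketch below assumes base R 
only. `with_preserved_rng` is a hypothetical helper written for this 
illustration, and `runif(5)` again stands in for the code that calls 
BiocSingular::runPCA. If isolating the call this way made whSamp agree between 
release and devel, that would point to the call advancing the global stream:

```r
# Hypothetical helper (not from any package): evaluates `expr`, then puts
# the global .Random.seed back to what it was before the evaluation.
with_preserved_rng <- function(expr) {
  genv <- globalenv()
  if (!exists(".Random.seed", envir = genv, inherits = FALSE)) {
    runif(1)  # make sure an RNG state exists before saving it
  }
  saved <- get(".Random.seed", envir = genv, inherits = FALSE)
  on.exit(assign(".Random.seed", saved, envir = genv), add = TRUE)
  expr
}

set.seed(23)
reference <- sample(10)

set.seed(23)
invisible(with_preserved_rng(runif(5)))  # stand-in for the runPCA-related code
isolated <- sample(10)

identical(reference, isolated)  # TRUE: the later draw is unaffected
```

This is only meant as a way to test the hypothesis, not as a fix for the tests.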

Thanks,
Elizabeth Purdom




_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
