On Mon, May 23, 2011 at 3:27 PM, Bruce Southey <[email protected]> wrote:
> On 05/23/2011 02:02 PM, Robert Kern wrote:
>> On Mon, May 23, 2011 at 13:33, <[email protected]> wrote:
>>> I have a function in two versions, one vectorized, one with a loop.
>>>
>>> The vectorized function gets all randn variables in one big array:
>>>
>>>     rvs = distr.rvs(args, **{'size': (nobs, nrep)})
>>>
>>> The looping version has:
>>>
>>>     for irep in xrange(nrep):
>>>         rvs = distr.rvs(args, **{'size': nobs})
>>>
>>> The rest should be identical (except for vectorization).
>
> What happened to your 'irep' and 'nrep' variables in the vectorized and
> looping versions, respectively?
> The looping version is overwriting 'rvs', but the vectorized version is
> not, unless you are accumulating it elsewhere. (If so, then there is
> another difference.)
Yes, in the loop case I'm accumulating only what I need and throwing away
the random numbers immediately. In the vectorized version, I build and use
the entire (nrep, nobs) = (10000, 200) or (10000, 1000) array at once, which
is faster than the loop.

> Also, I try to avoid having variables with the same name as a function,
> just in case (old habits die hard).

rvs is a method; I never have a function called rvs, and it's one of my
favorite names for an array of random variables during testing.

>>> Is there a guarantee that the 2d arrays are filled up in a specific
>>> order so that the loop and vectorized version produce the same result,
>>> given the same seed?
>>
>> No general guarantee for all of the scipy distributions, no. I suspect
>> that all of the RandomState methods do work this way, though.
>
> You have to guarantee that the complete stream of random numbers was not
> interrupted, such as by additional calls or a reset between loops. That
> probably means generating and storing all of the numbers at once and then
> just accessing that array as needed.
>
> But then again, if you are doing bootstrapping, it really should not
> matter if you do 'sufficient' resamples.

I won't rely on it later on, and I don't think it makes a difference, but
getting identical results across different implementations is very useful
for testing.

Josef
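PS: Here is a minimal sketch of the kind of cross-check I mean. The normal
distribution, the (nrep, nobs) shape for the vectorized draw, and the small
sizes are purely illustrative, and whether the two versions agree is an
empirical observation for this distribution, not something guaranteed for
every scipy distribution.

import numpy as np
from scipy import stats

nobs, nrep = 200, 50      # small illustrative sizes
seed = 12345

# vectorized: draw the whole (nrep, nobs) array in one call
np.random.seed(seed)
rvs_vec = stats.norm.rvs(size=(nrep, nobs))

# looping: draw nobs values per replication, starting from the same seed
np.random.seed(seed)
rvs_loop = np.empty((nrep, nobs))
for irep in xrange(nrep):
    rvs_loop[irep] = stats.norm.rvs(size=nobs)

# True only if the 2d draw is filled row by row from the same underlying
# stream; an empirical check for norm, not a guarantee for other distributions
print((rvs_vec == rvs_loop).all())

The row-wise comparison only makes sense if the vectorized array is filled in
C order from one uninterrupted stream, which is exactly what the check probes.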
