On Thu, Oct 2, 2014 at 5:28 PM, Nathaniel Smith <n...@pobox.com> wrote:
> On 2 Oct 2014 16:52, "Robert Kern" <robert.k...@gmail.com> wrote:
>>
>> On Thu, Oct 2, 2014 at 4:42 PM, Brad Buran <bbu...@alum.mit.edu> wrote:
>> > Given the following:
>> >
>> > from numpy import random
>> > rs = random.RandomState(seed=1)
>> > # skip the first X billion samples
>> > x = rs.uniform(0, 10)
>> >
>> > How do I accomplish "skip the first X billion samples" (e.g. 7.2
>> > billion)? I see that there's a numpy.random.RandomState.set_state
>> > which accepts (among other parameters) a value called "pos". This
>> > sounds promising, but the other parameters I'm not sure how to compute
>> > (e.g. the 1D array of 624 unsigned integers, etc.). I need to be able
>> > to skip ahead in the sequence to reproduce some signals that were
>> > generated for experiments. I could certainly consume and discard the
>> > first X billion samples; however, that seems to be computationally
>> > inefficient.
>>
>> Unfortunately, it requires some significant number-theoretical
>> precomputation for any given N number of steps that you want to skip.
>>
>> http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/JUMP/index.html
>
> If someone really wanted this functionality then I suppose it would be
> possible to precompute the special jump coefficients for lengths 2, 4, 8,
> 16, 32, ..., and then perform arbitrary jumps using a sequence of smaller
> jumps. (The coefficient table could be shipped with the source code.)
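[For reference, the brute-force "consume and discard" approach mentioned above looks roughly like the sketch below; a small `n_skip` stands in for the X billion, which is exactly why it is impractical at that scale -- the cost is linear in the number of skipped samples.]

```python
import numpy as np

# Naive "skip ahead": draw and discard samples to advance the stream.
# This is O(n_skip), which is why it is impractical when n_skip is
# in the billions.
rs = np.random.RandomState(seed=1)
n_skip = 1000  # stand-in for the X billion samples to skip
rs.uniform(0, 10, size=n_skip)  # draw and throw away
x = rs.uniform(0, 10)  # the (n_skip + 1)-th sample
```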
No one needs small jumps of arbitrary size. The real use case for jumping is to make N parallel streams that won't overlap. You pick a number, let's call it `jump_steps`, much larger than any single run of your system could possibly consume (i.e. the number of core PRNG variates pulled is << `jump_steps`). Then you can initialize N parallel streams by initializing RandomState once with a seed and storing that RandomState, then jumping ahead by `jump_steps` and storing *that* RandomState, then by `2*jump_steps`, etc., to get N RandomState streams that will not overlap. Give those to your separate processes and let them run.

So the alternative may actually be to just generate and distribute *one* set of these jump coefficients, for a jump size that is really big but still leaves you enough space for a really large number of streams (fortunately, 2**19937-1 is a really big number).

--
Robert Kern

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
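[Archive note: NumPy did not have jump support in 2014, but later versions (1.17+) grew exactly this facility in the `Generator`/`BitGenerator` API. The non-overlapping-streams scheme described above can be sketched with `PCG64.jumped`, which returns a new bit generator advanced by a fixed, enormous number of steps per jump:]

```python
import numpy as np

# One seed, N streams spaced apart by jumps so large (~2**127 steps
# each for PCG64) that no realistic run can consume enough variates
# to make two streams overlap.
seed = 12345
n_streams = 4
base = np.random.PCG64(seed)

# .jumped(i) returns a *new* bit generator whose state has been
# advanced by i jumps; the base generator itself is not modified.
streams = [np.random.Generator(base.jumped(i)) for i in range(n_streams)]

# Hand one stream to each worker process; each is an independent,
# reproducible source of random numbers.
for i, g in enumerate(streams):
    print(i, g.uniform(0, 10))
```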