On Mon, Dec 20, 2010 at 10:28, Alan G Isaac <alan.is...@gmail.com> wrote:
> I want to sample *without* replacement from a vector
> (as with Python's random.sample).  I don't see a direct
> replacement for this, and I don't want to carry two
> PRNG's around.  Is the best way something like this?
>
> permutation(myvector)[:samplesize]

For one of my personal projects, I copied over the mtrand package and
added a method to RandomState for doing this kind of thing using
reservoir sampling.

http://en.wikipedia.org/wiki/Reservoir_sampling

    def subset_reservoir(self, long nselected, long ntotal, object size=None):
        """ Sample a given number of integers from the set [0, ntotal)
        without replacement using a reservoir algorithm.

        Parameters
        ----------
        nselected : int
            The number of integers to sample.
        ntotal : int
            The size of the set to sample from.
        size : int, sequence of ints, or None
            The number of subsets to sample or a shape tuple. An axis of
            length nselected will be appended to the shape.

        Returns
        -------
        out : ndarray
            The sampled subsets. The order of the items is not necessarily
            random. Use a slice from the result of permutation() if you
            need the order of the items to be randomized.
        """
        cdef long total_size, length, i, j, u
        cdef cnp.ndarray[cnp.int_t, ndim=2] out

        # Work on a 2-D (length, nselected) array; reshape at the end.
        if size is None:
            shape = (nselected,)
            total_size = nselected
            length = 1
        elif isinstance(size, int):
            shape = (size, nselected)
            total_size = size * nselected
            length = size
        else:
            # Accept any sequence of ints, not just a tuple.
            shape = tuple(size) + (nselected,)
            length = 1
            for i from 0 <= i < len(size):
                length *= size[i]
            total_size = length * nselected
        out = np.empty((length, nselected), dtype=int)
        for i from 0 <= i < length:
            # Fill the reservoir with the first nselected integers.
            for j from 0 <= j < nselected:
                out[i,j] = j
            for j from nselected <= j < ntotal:
                # rk_interval(max, state) includes max, so u is uniform
                # on [0, j] and item j is kept with probability
                # nselected / (j + 1).
                u = <long>rk_interval(j, self.internal_state)
                if u < nselected:
                    out[i,u] = j
        return out.reshape(shape)

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco
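
For readers who cannot rebuild mtrand, the same reservoir scheme can be sketched in plain Python. This is only an illustration under stated assumptions, not the method from the patched RandomState: the name reservoir_indices and its randbelow argument are hypothetical, and randbelow(n) is assumed to return a uniform integer in [0, n). The stdlib generator below is a stand-in so the sketch runs on its own.

```python
import random


def reservoir_indices(nselected, ntotal, randbelow):
    """Return nselected distinct integers drawn from range(ntotal).

    randbelow(n) must return a uniform random integer in [0, n).
    As in the Cython version, the order of the result is not random.
    (Hypothetical helper, for illustration only.)
    """
    if not 0 <= nselected <= ntotal:
        raise ValueError("need 0 <= nselected <= ntotal")
    # Start with the first nselected integers in the reservoir.
    out = list(range(nselected))
    for j in range(nselected, ntotal):
        # u is uniform on [0, j], so item j enters the reservoir
        # with probability nselected / (j + 1).
        u = randbelow(j + 1)
        if u < nselected:
            out[u] = j
    return out


# Illustration only: a stdlib generator stands in for the single PRNG.
rng = random.Random(2010)
idx = reservoir_indices(5, 100, rng.randrange)

# Use the indices to sample from a vector without replacement.
myvector = [10 * k for k in range(100)]
sample = [myvector[k] for k in idx]
```

To stay on a single NumPy generator, it should be enough to pass something like `lambda n: rs.randint(0, n)` for a RandomState instance `rs`, since randint excludes its upper bound. Note that the loop makes one draw per candidate, so the cost grows with ntotal rather than with nselected.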