I know this question came up on the mailing list some time ago (19/09/2008), and the conclusion was that yes, you can do it more or less efficiently in pure python; the trick is to use two different methods. If your sample is more than, say, a quarter the size of the set you're drawing from, you permute the set and take the first few. If your sample is smaller, you draw with replacement, then redraw the duplicates, and repeat until there aren't any more duplicates. Since you only do this when your sample is much smaller than the population you don't need to repeat many times.
Here's the code I posted to the previous discussion (not tested this time around) with comments: ''' def choose_without_replacement(m,n,repeats=None): """Choose n nonnegative integers less than m without replacement Returns an array of shape n, or (n,repeats). """ if repeats is None: r = 1 else: r = repeats if n>m: raise ValueError, "Cannot find %d nonnegative integers less than %d" % (n,m) if n>m/2: res = np.sort(np.random.rand(m,r).argsort(axis=0)[:n,:],axis=0) else: res = np.random.random_integers(m,size=(n,r)) while True: res = np.sort(res,axis=0) w = np.nonzero(np.diff(res,axis=0)==0) nr = len(w[0]) if nr==0: break res[w] = np.random.random_integers(m,size=nr) if repeats is None: return res[:,0] else: return res For really large values of repeats it does too much sorting; I didn't have the energy to make it pull all the ones with repeats to the beginning so that only they need to be re-sorted the next time through. Still, the expected number of trips through the while loop grows only logarithmically with repeats, so it shouldn't be too bad. ''' Anne On 20 December 2010 12:13, John Salvatier <jsalv...@u.washington.edu> wrote: > I think this is not possible to do efficiently with just numpy. If you want > to do this efficiently, I wrote a no-replacement sampler in Cython some time > ago (below). I hearby release it to the public domain. > > ''' > > Created on Oct 24, 2009 > http://stackoverflow.com/questions/311703/algorithm-for-sampling-without-replacement > @author: johnsalvatier > > ''' > > from __future__ import division > > import numpy > > def random_no_replace(sampleSize, populationSize, numSamples): > > > > samples = numpy.zeros((numSamples, sampleSize),dtype=int) > > > > # Use Knuth's variable names > > cdef int n = sampleSize > > cdef int N = populationSize > > cdef i = 0 > > cdef int t = 0 # total input records dealt with > > cdef int m = 0 # number of items selected so far > > cdef double u > > while i < numSamples: > > t = 0 > > m = 0 > > while m < n : > > > > u = numpy.random.uniform() # call a uniform(0,1) random number > generator > > if (N - t)*u >= n - m : > > > > t += 1 > > > > else: > > > > samples[i,m] = t > > t += 1 > > m += 1 > > > > i += 1 > > > > return samples > > > > On Mon, Dec 20, 2010 at 8:28 AM, Alan G Isaac <alan.is...@gmail.com> wrote: >> >> I want to sample *without* replacement from a vector >> (as with Python's random.sample). I don't see a direct >> replacement for this, and I don't want to carry two >> PRNG's around. Is the best way something like this? >> >> permutation(myvector)[:samplesize] >> >> Thanks, >> Alan Isaac >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion@scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion