Correction: the version in monte-carlo takes time O(n), but it only consumes k random numbers. The streaming algorithm consumes n random numbers.
On Jun 14, 2011, at 12:04 PM, Patrick Perry wrote: > There are time/space tradeoffs in sampling without replacement. The version > in monte-carlo takes space O(n) and time O(k), for all k. I chose this > algorithm instead of a streaming algorithm, with takes space O(k) and time > O(n). > > If k is much less than n, you can improve a bit. Here is a version taking > space O(k) and time O(k^2), *provided k << n*; when k is large relative to > n, the time increases to roughly O(n log n). > > import Control.Monad( liftM ) > import Control.Monad.MC > import Data.List( nub ) > > sampleSmallIntSubset :: MonadMC m => Int -> Int -> m [Int] > sampleSmallIntSubset n k = (take k . nub) `liftM` repeatMC (sampleInt n) > > sampleSmallSubset :: MonadMC m => [a] -> Int -> m [a] > sampleSmallSubset xs k = (map (xs !!)) `liftM` sampleSmallIntSubset (length > xs) k > > -- strict version of the above > sampleSmallSubset' :: MonadMC m => [a] -> Int -> m [a] > sampleSmallSubset' xs k = do > ys <- sampleSmallSubset xs k > (length ys) `seq` return ys -- by forcing the length, we make sure we are > done with the RNG when we return > > > If you're wondering about the strict version, the following example should be > instructive: > > Let's note the first three numbers generated by the RNG with seed 0: > > ghci> (replicateM 3 $ uniform 0 1) `evalMC` mt19937 0 > [0.999741748906672,0.16290987539105117,0.28261780529282987] > > Now, let's look at the first number we get after a call to sampleSmallSubset, > if we don't evaluate the subset: > > ghci> sampleSmallSubset [1..10000] 2 >> uniform 0 1) `evalMC` mt19937 0 > 0.999741748906672 > > Apparently, the call to sampleSmallSubset doesn't consume any random numbers. > Now, let's try the strict version: > > ghci> (sampleSmallSubset' [1..10000] 2 >> uniform 0 1) `evalMC` mt19937 0 > 0.28261780529282987 > > This time, the call to sampleSmallSubset' consumes two random numbers, even > if we throw away the result. > > Since you're just starting out, I recommend using the strict versions of the > functions. They are less space-efficient than the lazy versions, but they > are guaranteed to be safe. > > > > >> From: Jonas Almström Duregård <jonas.duregard at chalmers.se> > >> Subject: Re: [Haskell-cafe] Acquiring a random set of a specific size (w/o >> dups) from a range of Ints >> To: "michael rice" < >> nowgate at yahoo.com >>> >> Cc: "Felipe Almeida Lessa" < >> felipe.lessa at gmail.com>, haskell-cafe at haskell.org >> >> Date: Tuesday, June 14, 2011, 5:17 AM >> >>> >> Shuffle [1..20], then take 5? >> >>> >> Yes, so simple, I'm embarrassed I didn't think of it. >> >> >> That works well for small numbers, but I'm guessing it will evaluate the >> entire list so it should not be used for large inputs. If you have a large >> interval and use a relatively small part of it, the following function >> should be significantly faster (it builds a random permutation lazily): _______________________________________________ Haskell-Cafe mailing list [email protected] http://www.haskell.org/mailman/listinfo/haskell-cafe
