There are time/space tradeoffs in sampling without replacement. The version in
monte-carlo takes space O(n) and time O(k), for all k. I chose this algorithm
instead of a streaming algorithm, with takes space O(k) and time O(n).
If k is much less than n, you can improve a bit. Here is a version taking
space O(k) and time O(k^2), *provided k << n*; when k is large relative to n,
the time increases to roughly O(n log n).
import Control.Monad( liftM )
import Control.Monad.MC
import Data.List( nub )
sampleSmallIntSubset :: MonadMC m => Int -> Int -> m [Int]
sampleSmallIntSubset n k = (take k . nub) `liftM` repeatMC (sampleInt n)
sampleSmallSubset :: MonadMC m => [a] -> Int -> m [a]
sampleSmallSubset xs k = (map (xs !!)) `liftM` sampleSmallIntSubset (length xs)
k
-- strict version of the above
sampleSmallSubset' :: MonadMC m => [a] -> Int -> m [a]
sampleSmallSubset' xs k = do
ys <- sampleSmallSubset xs k
(length ys) `seq` return ys -- by forcing the length, we make sure we are
done with the RNG when we return
If you're wondering about the strict version, the following example should be
instructive:
Let's note the first three numbers generated by the RNG with seed 0:
ghci> (replicateM 3 $ uniform 0 1) `evalMC` mt19937 0
[0.999741748906672,0.16290987539105117,0.28261780529282987]
Now, let's look at the first number we get after a call to sampleSmallSubset,
if we don't evaluate the subset:
ghci> sampleSmallSubset [1..10000] 2 >> uniform 0 1) `evalMC` mt19937 0
0.999741748906672
Apparently, the call to sampleSmallSubset doesn't consume any random numbers.
Now, let's try the strict version:
ghci> (sampleSmallSubset' [1..10000] 2 >> uniform 0 1) `evalMC` mt19937 0
0.28261780529282987
This time, the call to sampleSmallSubset' consumes two random numbers, even if
we throw away the result.
Since you're just starting out, I recommend using the strict versions of the
functions. They are less space-efficient than the lazy versions, but they are
guaranteed to be safe.
> From: Jonas Almström Duregård <jonas.duregard at chalmers.se>
> Subject: Re: [Haskell-cafe] Acquiring a random set of a specific size (w/o
> dups) from a range of Ints
> To: "michael rice" <
> nowgate at yahoo.com
> >
> Cc: "Felipe Almeida Lessa" <
> felipe.lessa at gmail.com>, haskell-cafe at haskell.org
>
> Date: Tuesday, June 14, 2011, 5:17 AM
>
> >
> Shuffle [1..20], then take 5?
>
> >
> Yes, so simple, I'm embarrassed I didn't think of it.
>
>
> That works well for small numbers, but I'm guessing it will evaluate the
> entire list so it should not be used for large inputs. If you have a large
> interval and use a relatively small part of it, the following function should
> be significantly faster (it builds a random permutation lazily):
_______________________________________________
Haskell-Cafe mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/haskell-cafe