Correction: the version in monte-carlo takes time O(n), but it only consumes k 
random numbers.  The streaming algorithm consumes n random numbers.

On Jun 14, 2011, at 12:04 PM, Patrick Perry wrote:

> There are time/space tradeoffs in sampling without replacement.  The version 
> in monte-carlo takes space O(n) and time O(k), for all k.  I chose this 
> algorithm instead of a streaming algorithm, with takes space O(k) and time 
> O(n).
> 
> If k is much less than n, you can improve a bit.  Here is a version taking 
> space O(k) and time O(k^2), *provided k << n*;  when k is large relative to 
> n, the time increases to roughly O(n log n).  
> 
> import Control.Monad( liftM )
> import Control.Monad.MC
> import Data.List( nub )
> 
> sampleSmallIntSubset :: MonadMC m => Int -> Int -> m [Int]
> sampleSmallIntSubset n k = (take k . nub) `liftM` repeatMC (sampleInt n)
> 
> sampleSmallSubset :: MonadMC m => [a] -> Int -> m [a]
> sampleSmallSubset xs k = (map (xs !!)) `liftM` sampleSmallIntSubset (length 
> xs) k
> 
> -- strict version of the above
> sampleSmallSubset' :: MonadMC m => [a] -> Int -> m [a]
> sampleSmallSubset' xs k = do
>    ys <- sampleSmallSubset xs k
>    (length ys) `seq` return ys -- by forcing the length, we make sure we are 
> done with the RNG when we return
> 
> 
> If you're wondering about the strict version, the following example should be 
> instructive:
> 
> Let's note the first three numbers generated by the RNG with seed 0:
> 
>    ghci> (replicateM 3 $ uniform 0 1) `evalMC` mt19937 0
>    [0.999741748906672,0.16290987539105117,0.28261780529282987]
> 
> Now, let's look at the first number we get after a call to sampleSmallSubset, 
> if we don't evaluate the subset:
> 
>    ghci> sampleSmallSubset [1..10000] 2 >> uniform 0 1) `evalMC` mt19937 0
>    0.999741748906672
> 
> Apparently, the call to sampleSmallSubset doesn't consume any random numbers. 
>  Now, let's try the strict version:
> 
>    ghci> (sampleSmallSubset' [1..10000] 2 >> uniform 0 1) `evalMC` mt19937 0
>    0.28261780529282987
> 
> This time, the call to sampleSmallSubset' consumes two random numbers, even 
> if we throw away the result.
> 
> Since you're just starting out, I recommend using the strict versions of the 
> functions.  They are less space-efficient than the lazy versions, but they 
> are guaranteed to be safe.
> 
> 
> 
> 
>> From: Jonas Almström Duregård <jonas.duregard at chalmers.se>
> 
>> Subject: Re: [Haskell-cafe] Acquiring a random set of a specific size (w/o 
>> dups) from a range of Ints
>> To: "michael rice" <
>> nowgate at yahoo.com
>>> 
>> Cc: "Felipe Almeida Lessa" <
>> felipe.lessa at gmail.com>, haskell-cafe at haskell.org
>> 
>> Date: Tuesday, June 14, 2011, 5:17 AM
>> 
>>> 
>> Shuffle [1..20], then take 5?
>> 
>>> 
>> Yes, so simple, I'm embarrassed I didn't think of it.
>> 
>> 
>> That works well for small numbers, but I'm guessing it will evaluate the 
>> entire list so it should not be used for large inputs. If you have a large 
>> interval and use a relatively small part of it, the following function 
>> should be significantly faster (it builds a random permutation lazily):


_______________________________________________
Haskell-Cafe mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/haskell-cafe

Reply via email to