On Monday, 3 June 2013 at 13:18:30 UTC, Joseph Rushton Wakeling wrote:
On 06/03/2013 02:30 PM, Joseph Rushton Wakeling wrote:
On 06/03/2013 01:29 PM, Diggory wrote:
For small samples from very large ranges an efficient algorithm would be:

int[] randomGen(int N, int M) {
    if (N == 0)
        return [];

    int[] result = randomGen(N-1, M-1);

    int num = rand(M);
    foreach (ref int i; result)
        if (i >= num)
            ++i;

    result ~= num;

    return result;
}

Are you sure about the efficiency of that algorithm? Looks like it's be O(M) to me.

Apologies, I misread the algorithm slightly. Your qualifier is quite correct -- when the number of sample points is very small (e.g. 5 out of some arbitrary very large number), that algorithm is very quick indeed (I ran some tests as I
was curious:-), and can outperform D's randomSample.

It doesn't scale with the size of the input (your value M), but with the number of sample points n, but that scaling is very bad -- so as the sample size grows
it becomes _much_ worse than randomSample.

Do you have a reference for the algorithm? Off the top of my head I don't
recognize it from any of the texts I've read.

Thanks for testing before dismissing completely :P The way it returns results can be improved a lot by pre-allocating a range of the necessary size/using a range passed in.

The complexity is O(N²) where N is the number of samples out of M. The algorithm is something I just thought of although I've possibly used it before and I'm sure someone else has thought of it too. It's quite simple, it effectively says "pick a value from the range except for this value" recursively but instead of dividing the range in two, it shifts the top part down first to make a contiguous range and then shifts the results which are past the split up one afterward.

I expect the phobos implementation to be something like O(M) in which case my method would be faster whenever N < sqrt(M) (with the optimisations)

Reply via email to