On Monday, 3 June 2013 at 21:24:50 UTC, Joseph Rushton Wakeling wrote:
> On 06/03/2013 08:28 PM, Diggory wrote:
>> I'd guess that the heavy use of floating point arithmetic to
>> calculate the step sizes means that algorithm has a fairly large
>> constant overhead even though the complexity is smaller.
>
> Yes, I agree. There might be some optimizations that could be done there.

Well, I've come up with this new algorithm:


import std.random : uniform;

uint[] randomSample(uint N, uint M) {
        assert(N <= M);
        uint[] result = new uint[N];

        // Maps a position (key) to the value currently stored there.
        // index holds value+1 so that index == 0 marks an empty slot.
        struct HashPair {
                uint key;
                uint index;
        }

        size_t tableSize = N * 4;
        if (tableSize > M)
                tableSize = M;

        auto table = new HashPair[tableSize];

        // Current value at position x: x itself, unless an earlier
        // swap stored something else there (linear probing).
        uint lookup(uint x) {
                size_t h = x % tableSize;
                while (table[h].index) {
                        if (table[h].key == x)
                                return table[h].index - 1;
                        h = (h + 1) % tableSize;
                }
                return x;
        }

        foreach (uint i; 0 .. N) {
                // uniform avoids the modulo bias of rndGen.front % (M-i).
                immutable uint v = uniform(i, M);

                // Output the value at position v, then complete the
                // Fisher-Yates swap: position v takes position i's
                // current value. (Storing i itself is not enough, since
                // position i may have been swapped into earlier, which
                // would let an already-output value appear twice.)
                result[i] = lookup(v);
                immutable uint vi = lookup(i);

                size_t h = v % tableSize;
                while (table[h].index && table[h].key != v)
                        h = (h + 1) % tableSize;
                table[h].key = v;
                table[h].index = vi + 1;
        }

        return result;
}


It's O(N) rather than O(N²). Conceptually it works by randomly shuffling the first N items of an array of size M, which takes O(N) time but would require an array of size O(M). To avoid that allocation it uses a simple hash table to store just the positions which have been swapped. Since only O(N) items are shuffled there can only be O(N) swaps, so the hash table can be O(N) in size while still offering constant-time look-up.
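For comparison, here is a sketch of the dense version the hash table is standing in for: a partial Fisher-Yates shuffle over an explicit array of all M values (the name randomSampleDense is just for illustration). It does the same O(N) swaps but pays O(M) time and memory to build the array up front:

import std.algorithm : swap;
import std.array : array;
import std.random : uniform;
import std.range : iota;

uint[] randomSampleDense(uint N, uint M) {
        // Materialise the whole identity array 0 .. M-1: this is the
        // O(M) cost that the hash table version avoids.
        auto a = iota(0u, M).array;

        foreach (uint i; 0 .. N) {
                immutable uint v = uniform(i, M);
                swap(a[i], a[v]);       // a[i] becomes the i-th sample
        }
        return a[0 .. N];
}

Since both versions draw one uniform(i, M) per step, seeding rndGen identically before each call should make their outputs match, which is a handy sanity check for the hash-table book-keeping.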
