On 27/08/13 08:16, monarch_dodra wrote:
What bothers me about these is that they have a fixed seed value,
which (IMO) is even worse than default seeding to
"unpredictableSeed"

C's "rand" also "defaults seed". I've "tutored" C++ on codeguru
for years, and year in, year out, I'd get asked "why does my
random program keep creating the same output". This behavior
surprises people.

A PRNG should either error-out on no-seed, or be run-time
unpredictably seeded. The middle ground just gives you the worst
of both...

For what it's worth, I think your experience is a consequence of the particular case of C. The "easy" way to generate a random number in C is some variation on rand() / RAND_MAX, so that's what people do, and so they get hit by the default seed.

On the other hand with D the default random number generator is rndGen, which is unpredictably seeded, and which is used if no other RNG is specified -- so _default_ random behaviour in D mimics that in interpreted languages and is different per program run. (Actually, I sometimes find myself seeding rndGen in order to get _the same_ results, which of course sometimes you do want:-)
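For instance, a minimal sketch (assuming import of std.random and std.stdio):

    import std.random, std.stdio;

    void main()
    {
        // rndGen is unpredictably seeded, so this differs from run to run:
        writeln(uniform(0, 100));

        // ... whereas explicitly seeding rndGen gives repeatable output:
        rndGen.seed(42);
        writeln(uniform(0, 100));
    }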

Personally I think there may be some value in having an accepted default configuration for RNGs, so long as it's correctly signposted what will happen, and so long as the "easy" thing to do is not going to fall into the trap you described.

However, I think initialization is an important issue not just for RNGs but for the variety of other entities that use them. This also bears on the class/struct discussion.

For example, with RNGs per se it's pretty trivial (with a class-based approach) to have a constructor requiring a seed and, in addition, possibly a default constructor that seeds with some default condition (let's leave aside preferences on this for now:-).
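A rough sketch of what that might look like (names invented; not actual Phobos code):

    import std.random : unpredictableSeed;

    final class ClassRng
    {
        private uint _state;

        this(uint seedValue) { seed(seedValue); }  // explicit seed required ...
        this() { this(unpredictableSeed); }        // ... or a default-seed policy here

        void seed(uint s) { _state = s; }

        // front, popFront, empty etc. as for any other RNG
    }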

The latter default-seed approach cannot be implemented with structs -- you can't have a no-parameter constructor -- so struct-based RNGs have to work around this in one of two ways. Either all the internal values of the RNG state data get default settings (which e.g. Xorshift can do because it's a small total number of parameters), or conditionals get triggered when front(), popFront() etc. are called, as in the Mersenne Twister implementation, which calls seed() if the value of the internal parameter mti is equal to size_t.max.

The latter approach is very annoying because it means that e.g. front() cannot be const, which we'd like it to be.
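A stripped-down toy sketch of that second workaround (not the real Mersenne Twister code) shows the problem:

    struct LazySeededRng
    {
        private uint _state = uint.max;   // sentinel meaning "not yet seeded"
        enum uint defaultSeed = 5489;     // arbitrary fixed default

        void seed(uint s) { _state = s; }

        // Cannot be const: the very first call may have to seed the generator.
        @property uint front()
        {
            if (_state == uint.max) seed(defaultSeed);
            return _state;
        }

        void popFront()
        {
            if (_state == uint.max) seed(defaultSeed);
            _state = _state * 1_103_515_245u + 12_345u;   // toy step, illustration only
        }

        enum bool empty = false;
    }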

So, all of that makes final classes a nice approach, although we might compromise for other reasons. But it's not so simple for other random functions. Now consider random sampling, and the following three versions of code (assume import std.random, std.range, std.stdio, with each block inside main):

    {   // 1: construct the sample, print it, then draw a uniform value
        auto gen = Random(unpredictableSeed);
        auto sample = randomSample(iota(100), 10, gen);
        writeln(sample);
        writeln(uniform(0.0, 1.0, gen));
    }

    {   // 2: construct the sample, draw a uniform value, then print the sample
        auto gen = Random(unpredictableSeed);
        auto sample = randomSample(iota(100), 10, gen);
        writeln(uniform(0.0, 1.0, gen));
        writeln(sample);
    }

    {   // 3: draw a uniform value first, then construct and print the sample
        auto gen = Random(unpredictableSeed);
        writeln(uniform(0.0, 1.0, gen));
        auto sample = randomSample(iota(100), 10, gen);
        writeln(sample);
    }

What would you _expect_ to happen to the values of the random number and the random sample in each case?

The most intuitive thing would be that the values of the sample are determined at the point it's created, so they would depend on where the line sample = ... occurs and on nothing else. However, we know that random samples are lazily evaluated.

So, perhaps the second logical alternative is that its values should be determined _when it is read_, i.e. at the point where we writeln(sample).

Now consider -- when is the _first_ "front" value of the sample set? If it's set at construction time (which would be normal for a range), the remaining values in the sample will depend on what calls to the RNG are made between construction and reading. So, in this case, the samples from programs 1 and 2 will have the same _first_ value but different subsequent values.

That, to me, is both unintuitive and undesirable.

Now it gets more complicated. In the case of RandomSample, we have a bunch of different public functions we can call. When front() is called, we need to check whether the sample has been initialized before we return a value. Similarly, popFront() should behave differently depending on whether the first sample value has been determined yet. Then there is a method index() that returns the numerical index of the sampled value (so, if you sample from 0, 1, 2, ... you should always have sample.front == sample.index), and this too clearly depends on the sample being initialized before it can return a value.
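To make the pattern concrete, here's a simplified, hypothetical sketch (invented names; not the actual RandomSample code) of what every public member ends up having to do:

    struct SamplerSketch
    {
        private bool _primed = false;
        private size_t _front, _index;

        // Must be called at the start of *every* public member; forget it in
        // just one of them and that member silently gives wrong results.
        private void primeIfNeeded()
        {
            if (!_primed)
            {
                // ... draw the first sample element here, setting _front and _index ...
                _primed = true;
            }
        }

        @property size_t front() { primeIfNeeded(); return _front; }
        void popFront()           { primeIfNeeded(); /* advance _front and _index */ }
        @property size_t index()  { primeIfNeeded(); return _index; }
    }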

None of this can be solved with a struct vs. class difference, assuming we agree that it's inherently undesirable to initialize at construction time.

This problem is potentially a rich source of bugs, as we saw with RandomCover in our recent pull request discussion. Relying on manual solutions seems undesirable -- you will get people (as I did) fixing front() and popFront() but forgetting about a function like RandomSample.index(). (Fix in the works:-)

So, I'm wondering whether there's potential for syntax sugar like invariant(), but which would not get kicked out when compiled with -release, and which would enable essential checks of the object's internal state whenever a public method is called.

I'd suggest there actually be two: one to check an entry condition, one an exit condition.
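Purely as a speculative illustration (all names invented), the hand-rolled pattern that such sugar might generate automatically could look something like:

    import std.exception : enforce;

    struct Checked
    {
        private bool _initialized = false;

        private void checkEntry()        // entry check: repair lazy state
        {
            if (!_initialized) initialize();
        }

        private void checkExit() const   // exit check: ordinary code, so -release
        {                                // cannot strip it the way it strips invariant()
            enforce(_initialized, "public method left the object uninitialized");
        }

        void initialize() { _initialized = true; }

        // What every public member has to look like when written by hand --
        // exactly the boilerplate the proposed sugar would generate:
        int someOperation()
        {
            checkEntry();
            scope(exit) checkExit();
            return 42;
        }
    }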
