On 03-07-14 03:43, Vaclav Petras wrote:

On Wed, Jul 2, 2014 at 8:15 PM, Glynn Clements <gl...@gclements.plus.com> wrote:

    > Shouldn't the seed be generated from, e.g., the OS time,
    > which would ensure that each run gives a different result?

    No. The reason is to provide reproducibility. Anyone running the same
    command with the same data should obtain the same result.

It is certainly good to be able to reproduce commands. However, I think that in most (statistical) software the default and expected behaviour is to get a new, automatically generated seed at each run. In R, for example, you have to set the seed explicitly with the function set.seed() if you want the same random numbers again. I would therefore think that most users will expect similar behaviour in GRASS. My personal preference would be to have the option to set the seed explicitly if you want reproducibility, and to have it generated automatically otherwise. But that is just a personal preference.


Does the reproducibility go beyond one operating system, compiler or library? I don't think that the sequence of random numbers (not even the first one) is specified by the C language standard. If the results were really reproducible, that would be good for the testing framework, but I'm afraid that they are not (with my limited knowledge of the topic).
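
One way to get results that do not depend on the platform's C library is for the program to carry its own generator. Below is a minimal sketch in Python, assuming the recurrence and constants that POSIX documents for the drand48 family. The class name Lcg48 and its methods are invented for illustration, and this is not a claim about how r.mapcalc's rand() is implemented.

```python
class Lcg48:
    """48-bit linear congruential generator (POSIX drand48-family constants)."""

    A = 0x5DEECE66D        # multiplier
    C = 0xB                # increment
    MASK = (1 << 48) - 1   # all arithmetic is modulo 2**48

    def __init__(self, seed):
        # Like srand48(): the seed fills the high 32 bits of the state,
        # the low 16 bits are set to 0x330E.
        self.state = ((seed & 0xFFFFFFFF) << 16) | 0x330E

    def next_int(self):
        """Advance the state and return its top 31 bits, in [0, 2**31)."""
        self.state = (self.A * self.state + self.C) & self.MASK
        return self.state >> 17

    def rand_range(self, low, high):
        """Return an integer in [low, high] (inclusive bounds are a choice
        made for this sketch; the modulo is slightly biased but short)."""
        return low + self.next_int() % (high - low + 1)


# Two generators started from the same seed produce the same sequence,
# because nothing here depends on the C library or the operating system.
first = Lcg48(42)
second = Lcg48(42)
run_a = [first.rand_range(0, 100) for _ in range(5)]
run_b = [second.rand_range(0, 100) for _ in range(5)]
print(run_a == run_b)
```

Because the arithmetic is plain integer math with fixed constants, the same seed should give the same numbers on any platform, which is the property a testing framework would need.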

    If you want a different result each time, set GRASS_RND_SEED to a
    different value each time, e.g.

            GRASS_RND_SEED=`date +%N` r.mapcalc "a = rand(0,100)"

    [%N is the nanoseconds portion of the current time; this is a GNU
    extension.]

Perhaps this can be explained like this in the manual page? A far better option would be to provide this as a normal parameter, so that it can be set from the GUI or the command line like any other option.

I've heard that this is not enough on powerful computers/clusters: you also have to use the PID, because the nanoseconds might be the same for two processes (I think I remember that it was nanoseconds, not seconds).
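
A minimal sketch of that idea in Python, mixing the process ID into a clock-based seed. The function name make_seed, the 32-bit seed width and the mixing constant are assumptions made for this example, not anything GRASS prescribes.

```python
import os
import time

# Large odd constant used only to spread the PID over all 32 bits.
# Multiplying by an odd number modulo 2**32 never maps two different
# PIDs (below 2**32) to the same value.
MIX = 2654435761


def make_seed(now_ns=None, pid=None):
    """Combine the clock and the process ID into a 32-bit seed.

    Both arguments default to the current values; passing them
    explicitly makes the function easy to check.
    """
    if now_ns is None:
        now_ns = time.time_ns()   # nanoseconds since the epoch
    if pid is None:
        pid = os.getpid()
    # Two processes that read the same clock value still get
    # different seeds, because their PIDs differ.
    return (now_ns + pid * MIX) & 0xFFFFFFFF


print(make_seed())
```

The resulting number could then be exported as GRASS_RND_SEED before the module is started, in the same way as the `date +%N` example above.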


    > On a related note, it would be nice to be able to set the seed (I think
    > there has been such a request before, but not sure about the answer at
    > that time).

    GRASS_RND_SEED was the answer.


I think there should be some possibility of randomization (auto-setting of the seed) built into the modules which provide random(ized) results. Perhaps a flag which would turn it on. It could also be an option which would behave like GRASS_RND_SEED but would have one special value for auto-generating the seed. (GRASS_RND_SEED, if present, would override this option.) For the default value of the option, we should ask what the expected behavior of a module giving random results actually is.

Yes, that would be great. As for the default value, see my earlier argument.
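
To make the proposed precedence concrete, here is a rough sketch in Python. Only GRASS_RND_SEED comes from this thread; the function resolve_seed, the special value "auto" and the default parameter are hypothetical names chosen for illustration, and no existing GRASS module is claimed to work this way.

```python
import os
import time

AUTO = "auto"  # hypothetical special value meaning "generate a seed for me"


def resolve_seed(option_value, environ=None, default=0):
    """Pick the seed for a module with a (hypothetical) seed option.

    Precedence, as proposed above:
      1. GRASS_RND_SEED from the environment, if present;
      2. the module's own option (a number or AUTO);
      3. the default, which is exactly the open question in this thread:
         a fixed number gives reproducible runs, AUTO gives new results.
    """
    if environ is None:
        environ = os.environ
    env_value = environ.get("GRASS_RND_SEED")
    if env_value is not None:
        return int(env_value)
    if option_value is None:
        option_value = default
    if option_value == AUTO:
        # Clock plus PID, so that parallel jobs started at the same
        # instant do not end up with the same seed.
        return (time.time_ns() + os.getpid() * 2654435761) & 0xFFFFFFFF
    return int(option_value)


print(resolve_seed("7", environ={}))                          # option is used
print(resolve_seed("7", environ={"GRASS_RND_SEED": "99"}))    # environment wins
```

Switching the default between a fixed number and AUTO changes only the last fallback, so either preference could be supported without touching the rest of the interface.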

This would provide a nicer interface in Python, a standard interface on the command line, and the possibility to set it in the GUI (which means the possibility to set it for users who don't use the command line). Moreover, it would give all users a way of setting the random seed in the manner which we consider the best according to our knowledge.

Agreed. The current way of setting the seed may not be understood by everybody, and with all the work going into streamlining the GUI, fairly important options like this one should also be available through the GUI.

Vaclav
