This is what I was asking about: https://www.unix.com/man-page/POSIX/3posix/random/
That is the POSIX standard stating that, given a seed, the output must be deterministic.
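A minimal sketch of the guarantee that man page describes, assuming a POSIX system where srandom()/random() are available; the seed value 42 is arbitrary:

    // Re-seeding with the same value reproduces the same sequence (POSIX random()).
    #include <cstdio>
    #include <stdlib.h>

    int main() {
      long first[5], second[5];
      srandom(42);
      for (int i = 0; i < 5; ++i) first[i] = random();
      srandom(42);  // identical seed again
      for (int i = 0; i < 5; ++i) second[i] = random();
      for (int i = 0; i < 5; ++i)
        std::printf("%ld %ld %s\n", first[i], second[i],
                    first[i] == second[i] ? "match" : "differ");
      return 0;
    }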
On Tue, Jan 9, 2018 at 7:58 AM, kellen sunderland <[email protected]> wrote:

Sorry if I'm misunderstanding your question here Chris.

On Tue, Jan 9, 2018 at 4:58 PM, kellen sunderland <[email protected]> wrote:

I think the convention is that random generators in most modern languages are always seeded, and always deterministic. If a user seed isn't supplied, implementations generally provide their own seed, which they attempt to make unique; often they generate a seed that takes into account the current time. This is at least the case for many mainstream languages.

Java implementation: https://docs.oracle.com/javase/8/docs/api/java/util/Random.html
Remarks: "If two instances of Random are created with the same seed, and the same sequence of method calls is made for each, they will generate and return identical sequences of numbers."

C#: https://msdn.microsoft.com/en-us/library/ctssatww(v=vs.110).aspx
Remarks: "Providing an identical seed value to different Random objects causes each instance to produce identical sequences of random numbers. This is often done when testing apps that rely on random number generators."
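A short sketch of that convention, using the C++ standard library in place of the Java/C# generators quoted above (illustrative only):

    // Two engines constructed with the same seed and called identically
    // produce identical sequences.
    #include <cassert>
    #include <random>

    int main() {
      std::mt19937 a(1234);
      std::mt19937 b(1234);
      for (int i = 0; i < 10; ++i) {
        assert(a() == b());  // same seed + same call sequence -> same numbers
      }
      return 0;
    }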
On Tue, Jan 9, 2018 at 4:27 PM, Chris Olivier <[email protected]> wrote:

Wait, wait: I don't think that random number generators should return deterministic lists of numbers. I'm asking if something says they're supposed to. I know they tend to, but my understanding is that they tend to because of the challenge of generating true random numbers from hardware. IMHO the ideal random number generator would not return a deterministic set of numbers regardless of seed.

On Tue, Jan 9, 2018 at 3:43 AM, Pedro Larroy <[email protected]> wrote:

For enabling parallel deterministic testing we can set an environment variable and set the same seed on different devices for those cases where we want it, leaving the default as it is. I think this would be an easy solution that wouldn't change any behaviour in multi-GPU training.

On Tue, Jan 9, 2018 at 10:48 AM, kellen sunderland <[email protected]> wrote:

Thanks Asmus, yes, this is also the approach I would be in favour of. I think we should optionally allow the user to specify whether they want deterministic behaviour independent of the GPU they run on. If MXNet is going to support more arbitrary linear algebra operations, I could see a lot of use cases for this. For example, I want deterministic noise fed into a deep-RL simulation so that I can compare a few different algorithms without variance, and do it in parallel on my machine (which happens to have two GPUs).

On Tue, Jan 9, 2018 at 10:36 AM, Asmus Hetzel <[email protected]> wrote:

The issue is tricky. Number generators should return deterministic sets of numbers, as Chris said, but that usually only applies to non-distributed systems, and to some extent we already have a distributed system as soon as one CPU and one GPU are involved.

For the usual setup, like distributed training, using different seeds on different devices is a must. You distribute a process that involves random number generation, and that means you absolutely have to ensure that the sequences on the devices do not correlate. So this behaviour is intended and correct. We also cannot guarantee that random number generation is deterministic when running on CPU vs. running on GPU.

So what we are dealing with here is generating repeatable results when the application/code section is running on a single GPU out of a bigger set of available GPUs, but we do not have control over which one. The crucial line in mxnet is this one (resource.cc):

    const uint32_t seed = ctx.dev_id + i * kMaxNumGPUs + global_seed * kRandMagic;

Here I think it would make sense to add a switch that optionally makes this setting independent of ctx.dev_id. But we would have to document really well that this is solely meant for specific types of debugging/unit testing.
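A hypothetical sketch of the switch Asmus describes; the environment-variable name MXNET_SEED_PER_DEVICE and the constant values below are made up for illustration and are not an actual MXNet change:

    #include <cstdint>
    #include <cstdlib>

    namespace {
    const uint32_t kMaxNumGPUs = 16;   // assumed value, illustration only
    const uint32_t kRandMagic  = 127;  // assumed value, illustration only
    }

    // Mirrors the resource.cc expression above, but lets a (hypothetical)
    // environment variable drop the ctx.dev_id term so that every device
    // derives the same seed from a given global_seed.
    uint32_t ComputeResourceSeed(uint32_t dev_id, uint32_t i, uint32_t global_seed) {
      const char* flag = std::getenv("MXNET_SEED_PER_DEVICE");
      const bool seed_per_device = (flag == nullptr) || (std::atoi(flag) != 0);
      const uint32_t device_term = seed_per_device ? dev_id : 0;
      return device_term + i * kMaxNumGPUs + global_seed * kRandMagic;
    }

With the variable unset (or set to 1) nothing changes; setting it to 0 would only make sense for the debugging/unit-testing cases mentioned above.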
On Monday, January 8, 2018, at 19:30:02 CET, Chris Olivier <[email protected]> wrote:

Is it explicitly defined somewhere that random number generators should always return a deterministic set of numbers given the same seed, or is that just a side-effect of some hardware not having a better way to generate random numbers, so they use a user-defined seed to kick off the randomization starting point?

On Mon, Jan 8, 2018 at 9:27 AM, kellen sunderland <[email protected]> wrote:

Hello MXNet devs,

I wanted to see what people thought about the following section of code, which I think has some subtle pros/cons:
https://github.com/apache/incubator-mxnet/blob/d2a856a3a2abb4e72edc301b8b821f0b75f30722/src/resource.cc#L188

Tobi (tdomhan) from sockeye pointed it out to me after he spent some time debugging non-determinism in his model training.

This functionality is well documented here:
https://mxnet.incubator.apache.org/api/python/ndarray.html#mxnet.random.seed
but I don't think the current API meets all use cases, due to this section:

"Random number generators in MXNet are device specific. Therefore, random numbers generated from two devices can be different even if they are seeded using the same seed."

I'm guessing this is a feature that makes distributed training easier in MXNet; you wouldn't want to train the same model on each GPU. However, the downside is that if you run unit tests on a multi-GPU system, or in a training environment where you don't have control over which GPU you use, you can't count on deterministic behaviour that you can assert results against. I have a feeling there are non-unit-test use cases where you'd also want deterministic behaviour independent of which GPU your code happens to be scheduled on.

How do others feel about this? Would it make sense to have some optional args in the seed call to have the seed-per-device functionality turned off?

-Kellen
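To make the quoted documentation sentence concrete, here is a rough illustration (standard-library generators standing in for MXNet's; this is not the actual resource.cc code) of why folding a device id into the seed yields different sequences per device even for the same user seed:

    #include <iostream>
    #include <random>

    int main() {
      const unsigned user_seed = 128;            // arbitrary user-supplied seed
      for (unsigned dev_id = 0; dev_id < 2; ++dev_id) {
        std::mt19937 gen(user_seed + dev_id);    // per-device offset, as in resource.cc
        const unsigned long a = gen();
        const unsigned long b = gen();
        const unsigned long c = gen();
        std::cout << "device " << dev_id << ": " << a << ' ' << b << ' ' << c << '\n';
      }
      return 0;
    }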
