Thank you, Robert! This scheme fits perfectly into what I'm trying to accomplish! :) The "smooshing" of ints by supplying a list of ints had eluded me.

Thank you also for the pointer about the built-in hash(). I would not be able to rely on it anyway, because it can return negative ints, and SeedSequence requires non-negative ones.

If you have a minute to spare: could you briefly explain "int(joblib.hash(key), 16)" (joblib.hash: <https://joblib.readthedocs.io/en/latest/generated/joblib.hash.html>), and would this always return non-negative integers?

Thanks again!
On Thu, Aug 26, 2021 at 22:59, Robert Kern <robert.k...@gmail.com> wrote:

> On Thu, Aug 26, 2021 at 2:22 PM Stig Korsnes <stigkors...@gmail.com>
> wrote:
>
>> Hi,
>> Is there a way to uniquely spawn child seeds?
>> I'm doing Monte Carlo analysis, where I have n random processes, each
>> with their own generator.
>> All process models instantiate a generator with default_rng(). I.e.
>> ss=SeedSequence() cs=ss.spawn(n), and using cs[i] for process i. Now, the
>> problem I'm facing is that results using an individual process depend on
>> the order of the process initialization and the number of processes used.
>> However, if I could spawn children with a unique identifier, I would be
>> able to reproduce my individual results without having to pickle/log
>> states. For example, all my models have an id (tuple) field which is
>> hashable.
>> If I had the ability to SeedSequence(x).spawn([objects]) where objects
>> support hash(object), I would have reproducibility for all my processes. I
>> could do without the spawning, but then I would probably lose independence
>> when I do multiproc? Is there a way to achieve my goal in the current
>> version 1.21 of numpy?
>>
>
> I would probably not rely on `hash()` as it is only intended to be pretty
> good at getting distinct values from distinct inputs. If you can combine
> the tuple objects into a string of bytes in a reliable, collision-free way
> and use one of the cryptographic hashes to get them down to a 128bit
> number, that'd be ideal. `int(joblib.hash(key), 16)`
> <https://joblib.readthedocs.io/en/latest/generated/joblib.hash.html>
> should do nicely. You can combine that with your main process's seed
> easily. SeedSequence can take arbitrary amounts of integer data and smoosh
> them all together. The spawning functionality builds off of that, but you
> can also just manually pass in lists of integers.
>
> Let's call that function `stronghash()`. Let's call your main process seed
> number `seed` (this is the thing that the user can set on the command-line
> or something you get from `secrets.randbits(128)` if you need a fresh one).
> Let's call the unique tuple `key`. You can build the `SeedSequence` for
> each job according to the `key` like so:
>
>     root_ss = SeedSequence(seed)
>     for key, data in jobs:
>         child_ss = SeedSequence([stronghash(key), seed])
>         submit_job(key, data, seed=child_ss)
>
> Now each job will get its own unique stream regardless of the order the
> job is assigned. When the user reruns it with the same root `seed`, they
> will get the same results. When the user chooses a different `seed`, they
> will get another set of results (this is why you don't want to just use
> `SeedSequence(stronghash(key))` all by itself).
>
> I put the job-specific seed data ahead of the main program's seed to be on
> the super-safe side. The spawning mechanism will append integers to the
> end, so there's a super-tiny chance somewhere down a long line of
> `root_ss.spawn()`s that there would be a collision (and I mean
> super-extra-tiny). But best practices cost nothing.
>
> I hope that helps and is not too confusing!
>
> --
> Robert Kern
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
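For reference, here is a minimal sketch of the keyed-seed scheme described in the quoted message, written so it can be run as-is. It is not the code from the thread: `stronghash()` is implemented here with `hashlib.blake2b` (16-byte digest) as a stand-in for `int(joblib.hash(key), 16)`, and it assumes that `repr(key)` is a stable, unambiguous encoding of the key, which holds for tuples of ints and strings but not for arbitrary objects. The names `child_seed_sequence`, `draws_for`, the seed value, and the example keys are all hypothetical. The last lines also illustrate why parsing a hex digest with `int(..., 16)` cannot give a negative number: a hex digest carries no sign.

```python
import hashlib

import numpy as np


def stronghash(key):
    """Map a key to a non-negative 128-bit integer.

    Assumption: repr(key) is a stable, unambiguous encoding of the key
    (fine for tuples of ints/strings, not for arbitrary objects).
    """
    digest = hashlib.blake2b(repr(key).encode("utf-8"), digest_size=16).digest()
    # Raw bytes read as an unsigned big-endian integer are never negative.
    return int.from_bytes(digest, "big")


def child_seed_sequence(key, seed):
    """Build the per-job SeedSequence: job-specific entropy first, root seed last."""
    return np.random.SeedSequence([stronghash(key), seed])


def draws_for(order, seed):
    """Draw three numbers per key, creating the generators in the given order."""
    return {
        key: np.random.default_rng(child_seed_sequence(key, seed)).random(3)
        for key in order
    }


seed = 12345  # hypothetical root seed chosen by the user
keys = [("model", 1), ("model", 2), ("model", 3)]  # hypothetical model ids

forward = draws_for(keys, seed)
backward = draws_for(list(reversed(keys)), seed)

# Each key gets the same stream regardless of initialization order.
same_regardless_of_order = all(
    np.array_equal(forward[k], backward[k]) for k in keys
)

# A hex digest has no sign, so int(hexdigest, 16) is always >= 0.
hex_value = int(hashlib.md5(b"example").hexdigest(), 16)
```

Placing the key hash before the root seed follows the ordering suggested in the quoted message; changing `seed` changes every stream, while a given `(key, seed)` pair always reproduces the same one.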