On Tue, Dec 15, 2020 at 1:00 AM Robert Kern <robert.k...@gmail.com> wrote:
>
> On Mon, Dec 14, 2020 at 3:27 PM Evgeni Burovski <evgeny.burovs...@gmail.com>
> wrote:
>>
>> <snip>
>>
>> > I also think that the lock only matters for Multithreaded code not
>> > Multiprocess. I believe the latter pickles and unpickles any Generator
>> > object (and the underlying BitGenerator) and so each process has its own
>> > version. Note that when multiprocessing the recommended procedure is to
>> > use spawn() to generate a sequence of BitGenerators and to use a distinct
>> > BitGenerator in each process. If you do this then you are free from the
>> > lock.
>>
>> Thanks. Just to confirm: does using SeedSequence spawn_key arg
>> generate distinct BitGenerators? As in
>>
>> cdef class Wrapper():
>>     def __init__(self, seed):
>>         entropy, num = seed
>>         py_gen = PCG64(SeedSequence(entropy, spawn_key=(spawn_key,)))
>>         self.rng = <bitgen_t *> py_gen.capsule.PyCapsule_GetPointer(capsule, "BitGenerator")  # <--- this
>>
>> cdef Wrapper rng_0 = Wrapper(seed=(123, 0))
>> cdef Wrapper rng_1 = Wrapper(seed=(123, 1))
>>
>> And then, of these two objects, do they have distinct BitGenerators?
>
>
> The code you wrote doesn't work (`spawn_key` is never assigned). I can guess
> what you meant to write, though, and yes, you would get distinct
> `BitGenerator`s. However, I do not recommend using `spawn_key` explicitly.
> The `SeedSequence.spawn()` method internally keeps track of how many children
> it has spawned and uses that to construct the `spawn_key`s for its subsequent
> children. If you play around with making your own `spawn_key`s, then the
> parent `SeedSequence(entropy)` might spawn identical `SeedSequence`s to the
> ones you constructed.
>
> If you don't want to use the `spawn()` API to construct the separate
> `SeedSequence`s but still want to incorporate some per-process information
> into the seeds (e.g. the 0 and 1 in your example), then note that a tuple of
> integers is a valid value for the `entropy` argument. You can have the first
> item be the same (i.e. per-run information) and the second item be a
> per-process ID or counter.
>
> cdef class Wrapper():
>     def __init__(self, seed):
>         py_gen = PCG64(SeedSequence(seed))
>         self.rng = <bitgen_t *>py_gen.capsule.PyCapsule_GetPointer(capsule, "BitGenerator")
>
> cdef Wrapper rng_0 = Wrapper(seed=(123, 0))
> cdef Wrapper rng_1 = Wrapper(seed=(123, 1))
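For reference, here is a minimal Python-level sketch of the points above. It is not from the thread; it assumes NumPy >= 1.17 and leaves out the Cython `bitgen_t` capsule handling. `BASE_SEED` and `make_worker_generator` are hypothetical names used only for illustration. The sketch shows three things. First, `spawn()` assigns the spawn keys (0,), (1,), ... itself. Second, a hand-built `spawn_key=(0,)` reproduces the first spawned child exactly, which is the collision Robert warns about. Third, it shows the tuple-entropy alternative.

```python
import numpy as np
from numpy.random import Generator, PCG64, SeedSequence

BASE_SEED = 123  # hypothetical per-run seed

# Recommended: let the parent SeedSequence hand out spawn keys itself.
parent = SeedSequence(BASE_SEED)
children = parent.spawn(2)
print([c.spawn_key for c in children])  # [(0,), (1,)]

# Pitfall: a hand-made spawn_key can coincide with a spawned child,
# so both "streams" would be identical.
manual = SeedSequence(BASE_SEED, spawn_key=(0,))
same_state = np.array_equal(manual.generate_state(4), children[0].generate_state(4))
print(same_state)  # True


# Alternative: put per-run and per-worker information into the entropy
# tuple instead of touching spawn_key (hypothetical helper).
def make_worker_generator(base_seed, worker_id):
    return Generator(PCG64(SeedSequence((base_seed, worker_id))))


# Each worker would build its own generator from its own worker_id.
gens = [make_worker_generator(BASE_SEED, wid) for wid in range(3)]
```

With `(base_seed, worker_id)` as the entropy, the per-run seed stays shared while the worker ID differs, so each worker gets its own stream without any hand-made `spawn_key`s.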
Thanks Robert! I indeed typo'd the spawn_key, and the intention is exactly to include a worker_id in the seed so that each worker gets a separate stream. My use of spawn_key was, as I now finally realize, a misunderstanding of your and Kevin's previous replies in https://mail.python.org/pipermail/numpy-discussion/2020-July/080833.html

So I'm moving my project to the `SeedSequence((base_seed, worker_id))` API. Thanks!

Just as a side note, this is not very prominent in the docs, and I'm happy to volunteer a doc PR. I'm just not sure which part of the docs it belongs in, and would appreciate a pointer.

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion