On Tue, Dec 15, 2020 at 1:00 AM Robert Kern <robert.k...@gmail.com> wrote:
>
> On Mon, Dec 14, 2020 at 3:27 PM Evgeni Burovski <evgeny.burovs...@gmail.com> 
> wrote:
>>
>> <snip>
>>
>> > I also think that the lock only matters for multithreaded code, not
>> > multiprocess code.  I believe the latter pickles and unpickles any Generator
>> > object (and the underlying BitGenerator), so each process has its own
>> > copy.  Note that when multiprocessing, the recommended procedure is to
>> > use SeedSequence.spawn() to generate a sequence of child seeds and to build
>> > a distinct BitGenerator from one of them in each process. If you do this,
>> > then you are free from the lock.
>>
>> Thanks. Just to confirm: does using SeedSequence spawn_key arg
>> generate distinct BitGenerators? As in
>>
>> cdef class Wrapper():
>>     def __init__(self, seed):
>>         entropy, num = seed
>>         py_gen = PCG64(SeedSequence(entropy, spawn_key=(spawn_key,)))
>>         self.rng = <bitgen_t *>py_gen.capsule.PyCapsule_GetPointer(capsule, "BitGenerator")    # <--- this
>>
>> cdef Wrapper rng_0 = Wrapper(seed=(123, 0))
>> cdef Wrapper rng_1 = Wrapper(seed=(123, 1))
>>
>> And then, do these two objects have distinct BitGenerators?
>
>
> The code you wrote doesn't work (`spawn_key` is never assigned). I can guess 
> what you meant to write, though, and yes, you would get distinct 
> `BitGenerator`s. However, I do not recommend using `spawn_key` explicitly. 
> The `SeedSequence.spawn()` method internally keeps track of how many children 
> it has spawned and uses that to construct the `spawn_key`s for its subsequent 
> children. If you play around with making your own `spawn_key`s, then the 
> parent `SeedSequence(entropy)` might spawn identical `SeedSequence`s to the 
> ones you constructed.
>
> If you don't want to use the `spawn()` API to construct the separate 
> `SeedSequence`s but still want to incorporate some per-process information 
> into the seeds (e.g. the 0 and 1 in your example), then note that a tuple of 
> integers is a valid value for the `entropy` argument. You can have the first 
> item be the same (i.e. per-run information) and the second item be a 
> per-process ID or counter.
>
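> from cpython.pycapsule cimport PyCapsule_GetPointer
> from numpy.random cimport bitgen_t
> from numpy.random import PCG64, SeedSequence
>
> cdef class Wrapper():
>     cdef object py_gen    # keep the Python BitGenerator alive
>     cdef bitgen_t *rng    # borrowed pointer into py_gen's capsule
>
>     def __init__(self, seed):
>         # `seed` is a tuple of integers, e.g. (per-run seed, per-process ID)
>         self.py_gen = PCG64(SeedSequence(seed))
>         self.rng = <bitgen_t *>PyCapsule_GetPointer(self.py_gen.capsule, "BitGenerator")
>
> cdef Wrapper rng_0 = Wrapper(seed=(123, 0))
> cdef Wrapper rng_1 = Wrapper(seed=(123, 1))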


Thanks Robert!

I indeed typo'd the spawn_key, and indeed the intention is exactly to
include a worker_id into a seed to make sure each worker gets a
separate stream.

The use of the spawn_key was --- as I now finally realize --- a
misunderstanding of your and Kevin's previous replies in
https://mail.python.org/pipermail/numpy-discussion/2020-July/080833.html
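
For the record, a minimal pure-Python sketch of the collision Robert
describes. The variable names are illustrative, and 123 is just the
example entropy from above. It compares a hand-built `spawn_key` with
the first child that the parent `SeedSequence` hands out via `spawn()`:

```python
import numpy as np
from numpy.random import SeedSequence

entropy = 123  # illustrative per-run seed

# A SeedSequence built by hand with an explicit spawn_key ...
manual_child = SeedSequence(entropy, spawn_key=(0,))

# ... and the first child that the parent creates through spawn().
parent = SeedSequence(entropy)
spawned_child = parent.spawn(1)[0]

# Compare the state words each one would feed into a BitGenerator.
collide = np.array_equal(
    manual_child.generate_state(4),
    spawned_child.generate_state(4),
)
print("spawn_key of spawned child:", spawned_child.spawn_key)
print("identical state:", collide)
```

If the two report identical state, any BitGenerator seeded from
`manual_child` would duplicate the stream of the parent's first spawned
child, which is the reason not to mix hand-made `spawn_key`s with
`spawn()`.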

So I'm moving my project to use the `SeedSequence((base_seed,
worker_id))` API --- thanks!
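
Roughly, in pure Python, this is what I have in mind. It is only a
sketch: `make_worker_rng`, `base_seed` and `worker_id` are illustrative
names, not NumPy API, and the Cython wrapper would take the `PCG64`
object instead of wrapping it in a `Generator`:

```python
from numpy.random import Generator, PCG64, SeedSequence


def make_worker_rng(base_seed, worker_id):
    """Build a per-worker Generator (hypothetical helper).

    The tuple (base_seed, worker_id) is passed as the `entropy` argument,
    so no explicit spawn_key is involved.
    """
    seed_seq = SeedSequence((base_seed, worker_id))
    return Generator(PCG64(seed_seq))


rng_0 = make_worker_rng(123, 0)
rng_1 = make_worker_rng(123, 1)
rng_0_again = make_worker_rng(123, 0)

draws_0 = rng_0.integers(0, 2**32, size=8)
draws_1 = rng_1.integers(0, 2**32, size=8)
draws_0_again = rng_0_again.integers(0, 2**32, size=8)

# Same (base_seed, worker_id) reproduces the stream; a different
# worker_id gives a different one.
print("reproducible:", bool((draws_0 == draws_0_again).all()))
print("workers differ:", bool((draws_0 != draws_1).any()))
```

The first tuple item carries the per-run information and the second the
per-worker counter, as suggested above.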

Just as a side note, this is not very prominent in the docs, and I'm
ready to volunteer to send a doc PR. I'm just not sure which part
of the docs it belongs in, and would appreciate a pointer.
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion
