And big kudos for building AND shepherding :)

søn. 29. aug. 2021 kl. 12:56 skrev Stig Korsnes <stigkors...@gmail.com>:

> Thanks again Robert!
> Got rid of dict(state).
>
> Not sure I followed you completely on the test case. The "calculator" I am
> writing will, for this specific use case, depend on ~200-1000 processes.
> Each process object will return say 1m floats when its scenarios method is
> called. If I am not mistaken, that would require 7-8 GiB just to keep
> these in memory. Furthermore, I would possibly have to add the size of the
> dependent calculations on these (but would likely aggregate outside of
> testing). A given object that depends on processes will calculate its
> results based on 1-4 of these processes (1-4 x 1m samples, not
> multiprocessed), and will loop over objects with a process pool. So my
> reasoning is that running memory consumption would then be (1-4) x the
> size of 1m floats per process, plus all other overhead. Since sampling 1m
> normals is pretty fast, I can happily live with sampling (vs lookup in a
> presampled array), but since two objects might depend on the same process,
> they need the exact same array of samples. Hence the state. If I
> understood you correctly, another solution is to add a duplicate process
> with the same seed, instead of using one where I "reset" the state.
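A minimal sketch of that alternative (names and seeds here are illustrative, not from the thread): two independent Generator objects built from the same SeedSequence produce identical streams, so two objects that depend on the same process can each hold their own generator instead of sharing one and resetting its state.

```python
import numpy as np

# Illustrative entropy: component ids plus a root seed.
seed = np.random.SeedSequence([7, 3, 12345])

# Two generators built from the same SeedSequence are independent
# objects but produce the same deterministic sample stream.
rng_a = np.random.default_rng(seed)
rng_b = np.random.default_rng(seed)

samples_a = rng_a.normal(size=1_000)
samples_b = rng_b.normal(size=1_000)
# samples_a and samples_b are element-for-element identical
```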
>
> I promised that this could run on any laptop...
>
>
>
> søn. 29. aug. 2021 kl. 02:42 skrev Robert Kern <robert.k...@gmail.com>:
>
>> On Sat, Aug 28, 2021 at 5:56 AM Stig Korsnes <stigkors...@gmail.com>
>> wrote:
>>
>>> Thank you again Robert.
>>> I am using NamedTuple for my keys, which are also keys in a dictionary.
>>> Each key will be unique (a tuple of a distinct int and an enum), so I am
>>> thinking maybe the risk of producing duplicate hashes is not present, but
>>> as always I could be wrong :)
>>>
>>
>> Present, but possibly ignorably small. 128-bit spaces give enough
>> breathing room for me to be comfortable; 64-bit spaces like what hash()
>> will use for its results make me just a little claustrophobic.
>>
>> If the structure of the keys is pretty fixed, just these two integers
>> (counting the enum as an integer), then I might just use both in the
>> seeding material.
>>
>> def get_key_seed(key: ComponentId, root_seed: int):
>>     return np.random.SeedSequence(
>>         [key.the_int, int(key.the_enum), root_seed]
>>     )
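A self-contained sketch of this seeding approach (the ComponentId and enum definitions below are stand-ins for the types discussed in the thread, not the original code): each key's fields go directly into the SeedSequence entropy, so no hashing is involved at all.

```python
import enum
from typing import NamedTuple

import numpy as np


class Kind(enum.IntEnum):
    # Stand-in for the enum mentioned in the thread.
    PUMP = 1
    VALVE = 2


class ComponentId(NamedTuple):
    # Stand-in for the NamedTuple key type in the thread.
    the_int: int
    the_enum: Kind


def get_key_seed(key: ComponentId, root_seed: int) -> np.random.SeedSequence:
    # Feed both key fields plus the root seed directly as entropy:
    # deterministic per key, with no collision risk from hash().
    return np.random.SeedSequence([key.the_int, int(key.the_enum), root_seed])


rng = np.random.default_rng(get_key_seed(ComponentId(17, Kind.PUMP), 2021))
```

The same (key, root_seed) pair always yields the same stream; any change to either field yields a statistically independent one.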
>>
>>
>>> For positive ints I followed this tip
>>> https://stackoverflow.com/questions/18766535/positive-integer-from-python-hash-function
>>> and did:
>>>
>>> import ctypes
>>>
>>> def stronghash(key: ComponentId):
>>>     return ctypes.c_size_t(hash(key)).value
>>>
>>
>> np.uint64(possibly_negative_integer) will also work for this purpose
>> (somewhat more reliably).
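As a side note (a sketch, not from the thread): on a 64-bit platform the ctypes trick above is equivalent to masking off the low 64 bits with a plain integer operation, which avoids the ctypes dependency entirely.

```python
import ctypes


def stronghash(key) -> int:
    # Reinterpret hash()'s possibly negative result as an unsigned
    # machine word, giving a value in [0, 2**64) on 64-bit builds.
    return ctypes.c_size_t(hash(key)).value


def stronghash_masked(key) -> int:
    # Equivalent on 64-bit platforms, without ctypes: two's-complement
    # wrap via masking with 2**64 - 1.
    return hash(key) & (2**64 - 1)


key = (42, 1)
# Both forms are non-negative and agree on 64-bit builds.
```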
>>
>>> Since I will be using each process/random sample several times, and
>>> keeping all of them in memory at once is not feasible (dimensionality), I
>>> did the following:
>>>
>>>         self._rng = default_rng(cs)
>>>         self._state = dict(self._rng.bit_generator.state)
>>>
>>>     def scenarios(self) -> npt.NDArray[np.float64]:
>>>         self._rng.bit_generator.state = self._state
>>>         ....
>>>         return ....
>>>
>>> Would you consider this bad practice, or an ok solution?
>>>
>>
>> It's what that property is there for. No need to copy; `.state` creates a
>> new dict each time.
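A minimal, self-contained illustration of the pattern being discussed (seed value is arbitrary): capture the bit generator's state once, draw samples, then assign the state back to rewind the generator and reproduce the exact same draws.

```python
import numpy as np

rng = np.random.default_rng(np.random.SeedSequence(42))

# The .state property builds a fresh dict on every access,
# so no defensive dict() copy is needed.
state = rng.bit_generator.state

first = rng.normal(size=5)

# Assigning the saved state rewinds the generator...
rng.bit_generator.state = state
second = rng.normal(size=5)
# ...so first and second are identical draws.
```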
>>
>> In a quick test, I measured a process with 1 million Generator instances
>> to use ~1.5 GiB, while 1 million state dicts used ~1.0 GiB (including all
>> of the other overhead of Python and numpy; not a scientific test). Storing
>> just the BitGenerator is half-way in between. That's something, but not a
>> huge win. If that is really crossing the border from feasible to
>> infeasible, you may be about to run into your limits anyway for other
>> reasons. So balance that out with the complications of swapping state in
>> and out of a single instance.
>>
>>> In Norway we have a saying which directly translates: "He asked for the
>>> finger... and took the whole arm."
>>>
>>
>> Well, when I craft an overly-complicated system, I feel responsible to
>> help shepherd people along in using it well. :-)
>>
>> --
>> Robert Kern
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion@python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>
>