And big kudos for building AND shepherding :)

On Sun, 29 Aug 2021 at 12:56, Stig Korsnes <stigkors...@gmail.com> wrote:
> Thanks again Robert!
> Got rid of dict(state).
>
> Not sure I followed you completely on the test case. The "calculator" I am
> writing will, for this specific use case, depend on ~200-1000 processes.
> Each process object will return say 1M floats when its scenario method is
> called. If I am not mistaken, that would require 7-8 GiB just to keep
> these in memory. Furthermore, I would possibly have to add the size of the
> dependent calculations on these (but would likely aggregate outside of
> testing). A given object that depends on processes will calculate its
> results based on 1-4 of these processes (1-4 * 1M samples, non-multiproc),
> and will loop over objects with a process pool. So my reasoning is that
> running memory consumption would then be (1-4) * size of 1M floats *
> number of processes + all other overhead. Since sampling 1M normals is
> pretty fast, I can happily live with sampling (vs lookup in a presampled
> array), but since two objects might depend on the same process, they need
> the exact same array of samples. Hence the state. If I understood you
> correctly, another solution is to add another duplicate process with the
> same seed, instead of using one where I "reset" the state.
>
> I promised that this could run on any laptop..
>
> On Sun, 29 Aug 2021 at 02:42, Robert Kern <robert.k...@gmail.com> wrote:
>
>> On Sat, Aug 28, 2021 at 5:56 AM Stig Korsnes <stigkors...@gmail.com>
>> wrote:
>>
>>> Thank you again Robert.
>>> I am using NamedTuple for my keys, which are also keys in a dictionary.
>>> Each key will be unique (a tuple of a distinct int and an enum), so I am
>>> thinking maybe the risk of producing a duplicate hash is not present,
>>> but could as always be wrong :)
>>
>> Present, but possibly ignorably small. 128-bit spaces give enough
>> breathing room for me to be comfortable; 64-bit spaces like what hash()
>> will use for its results make me just a little claustrophobic.
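[The memory estimate in Stig's message above can be sanity-checked with a quick sketch; the numbers below are the thread's hypothetical figures (upper end of the ~200-1000 process range, 1M float64 samples each), not measurements:]

```python
# Back-of-envelope check of the memory figure from the thread:
# 1000 process objects, each returning 1_000_000 float64 samples.
n_processes = 1000           # upper end of the ~200-1000 range mentioned
samples_per_process = 1_000_000
bytes_per_float64 = 8        # np.float64 is 8 bytes per element

total_bytes = n_processes * samples_per_process * bytes_per_float64
total_gib = total_bytes / 2**30
print(f"{total_gib:.2f} GiB")  # ~7.45 GiB, matching the 7-8 GiB estimate
```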
>>
>> If the structure of the keys is pretty fixed, just these two integers
>> (counting the enum as an integer), then I might just use both in the
>> seeding material.
>>
>>   def get_key_seed(key: ComponentId, root_seed: int):
>>       return np.random.SeedSequence(
>>           [key.the_int, int(key.the_enum), root_seed])
>>
>>> For positive ints I followed this tip
>>> https://stackoverflow.com/questions/18766535/positive-integer-from-python-hash-function
>>> and did:
>>>
>>>   def stronghash(key: ComponentId):
>>>       return ctypes.c_size_t(hash(key)).value
>>
>> np.uint64(possibly_negative_integer) will also work for this purpose
>> (somewhat more reliably).
>>
>>> Since I will be using each process/random sample several times, and
>>> keeping all of them in memory at once is not feasible (dimensionality),
>>> I did the following:
>>>
>>>   self._rng = default_rng(cs)
>>>   self._state = dict(self._rng.bit_generator.state)
>>>
>>>   def scenarios(self) -> npt.NDArray[np.float64]:
>>>       self._rng.bit_generator.state = self._state
>>>       ....
>>>       return ....
>>>
>>> Would you consider this bad practice, or an ok solution?
>>
>> It's what that property is there for. No need to copy; `.state` creates a
>> new dict each time.
>>
>> In a quick test, I measured a process with 1 million Generator instances
>> to use ~1.5 GiB while 1 million state dicts used ~1.0 GiB (including all
>> of the other overhead of Python and numpy; not a scientific test).
>> Storing just the BitGenerator is half-way in between. That's something,
>> but not a huge win. If that is really crossing the border from feasible
>> to infeasible, you may be about to run into your limits anyway for other
>> reasons. So balance that out with the complications of swapping state in
>> and out of a single instance.
>>
>>> In Norway we have a saying which directly translates: "He asked for the
>>> finger... and took the whole arm".
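[A sketch of the "duplicate process with the same seed" alternative discussed above: a SeedSequence built from the key fields can be reused to construct a fresh Generator whenever the samples are needed, giving an identical stream with no state saving or resetting. The plain-int key fields here stand in for the ComponentId NamedTuple from the thread:]

```python
import numpy as np

def get_key_seed(the_int: int, the_enum_value: int, root_seed: int):
    # Fold both key fields and the root seed into the entropy pool,
    # as in Robert's get_key_seed suggestion above.
    return np.random.SeedSequence([the_int, the_enum_value, root_seed])

ss = get_key_seed(7, 2, 20210829)

# Two Generators built from the same SeedSequence produce the same
# stream, so two objects depending on the same process see identical
# samples without keeping a presampled array in memory.
a = np.random.default_rng(ss).standard_normal(1000)
b = np.random.default_rng(ss).standard_normal(1000)
assert (a == b).all()
```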
>>
>> Well, when I craft an overly-complicated system, I feel responsible to
>> help shepherd people along in using it well. :-)
>>
>> --
>> Robert Kern
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion@python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion