On Wed, 2022-01-19 at 19:48 +0100, Francesc Alted wrote:
> On Wed, Jan 19, 2022 at 6:58 PM Stanley Seibert <sseib...@anaconda.com> wrote:
>
> > Given that this seems to be Linux only, is this related to how glibc does
> > large allocations (>128kB) using mmap()?
> >
> > https://stackoverflow.com/a/33131385
> >
>
> That's a good point. As MMAP_THRESHOLD is 128 KB and the size of `z` is
> almost 4 MB, the mmap machinery is probably getting involved here. Also,
> as pages acquired via anonymous mmap are not actually allocated until you
> access them for the first time, that would explain why the first access is
> slow. What puzzles me is that the timeit loops access the `z` data 3*10000
> times, which is plenty of time for doing the allocation (it should only
> require a single iteration).
>
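[One direct way to test the MMAP_THRESHOLD hypothesis (a minimal sketch, not
from the thread: it assumes glibc on Linux and calls `mallopt()` through
ctypes, with `M_MMAP_THRESHOLD` being -3 as in glibc's <malloc.h>) is to raise
the threshold above the ~4 MB buffer size so malloc serves `z` from the heap
instead of a fresh anonymous mmap, and then re-run the benchmark:

"""
# Hedged sketch: raise glibc's mmap threshold so the ~4 MB buffer behind `z`
# comes from the main heap rather than a fresh anonymous mmap region.
# Linux/glibc only; M_MMAP_THRESHOLD is -3 in glibc's <malloc.h>.
import ctypes

import numpy as np

libc = ctypes.CDLL("libc.so.6", use_errno=True)
M_MMAP_THRESHOLD = -3

# 16 MiB is comfortably above the 4 MB allocation and below glibc's cap on
# the threshold; mallopt() returns 1 on success and 0 on failure.
if libc.mallopt(M_MMAP_THRESHOLD, 16 * 1024 * 1024) != 1:
    print("mallopt(M_MMAP_THRESHOLD) failed; results will be unchanged")


def generate_sample(n, rng):
    return rng.normal(scale=1000, size=2 * n).view(np.complex128)


rng = np.random.default_rng()
n = 250000
z = generate_sample(n, rng)
# ... run the original timeit loops here and compare the two sets of timings.
"""

The same experiment can be run without ctypes by setting the glibc
environment variable `MALLOC_MMAP_THRESHOLD_` (note the trailing underscore)
before starting Python; if the slow/fast asymmetry disappears, the mmap path
is the likely culprit.]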
Indeed, and I have no clue how to start understanding why the kernel ends up
doing very different things in seemingly identical situations (I assume it is
the kernel). Some memory fragmentation? Does the first example just get
"lucky" for some reason and reuse a `malloc` buffer exactly? …? We may need
an expert in kernel virtual memory/page management :).

Cheers,

Sebastian

> > On Wed, Jan 19, 2022 at 9:06 AM Sebastian Berg <sebast...@sipsolutions.net> wrote:
> >
> > > On Wed, 2022-01-19 at 11:49 +0100, Francesc Alted wrote:
> > > > On Wed, Jan 19, 2022 at 7:33 AM Stefan van der Walt <stef...@berkeley.edu> wrote:
> > > >
> > > > > On Tue, Jan 18, 2022, at 21:55, Warren Weckesser wrote:
> > > > > > expr = 'z.real**2 + z.imag**2'
> > > > > >
> > > > > > z = generate_sample(n, rng)
> > > > >
> > > > > 🤔 If I duplicate the `z = ...` line, I get the fast result throughout.
> > > > > If, however, I use `generate_sample(1, rng)` (or any other value than
> > > > > `n`), it does not improve matters.
> > > > >
> > > > > Could this be a memory caching issue?
> > >
> > > Yes, it is a caching issue for sure. We have seen similar random
> > > fluctuations before. You can prove that it is a cache/page-fault issue
> > > by running it under `perf stat`. I did this twice, once with the second
> > > loop removed (page-faults only):
> > >
> > >     28333629 page-faults # 936.234 K/sec
> > >     28362718 page-faults # 1.147 M/sec
> > >
> > > The number of page faults there is huge. Running only the second one
> > > (or, rather, running the first one only once) gave me:
> > >
> > >     15024 page-faults # 1.837 K/sec
> > >
> > > So that is the *reason*. I had tried before to figure out why the page
> > > faults differ so much, and whether we can do something about it, but I
> > > never had any reasonable lead on it.
> > >
> > > In general, these fluctuations are pretty random, in the sense that
> > > unrelated code changes and recompilation can easily swap the behaviour,
> > > as Andras noted when he reported that he sees the opposite.
> > >
> > > I would love to have an idea of how to figure out why the page faults
> > > are so imbalanced between the two.
> > >
> > > (I have not looked at CPU cache misses this time, but since page faults
> > > happen, I assume that should not matter?)
> > >
> > > Cheers,
> > >
> > > Sebastian
> > >
> > > > I can also reproduce that, but only on my Linux boxes. My MacMini
> > > > does not notice the difference.
> > > >
> > > > Interestingly enough, you don't even need an additional call to
> > > > `generate_sample(n, rng)`.
> > > > If one uses `z = np.empty(...)` and then does an assignment, like:
> > > >
> > > >     z = np.empty(n, dtype=np.complex128)
> > > >     z[:] = generate_sample(n, rng)
> > > >
> > > > then everything runs at the same speed:
> > > >
> > > > numpy version 1.20.3
> > > >
> > > >  142.3667 microseconds
> > > >  142.3717 microseconds
> > > >  142.3781 microseconds
> > > >
> > > >  142.7593 microseconds
> > > >  142.3579 microseconds
> > > >  142.3231 microseconds
> > > >
> > > > As another data point, when doing the same operation with numexpr I am
> > > > not seeing any difference either, not even on Linux:
> > > >
> > > > numpy version 1.20.3
> > > > numexpr version 2.8.1
> > > >
> > > >   95.6513 microseconds
> > > >   88.1804 microseconds
> > > >   97.1322 microseconds
> > > >
> > > >  105.0833 microseconds
> > > >  100.5555 microseconds
> > > >  100.5654 microseconds
> > > >
> > > > [If anything, it is a bit the other way around: the second set of
> > > > timings seems a hair slower.]
> > > >
> > > > See the numexpr script below.
> > > >
> > > > I am totally puzzled here.
> > > >
> > > > """
> > > > import timeit
> > > > import numpy as np
> > > > import numexpr as ne
> > > >
> > > >
> > > > def generate_sample(n, rng):
> > > >     return rng.normal(scale=1000, size=2*n).view(np.complex128)
> > > >
> > > >
> > > > print(f'numpy version {np.__version__}')
> > > > print(f'numexpr version {ne.__version__}')
> > > > print()
> > > >
> > > > rng = np.random.default_rng()
> > > > n = 250000
> > > > timeit_reps = 10000
> > > >
> > > > expr = 'ne.evaluate("zreal**2 + zimag**2")'
> > > >
> > > > z = generate_sample(n, rng)
> > > > zreal = z.real
> > > > zimag = z.imag
> > > > for _ in range(3):
> > > >     t = timeit.timeit(expr, globals=globals(), number=timeit_reps)
> > > >     print(f"{1e6*t/timeit_reps:9.4f} microseconds")
> > > > print()
> > > >
> > > > z = generate_sample(n, rng)
> > > > zreal = z.real
> > > > zimag = z.imag
> > > > for _ in range(3):
> > > >     t = timeit.timeit(expr, globals=globals(), number=timeit_reps)
> > > >     print(f"{1e6*t/timeit_reps:9.4f} microseconds")
> > > > print()
> > > > """
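[For completeness, here is what the preallocation variant looks like as a
self-contained script (a sketch assembled from the snippets quoted above,
using Warren's original NumPy expression rather than numexpr; it is not a
verbatim copy of anyone's script):

"""
# Sketch of the experiment Francesc describes: allocate `z` once with
# np.empty() and copy fresh samples into it, so the second round of timings
# reuses exactly the same buffer instead of a newly allocated one.
import timeit

import numpy as np


def generate_sample(n, rng):
    return rng.normal(scale=1000, size=2 * n).view(np.complex128)


rng = np.random.default_rng()
n = 250000
timeit_reps = 10000

expr = 'z.real**2 + z.imag**2'

z = np.empty(n, dtype=np.complex128)
z[:] = generate_sample(n, rng)   # copy into the preallocated buffer
for _ in range(3):
    t = timeit.timeit(expr, globals=globals(), number=timeit_reps)
    print(f"{1e6*t/timeit_reps:9.4f} microseconds")
print()

z[:] = generate_sample(n, rng)   # refill the same buffer, no new allocation
for _ in range(3):
    t = timeit.timeit(expr, globals=globals(), number=timeit_reps)
    print(f"{1e6*t/timeit_reps:9.4f} microseconds")
"""

If the buffer really is the variable, both blocks should report essentially
the same time, as in the 142 microsecond figures above.]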
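[Finally, the page-fault imbalance that Sebastian measured with `perf stat`
can also be observed from inside Python (a hedged sketch; it uses the
standard-library `resource` module, whose getrusage() wrapper exposes
minor/major fault counts on Linux):

"""
# Hedged sketch: count page faults around each timing block from inside the
# process, as a rough stand-in for `perf stat` (Linux; uses getrusage()).
import resource
import timeit

import numpy as np


def generate_sample(n, rng):
    return rng.normal(scale=1000, size=2 * n).view(np.complex128)


def faults():
    ru = resource.getrusage(resource.RUSAGE_SELF)
    return ru.ru_minflt + ru.ru_majflt


rng = np.random.default_rng()
n = 250000
timeit_reps = 10000
expr = 'z.real**2 + z.imag**2'

for block in (1, 2):
    z = generate_sample(n, rng)
    before = faults()
    t = timeit.timeit(expr, globals=globals(), number=timeit_reps)
    print(f"block {block}: {1e6*t/timeit_reps:9.4f} microseconds, "
          f"{faults() - before} page faults")
"""

A large difference in fault counts between the two blocks on the same data
size would confirm that the timing gap comes from page faults rather than
from the arithmetic itself.]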