On Thu, 2022-01-20 at 14:41 +0100, Francesc Alted wrote:
> On Wed, Jan 19, 2022 at 7:48 PM Francesc Alted <fal...@gmail.com>
> wrote:
> 
> > On Wed, Jan 19, 2022 at 6:58 PM Stanley Seibert
> > <sseib...@anaconda.com>
> > wrote:
> > 
> > > Given that this seems to be Linux only, is this related to how
> > > glibc does
> > > large allocations (>128kB) using mmap()?
> > > 
> > > https://stackoverflow.com/a/33131385
> > > 
> > 
> > That's a good point.  As MMAP_THRESHOLD is 128 KiB and the size of
> > `z` is almost 4 MB, the mmap machinery is probably getting involved
> > here.  Also, as pages acquired via anonymous mmap are not actually
> > allocated until you access them for the first time, that would
> > explain why the first access is slow.  What puzzles me is that the
> > timeit loops access the `z` data 3*10000 times, which is plenty of
> > time for doing the allocation (it should require just a single
> > iteration).
> > 
> 
> I think I have more evidence that what is happening here has to do
> with how the malloc mechanism works on Linux.  I find the following
> explanation to be really good:
> 
> https://sourceware.org/glibc/wiki/MallocInternals
> 

Thanks for figuring this out!  This has been bugging me for a long time.
So it depends on how `malloc` works rather than on the kernel.  It is
surprising how "random" this can look, but I suppose some examples just
happen to sit almost exactly at the threshold.
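
A quick way to see the first-touch cost locally is to compare a fresh
over-the-threshold allocation against reusing an already-touched buffer.
A minimal sketch (glibc/Linux assumed; numbers will vary by machine):

import timeit
import numpy as np

n = 250_000  # ~4 MB of complex128, the same size as `z` in Warren's script

def fresh():
    a = np.empty(n, dtype=np.complex128)  # new >128 KiB allocation (mmap'ed)
    a.real[:] = 1.0                       # first touch pays the page faults
    return a

buf = np.empty(n, dtype=np.complex128)

def reuse():
    buf.real[:] = 1.0                     # pages are already faulted in
    return buf

for f in (fresh, reuse):
    t = timeit.timeit(f, number=1000)
    print(f"{f.__name__:>6}: {1e6 * t / 1000:8.2f} microseconds per call")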

It might be interesting to tweak the `mallopt` parameters for typical
NumPy usage.  But unless the benefit is very clear, maybe a standalone
module is better?
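
To make the standalone-module idea concrete, here is a completely
untested sketch of what such a module could do at import time, using
ctypes to call glibc's mallopt().  This is glibc/Linux specific; the
parameter constants come from <malloc.h>, the function name
`raise_mmap_threshold` is made up, and the 8 MiB cut-off is only an
example:

import ctypes
import ctypes.util

# Parameter numbers from glibc's <malloc.h>
M_TRIM_THRESHOLD = -1
M_MMAP_THRESHOLD = -3

_libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)

def raise_mmap_threshold(nbytes=8 * 1024 * 1024):
    """Ask glibc malloc to serve allocations up to `nbytes` from the heap
    instead of fresh anonymous mmap pages (mallopt returns 1 on success)."""
    ok = _libc.mallopt(M_MMAP_THRESHOLD, nbytes)
    # Also keep freed memory around instead of trimming it back to the kernel.
    ok &= _libc.mallopt(M_TRIM_THRESHOLD, nbytes)
    return bool(ok)

raise_mmap_threshold()  # call this before the big temporaries get created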


> In addition, this excerpt of the mallopt manpage (
> https://man7.org/linux/man-pages/man3/mallopt.3.html) is very
> significant:

<snip>


> All in all, this is a testament to how much memory handling can affect
> performance on modern computers.  Perhaps it is time to test different
> memory allocation strategies in NumPy and come up with suggestions for
> users.

You are probably aware, but Matti and Elias now added the ability to
customize array data allocation in NumPy, so it should be
straightforward to write a small package/module that tweaks the
allocation strategy here.
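
For reference, the hook itself is the C-level `PyDataMem_SetHandler`
API from NEP 49, so swapping NumPy's allocator needs a small C
extension.  From Python, a cheap way to check whether any such tweak
(or the mallopt one above) actually helps is to count minor page faults
around the workload; a minimal sketch using only the standard library:

import resource
import numpy as np

def minor_faults():
    # Cumulative minor (no-I/O) page faults for this process
    return resource.getrusage(resource.RUSAGE_SELF).ru_minflt

rng = np.random.default_rng()
before = minor_faults()
for _ in range(100):
    z = rng.normal(scale=1000, size=2 * 250000).view(np.complex128)
    (z.real**2 + z.imag**2).sum()
print(f"minor page faults: {minor_faults() - before}")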

Cheers,

Sebastian


> 
> Francesc
> 
> 
> 
> > 
> > 
> > > 
> > > 
> > > On Wed, Jan 19, 2022 at 9:06 AM Sebastian Berg <
> > > sebast...@sipsolutions.net> wrote:
> > > 
> > > > On Wed, 2022-01-19 at 11:49 +0100, Francesc Alted wrote:
> > > > > On Wed, Jan 19, 2022 at 7:33 AM Stefan van der Walt
> > > > > <stef...@berkeley.edu>
> > > > > wrote:
> > > > > 
> > > > > > On Tue, Jan 18, 2022, at 21:55, Warren Weckesser wrote:
> > > > > > > expr = 'z.real**2 + z.imag**2'
> > > > > > > 
> > > > > > > z = generate_sample(n, rng)
> > > > > > 
> > > > > > 🤔 If I duplicate the `z = ...` line, I get the fast result
> > > > > > throughout.
> > > > > > If, however, I use `generate_sample(1, rng)` (or any other
> > > > > > value
> > > > > > than `n`),
> > > > > > it does not improve matters.
> > > > > > 
> > > > > > Could this be a memory caching issue?
> > > > > > 
> > > > 
> > > > Yes, it is a caching issue for sure.  We have seen similar
> > > > random
> > > > fluctuations before.
> > > > You can prove that it is a page-fault issue by running it under
> > > > `perf stat`.  I did this twice, once with the second loop removed
> > > > (page-faults only):
> > > > 
> > > > 28333629      page-faults               #  936.234 K/sec
> > > > 28362718      page-faults               #    1.147 M/sec
> > > > 
> > > > The number of page faults is huge.  Running only the second one
> > > > (or rather, running the first one only once) gave me:
> > > > 
> > > > 15024      page-faults               #    1.837 K/sec
> > > > 
> > > > So that is the *reason*.  I had tried before to figure out why the
> > > > page faults differ so much, or whether we can do something about
> > > > it, but I never had any reasonable lead on it.
> > > > 
> > > > In general, these fluctuations are pretty random, in the sense
> > > > that unrelated code changes and recompilation can easily swap the
> > > > behaviour; as Andras noted, he sees the opposite.
> > > > 
> > > > I would love to know whether there is a way to figure out why the
> > > > page faults are so imbalanced between the two.
> > > > 
> > > > (I have not looked at CPU cache misses this time, but since page
> > > > faults happen, I assume they should not matter much?)
> > > > 
> > > > Cheers,
> > > > 
> > > > Sebastian
> > > > 
> > > > 
> > > > > 
> > > > > I can also reproduce that, but only on my Linux boxes.  My
> > > > > MacMini
> > > > > does not
> > > > > notice the difference.
> > > > > 
> > > > > Interestingly enough, you don't even need an additional call to
> > > > > `generate_sample(n, rng)`.  If one uses `z = np.empty(...)` and
> > > > > then does an assignment, like:
> > > > > 
> > > > > z = np.empty(n, dtype=np.complex128)
> > > > > z[:] = generate_sample(n, rng)
> > > > > 
> > > > > then everything runs at the same speed:
> > > > > 
> > > > > numpy version 1.20.3
> > > > > 
> > > > >  142.3667 microseconds
> > > > >  142.3717 microseconds
> > > > >  142.3781 microseconds
> > > > > 
> > > > >  142.7593 microseconds
> > > > >  142.3579 microseconds
> > > > >  142.3231 microseconds
> > > > > 
> > > > > As another data point, when doing the same operation with numexpr
> > > > > I am not seeing any difference either, not even on Linux:
> > > > > 
> > > > > numpy version 1.20.3
> > > > > numexpr version 2.8.1
> > > > > 
> > > > >   95.6513 microseconds
> > > > >   88.1804 microseconds
> > > > >   97.1322 microseconds
> > > > > 
> > > > >  105.0833 microseconds
> > > > >  100.5555 microseconds
> > > > >  100.5654 microseconds
> > > > > 
> > > > > [If anything, it is a bit the other way around: the second
> > > > > iteration seems a hair faster.]
> > > > > See the numexpr script below.
> > > > > 
> > > > > I am totally puzzled here.
> > > > > 
> > > > > """
> > > > > import timeit
> > > > > import numpy as np
> > > > > import numexpr as ne
> > > > > 
> > > > > 
> > > > > def generate_sample(n, rng):
> > > > >     return rng.normal(scale=1000, size=2*n).view(np.complex128)
> > > > > 
> > > > > 
> > > > > print(f'numpy version {np.__version__}')
> > > > > print(f'numexpr version {ne.__version__}')
> > > > > print()
> > > > > 
> > > > > rng = np.random.default_rng()
> > > > > n = 250000
> > > > > timeit_reps = 10000
> > > > > 
> > > > > expr = 'ne.evaluate("zreal**2 + zimag**2")'
> > > > > 
> > > > > z = generate_sample(n, rng)
> > > > > zreal = z.real
> > > > > zimag = z.imag
> > > > > for _ in range(3):
> > > > >     t = timeit.timeit(expr, globals=globals(), number=timeit_reps)
> > > > >     print(f"{1e6*t/timeit_reps:9.4f} microseconds")
> > > > > print()
> > > > > 
> > > > > z = generate_sample(n, rng)
> > > > > zreal = z.real
> > > > > zimag = z.imag
> > > > > for _ in range(3):
> > > > >     t = timeit.timeit(expr, globals=globals(), number=timeit_reps)
> > > > >     print(f"{1e6*t/timeit_reps:9.4f} microseconds")
> > > > > print()
> > > > > """
> > > > > 
> > 
> > 
> > --
> > Francesc Alted
> > 
> 
> 
