On Mon, Jan 24, 2022 at 11:01 AM Francesc Alted <fal...@gmail.com> wrote:

> On Mon, Jan 24, 2022 at 2:15 AM Warren Weckesser <
> warren.weckes...@gmail.com> wrote:
>
>> Thanks Sebastian for pointing out that the issue is (minor) page
>> faults, and thanks Francesc for providing the links and the analysis
>> of the situation.  After reading those links and experimenting with
>> malloc in a couple C programs, it is clear now what is happening.  The
>> key factors to be aware of are:
>
>
>> * What the values of M_MMAP_THRESHOLD and M_TRIM_THRESHOLD are and how
>> they are updated dynamically.
>> * Minor page faults are triggered by new allocations from the system,
>> whether via mmap or via sbrk(). This means avoiding mmap is not enough
>> to avoid potential problems with minor page faults.
>>
> <snip>
>
> Thanks for going down to the rabbit hole.  It is now much clearer what's
> going on.
>
> If we set M_MMAP_THRESHOLD to value large enough that we never use
>> mmap, we get results like this (also noted by Francesc):
>>
>> ```
>> $ MALLOC_MMAP_THRESHOLD_=20000000 python timing_question.py
>> numpy version 1.20.3
>>
>>  639.5976 microseconds
>>  641.1489 microseconds
>>  640.8228 microseconds
>>
>>  385.0931 microseconds
>>  384.8406 microseconds
>>  384.0398 microseconds
>> ```
>>
>> In this case, setting the mmap threshold disables the dynamic updating
>> of all the thresholds, so M_TRIM_THRESHOLD remains 128*1024.  This
>> means each time free() is called, there will almost certainly be
>> enough memory at the end of the heap that exceeds M_TRIM_THRESHOLD and
>> so each iteration in timeit results memory being allocated from the
>> system and returned to the system via sbrk(), and that leads to minor
>> page faults when writing to the temporary arrays in each iterations.
>> That is, we still get thrashing at the top of the heap.  (Apparenly
>> not all the calls of free() lead to trimming the heap, so the
>> performance is better with the second z array.)
>>
>
> <snip>
>
> This is still something that I don't get.  I suppose this is very minor,
> but it would be nice to see the reason for this.
>

FWIW, after some more reflection upon this, I think that a possible
explanation for the above is that in both cases, the temporaries are
returned using mmap(), and hence both incurring in the zeroing overhead.
If that is correct, this offers a new opportunity to better calculate where
the overhead comes from.

Although I suspect that Warren's box and mine (AMD Ryzen 9 5950X 16-Core)
are pretty similar, in order to compare apples with apples, I'm going to
use my own measurements for this case:

$ MALLOC_MMAP_THRESHOLD_=20000000 python prova2.py
numpy version 1.21.2

 610.0853 microseconds
 610.7483 microseconds
 611.1118 microseconds

 369.2480 microseconds
 369.6340 microseconds
 369.8093 microseconds

So, in this case, the cost of a minor page fault is ~240 ns ((610us -
369us) / 1000 pages).  Now, the times for the original bench in my box:

$ python prova2.py
numpy version 1.21.2

 598.2740 microseconds
 609.1857 microseconds
 609.0713 microseconds

 137.5792 microseconds
 136.7661 microseconds
 136.8159 microseconds

Here the temporaries for the second z do not use mmap nor incur into page
faults, so they are ((369us - 137us) / 1000 pages) = 233ns/page faster, so
this should be related with zeroing.  IIRC, for my system, memset() can go
up to 28 GB/s, so that accounts for 146ns of the 233ns. That leaves for an
additional 87ns/page that, as it is very close to the memory latency of my
box (around 78ns, see
https://www.anandtech.com/show/16214/amd-zen-3-ryzen-deep-dive-review-5950x-5900x-5800x-and-5700x-tested/5),
I think (wild guessing again) this is somehow related with the memory
management unit (MMU) operation (but not totally sure).

I am pretty sure we are not yet there for understanding everything
completely (e.g. I did leave CPU caches out of the analysis), but hey, it
is a lot of fun going down to the rabbit holes indeed.

-- 
Francesc Alted
_______________________________________________
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com

Reply via email to