On Sat, 2021-03-13 at 00:33 +0100, PIERRE AUGIER wrote:
> Hi,
>
> I tried to compile Numpy with `pip install numpy==1.20.1 --no-binary
> numpy --force-reinstall` and I can reproduce the regression.
>
> Good news, I was able to reproduce the difference with only Numpy
> 1.20.1.
>
> Arrays prepared with (`df` is a Pandas dataframe)
>
>     arr = df.values.copy()
>
> or
>
>     arr = np.ascontiguousarray(df.values)
>
> lead to "slow" execution while arrays prepared with
>
>     arr = np.copy(df.values)
>
> lead to faster execution.
>
> arr.copy() or np.copy(arr) do not give the same result, with arr
> obtained from a Pandas dataframe with arr = df.values. It's strange
> because type(df.values) gives <class 'numpy.ndarray'> so I would
> expect arr.copy() and np.copy(arr) to give exactly the same result.
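The three preparation paths quoted above can be compared directly on their flags and strides. The following is only a minimal sketch: the transposed array `values` is a hypothetical stand-in for `df.values`, and whether the real dataframe in the benchmark has that layout is an assumption, not something established in this thread. The one documented fact it relies on is that `ndarray.copy` defaults to `order='C'` while `np.copy` defaults to `order='K'`, so the two can return different memory layouts for a non-C-contiguous input.

```python
import numpy as np

# Hypothetical stand-in for `df.values`: a transposed 2-D array, i.e. a
# Fortran-ordered view. That the benchmark's `df.values` looks like this
# is an assumption of this sketch.
values = np.arange(12, dtype=np.float64).reshape(3, 4).T

prepared = {
    # ndarray.copy defaults to order='C'
    "arr.copy()": values.copy(),
    # always returns a C-contiguous array
    "np.ascontiguousarray": np.ascontiguousarray(values),
    # np.copy defaults to order='K' (keep the input layout where possible)
    "np.copy(arr)": np.copy(values),
}


def layout(arr):
    """Return (C-contiguous?, F-contiguous?, strides) for an array."""
    return (arr.flags.c_contiguous, arr.flags.f_contiguous, arr.strides)


for name, arr in prepared.items():
    print(name, layout(arr))
```

If the layouts differ on the real data, the compiled kernel walks memory in a different order for the same values, which could plausibly account for a timing difference without any change in how the data is allocated.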
The only thing that can change would be the array's flags and
`arr.strides`, but they should not have changed. And there is no change
in NumPy that I can even remotely think of. Array data is just
allocated with `malloc`.

That is: as I understand it, you are *not* timing `np.copy` or
`np.ascontiguousarray` itself, but just operating on the array
returned. NumPy only ever uses `malloc` for allocating array content.

> Note that I think I'm doing quite serious and reproducible
> benchmarks. I also checked that this regression is reproducible on
> another computer.

I absolutely trust the benchmark results. I was hoping you might also
be running a profiler (as in, analyze the running program) to find out
where the difference originates on the C side. That would allow us to
say with certainty either what changed or that there was no actual
related code change.

E.g. I have seen huge speed differences in the same `memcpy` or similar
calls, for whatever reason (maybe due to compiler changes, or due to
address space changes... or maybe the former causing the latter, I
don't know).

Cheers,

Sebastian

> Cheers,
>
> Pierre
>
> ----- Original message -----
> > From: "Sebastian Berg" <sebast...@sipsolutions.net>
> > To: "numpy-discussion" <numpy-discussion@python.org>
> > Sent: Friday 12 March 2021 22:50:24
> > Subject: Re: [Numpy-discussion] Looking for a difference between
> > Numpy 0.19.5 and 0.20 explaining a perf regression with
> > Pythran
>
> > On Fri, 2021-03-12 at 21:36 +0100, PIERRE AUGIER wrote:
> > > Hi,
> > >
> > > I'm looking for a difference between Numpy 1.19.5 and 1.20 which
> > > could explain a performance regression (~15 %) with Pythran.
> > >
> > > I observe this regression with the script
> > > https://github.com/paugier/nbabel/blob/master/py/bench.py
> > >
> > > Pythran reimplements Numpy so it is not about Numpy code for
> > > computation. However, Pythran of course uses the native array
> > > contained in a Numpy array.
> > > I'm quite sure that something has changed between Numpy 1.19.5
> > > and 1.20 (or between the corresponding wheels?) since I don't
> > > get the same performance with Numpy 1.20. I checked that the
> > > values in the arrays are the same and that the flags
> > > characterizing the arrays are also the same.
> > >
> > > Good news, I'm now able to obtain the performance difference just
> > > with Numpy 1.19.5. In this code, I load the data with Pandas and
> > > need to prepare contiguous Numpy arrays to give them to Pythran.
> > > With Numpy 1.19.5, if I use np.copy I get better performance than
> > > with np.ascontiguousarray. With Numpy 1.20, both functions create
> > > arrays giving the same performance with Pythran (again, less good
> > > than with Numpy 1.19.5).
> > >
> > > Note that this code is very efficient (more than 100 times faster
> > > than using Numpy), so I guess that things like alignment or
> > > memory location can lead to such a difference.
> > >
> > > More details in this issue
> > > https://github.com/serge-sans-paille/pythran/issues/1735
> > >
> > > Any help to understand what has changed would be greatly
> > > appreciated!
> >
> > If you want to really dig into this, it would be good to do
> > profiling to find out where the differences are.
> >
> > Without that, I don't have much appetite to investigate personally.
> > The reason is that fluctuations of ~30% (or even much more) when
> > running the NumPy benchmarks are very common.
> >
> > I am not aware of an immediate change in NumPy, especially since
> > you are talking about Pythran, and only the memory space or the
> > interface code should matter.
> > As to the interface code... I would expect it to be quite a bit
> > faster, not slower.
> > There was no change around data allocation, so at best what you are
> > seeing is a different pattern in how the "small array cache" ends
> > up being used.
> >
> > Unfortunately, getting stable benchmarks that reflect code changes
> > exactly is tough... Here is a nice blog post from Victor Stinner
> > where he had to go as far as using "profile guided compilation" to
> > avoid fluctuations:
> >
> > https://vstinner.github.io/journey-to-stable-benchmark-deadcode.html
> >
> > I somewhat hope that this is also the reason for the huge
> > fluctuations we see in the NumPy benchmarks due to absolutely
> > unrelated code changes.
> > But I did not have the energy to try it (and a probably fixed bug
> > in gcc makes it a bit harder right now).
> >
> > Cheers,
> >
> > Sebastian
> >
> > > Cheers,
> > > Pierre
> > > _______________________________________________
> > > NumPy-Discussion mailing list
> > > NumPy-Discussion@python.org
> > > https://mail.python.org/mailman/listinfo/numpy-discussion
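The guess in the thread that "alignment or memory location" might matter can be inspected from Python before reaching for a C-level profiler. This is a rough sketch only: the helper names `data_offset` and `layout_report`, the 64-byte boundary (a common cache-line size, assumed here), and the random stand-in for `df.values` are all illustrative and not taken from the benchmark script.

```python
import numpy as np


def data_offset(arr, boundary=64):
    """Address of the array's first byte, modulo `boundary` bytes."""
    return arr.ctypes.data % boundary


def layout_report(arr, boundary=64):
    """Collect address-related properties of an array (hypothetical helper)."""
    return {
        "offset": data_offset(arr, boundary),
        "aligned_flag": bool(arr.flags.aligned),
        "strides": arr.strides,
    }


# Hypothetical stand-in for `df.values`; the real data comes from Pandas.
rng = np.random.default_rng(0)
values = rng.random((1000, 3)).T

candidates = {
    "arr.copy()": values.copy(),
    "np.ascontiguousarray": np.ascontiguousarray(values),
    "np.copy(arr)": np.copy(values),
}

for name, arr in candidates.items():
    print(name, layout_report(arr))
```

The offsets depend on the allocator and will vary between runs and machines, so they are only a hint. A profiler on the compiled side remains the way to say with certainty where the time goes.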