Hi, I tried to compile Numpy with `pip install numpy==1.20.1 --no-binary numpy --force-reinstall` and I can reproduce the regression.
Good news: I was able to reproduce the difference with only Numpy 1.20.1.

Arrays prepared with (`df` is a Pandas dataframe)

    arr = df.values.copy()

or

    arr = np.ascontiguousarray(df.values)

lead to "slow" execution, while arrays prepared with

    arr = np.copy(df.values)

lead to faster execution.

So arr.copy() and np.copy(arr) do not give the same result when arr is obtained from a Pandas dataframe with arr = df.values. This is strange because type(df.values) gives <class 'numpy.ndarray'>, so I would expect arr.copy() and np.copy(arr) to give exactly the same result.

Note that I think my benchmarks are quite careful and reproducible. I also checked that this regression is reproducible on another computer.

Cheers,
Pierre

----- Original Message -----
> From: "Sebastian Berg" <sebast...@sipsolutions.net>
> To: "numpy-discussion" <numpy-discussion@python.org>
> Sent: Friday, 12 March 2021 22:50:24
> Subject: Re: [Numpy-discussion] Looking for a difference between Numpy 0.19.5 and 0.20 explaining a perf regression with Pythran

> On Fri, 2021-03-12 at 21:36 +0100, PIERRE AUGIER wrote:
>> Hi,
>>
>> I'm looking for a difference between Numpy 1.19.5 and 1.20 which
>> could explain a performance regression (~15 %) with Pythran.
>>
>> I observe this regression with the script
>> https://github.com/paugier/nbabel/blob/master/py/bench.py
>>
>> Pythran reimplements Numpy so it is not about Numpy code for
>> computation. However, Pythran of course uses the native array
>> contained in a Numpy array. I'm quite sure that something has changed
>> between Numpy 1.19.5 and 1.20 (or between the corresponding wheels?)
>> since I don't get the same performance with Numpy 1.20. I checked
>> that the values in the arrays are the same and that the flags
>> characterizing the arrays are also the same.
>>
>> Good news, I'm now able to obtain the performance difference just
>> with Numpy 1.19.5. In this code, I load the data with Pandas and need
>> to prepare contiguous Numpy arrays to give them to Pythran. With
>> Numpy 1.19.5, if I use np.copy I get better performance than with
>> np.ascontiguousarray. With Numpy 1.20, both functions create arrays
>> giving the same performance with Pythran (again, less good than with
>> Numpy 1.19.5).
>>
>> Note that this code is very efficient (more than 100 times faster
>> than using Numpy), so I guess that things like alignment or memory
>> location can lead to such a difference.
>>
>> More details in this issue
>> https://github.com/serge-sans-paille/pythran/issues/1735
>>
>> Any help to understand what has changed would be greatly appreciated!
>>
>
> If you want to really dig into this, it would be good to do profiling
> to find out where the differences are.
>
> Without that, I don't have much appetite to investigate personally. The
> reason is that fluctuations of ~30% (or even much more) when running
> the NumPy benchmarks are very common.
>
> I am not aware of an immediate change in NumPy, especially since you
> are talking about Pythran, and only the memory space or the interface
> code should matter.
> As to the interface code... I would expect it to be quite a bit faster,
> not slower.
> There was no change around data allocation, so at best what you are
> seeing is a different pattern in how the "small array cache" ends up
> being used.
>
>
> Unfortunately, getting stable benchmarks that reflect code changes
> exactly is tough... Here is a nice blog post from Victor Stinner where
> he had to go as far as using "profile guided compilation" to avoid
> fluctuations:
>
> https://vstinner.github.io/journey-to-stable-benchmark-deadcode.html
>
> I somewhat hope that this is also the reason for the huge fluctuations
> we see in the NumPy benchmarks due to absolutely unrelated code
> changes.
> But I did not have the energy to try it (and a probably fixed bug in
> gcc makes it a bit harder right now).
>
> Cheers,
>
> Sebastian
>
>> Cheers,
>> Pierre
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion@python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
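
On the arr.copy() versus np.copy(arr) observation at the top of this message: one thing that could be checked is the memory layout of the resulting arrays. The two calls have different default `order` arguments (`ndarray.copy` uses order="C", `np.copy` uses order="K"), so they can return different layouts when the input is not C-contiguous. The sketch below uses a Fortran-ordered array as a hypothetical stand-in for df.values; whether df.values really has this layout in the benchmark is an assumption that would need to be verified, and the helper name `layout` is only illustrative.

```python
import numpy as np

# Hypothetical stand-in for df.values: a 2-D float array stored in
# Fortran (column-major) order. This is an assumption, not a fact
# established in this thread.
values = np.asfortranarray(np.arange(12, dtype=np.float64).reshape(4, 3))

via_method = values.copy()                    # ndarray.copy: default order="C"
via_function = np.copy(values)                # np.copy: default order="K"
via_ascontig = np.ascontiguousarray(values)   # always returns C order


def layout(arr):
    """Summarize the memory layout of an array."""
    return {
        "C_CONTIGUOUS": bool(arr.flags["C_CONTIGUOUS"]),
        "F_CONTIGUOUS": bool(arr.flags["F_CONTIGUOUS"]),
        "strides": arr.strides,
    }


for name, arr in [
    ("values", values),
    ("values.copy()", via_method),
    ("np.copy(values)", via_function),
    ("np.ascontiguousarray(values)", via_ascontig),
]:
    print(f"{name:30s} {layout(arr)}")
```

With such an input, the values are identical in all three copies, but np.copy(values) keeps the column-major layout while the other two switch to row-major. If the real df.values behaves like this stand-in, the "fast" and "slow" arrays would differ in layout, which could be checked by printing arr.flags and arr.strides in the benchmark.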