On Fri, 2021-03-12 at 21:36 +0100, PIERRE AUGIER wrote:
> Hi,
>
> I'm looking for a difference between NumPy 1.19.5 and 1.20 which
> could explain a performance regression (~15 %) with Pythran.
>
> I observe this regression with the script
> https://github.com/paugier/nbabel/blob/master/py/bench.py
>
> Pythran reimplements NumPy, so it is not about NumPy code for
> computation. However, Pythran of course uses the native array
> contained in a NumPy array. I'm quite sure that something has changed
> between NumPy 1.19.5 and 1.20 (or between the corresponding wheels?),
> since I don't get the same performance with NumPy 1.20. I checked
> that the values in the arrays are the same and that the flags
> characterizing the arrays are also the same.
>
> Good news: I'm now able to reproduce the performance difference just
> with NumPy 1.19.5. In this code, I load the data with Pandas and need
> to prepare contiguous NumPy arrays to give to Pythran. With
> NumPy 1.19.5, if I use np.copy I get better performance than with
> np.ascontiguousarray. With NumPy 1.20, both functions create arrays
> giving the same performance with Pythran (again, less good than with
> NumPy 1.19.5).
>
> Note that this code is very efficient (more than 100 times faster
> than using NumPy), so I guess that things like alignment or memory
> location can lead to such a difference.
>
> More details in this issue:
> https://github.com/serge-sans-paille/pythran/issues/1735
>
> Any help to understand what has changed would be greatly appreciated!
>
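[A minimal sketch, not from the original thread, of one way to compare what np.copy and np.ascontiguousarray actually produce: both should yield C-contiguous, aligned arrays with equal values, so any performance gap would have to come from properties such as the address of the underlying buffer. The modulus 64 below is an illustrative choice (a common cache-line size), not something established in the thread.]

```python
import numpy as np

a = np.arange(12.0).reshape(3, 4)

for name, arr in [("np.copy", np.copy(a)),
                  ("np.ascontiguousarray", np.ascontiguousarray(a))]:
    # Address of the first element of the underlying buffer.
    addr = arr.__array_interface__["data"][0]
    print(name,
          "C_CONTIGUOUS:", arr.flags["C_CONTIGUOUS"],
          "ALIGNED:", arr.flags["ALIGNED"],
          "addr % 64 =", addr % 64)
```

If the flags and values match but `addr % 64` differs between the two (or between NumPy versions), buffer placement rather than array metadata would be the remaining suspect.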
If you want to really dig into this, it would be good to do profiling to find out where the differences are. Without that, I don't have much appetite to investigate personally. The reason is that fluctuations of ~30% (or even much more) when running the NumPy benchmarks are very common.

I am not aware of an immediate change in NumPy, especially since you are talking about Pythran, where only the memory space or the interface code should matter. As to the interface code... I would expect it to be quite a bit faster, not slower. There was no change around data allocation, so at best what you are seeing is a different pattern in how the "small array cache" ends up being used.

Unfortunately, getting stable benchmarks that reflect code changes exactly is tough... Here is a nice blog post from Victor Stinner where he had to go as far as using profile-guided compilation to avoid fluctuations:
https://vstinner.github.io/journey-to-stable-benchmark-deadcode.html

I somewhat hope that this is also the reason for the huge fluctuations we see in the NumPy benchmarks due to absolutely unrelated code changes. But I did not have the energy to try it (and a probably fixed bug in GCC makes it a bit harder right now).

Cheers,

Sebastian

> Cheers,
> Pierre
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
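[A sketch, not from the original thread, of the kind of micro-timing that could feed into the profiling suggested above. `timeit.repeat` returns one total per run, so taking the minimum over several runs gives a noise-resistant estimate, which matters here given the ~30% fluctuations mentioned in the reply. The array shape is an arbitrary illustrative choice.]

```python
import timeit
import numpy as np

a = np.random.default_rng(0).random((1000, 3))

for stmt in ("np.copy(a)", "np.ascontiguousarray(a)"):
    # Five runs of 10,000 calls each; the minimum is the least noisy figure.
    times = timeit.repeat(stmt, globals={"np": np, "a": a},
                          number=10_000, repeat=5)
    print(f"{stmt}: best of 5 runs = {min(times):.4g} s")
```

This only measures the copy itself; pinning down where a Pythran kernel loses time would still require a real profiler (e.g. perf) on the benchmark script.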