Hi,

I recently took a bit of time to study the comment "The ecological impact of 
high-performance computing in astrophysics" published in Nature Astronomy 
(Zwart, 2020, https://www.nature.com/articles/s41550-020-1208-y, 
https://arxiv.org/pdf/2009.11295.pdf), which states: "Best however, for the
environment is to abandon Python for a more environmentally friendly
(compiled) programming language."

I wrote a simple Python-Numpy implementation of the problem used for this study 
(https://www.nbabel.org) and, accelerated by Transonic-Pythran, it's very 
efficient. Here are some numbers (elapsed times in s, smaller is better):

| # particles | Python (Transonic-Pythran) | C++ | Fortran | Julia |
|-------------|----------------------------|-----|---------|-------|
|        1024 |                         29 |  55 |      41 |    45 |
|        2048 |                        123 | 231 |     166 |   173 |

The code and a modified figure are here: https://github.com/paugier/nbabel
(Note that https://www.nbabel.org does not check the correctness of the
results, so one still has to be very careful with these timings.)
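
To give an idea of the pattern, here is a minimal sketch (simplified, not the
exact code in the repository) of how such a hot N-body kernel can be written
with plain Numpy arrays, explicit loops and a Transonic decorator, so that
Pythran can compile the whole function ahead of time:

    import numpy as np
    from transonic import boost

    @boost
    def compute_accelerations(
        accelerations: "float[:,:]", masses: "float[:]", positions: "float[:,:]"
    ):
        """Pairwise gravitational accelerations (G = 1, no softening)."""
        nb_particles = masses.size
        accelerations[:] = 0.0
        for i in range(nb_particles - 1):
            for j in range(i + 1, nb_particles):
                delta = positions[i] - positions[j]
                distance_sq = (delta**2).sum()
                distance_cube = distance_sq * np.sqrt(distance_sq)
                accelerations[i] -= masses[j] * delta / distance_cube
                accelerations[j] += masses[i] * delta / distance_cube

The nice thing is that the same file still runs as plain Python-Numpy when no
backend is used, and the compiler only has to handle this one function.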

I think that the Numpy community should spend a bit of energy to show what can 
be done with the existing tools to get very high performance (and low CO2 
production) with Python. This work could be the basis of a serious reply to the 
comment by Zwart (2020).

Unfortunately, the Python solution in https://www.nbabel.org is very bad in
terms of performance (and therefore CO2 production). The same is true for most
of the Python solutions in the Computer Language Benchmarks Game
(https://benchmarksgame-team.pages.debian.net/benchmarksgame/; the codes are
here: https://salsa.debian.org/benchmarksgame-team/benchmarksgame#what-else).

We could try to fix this so that people see that, in many cases, it is not
necessary to "abandon Python for a more environmentally friendly (compiled)
programming language". One of the longest and hardest tasks would be to
implement the different cases of the Computer Language Benchmarks Game in
standard and modern Python-Numpy. Then, optimizing and accelerating such code
should be doable, and we should be able to get very good performance at least
for some cases. The good news for this project is that (i) the first step can
be done by anyone with a good knowledge of Python-Numpy (many potential
contributors), (ii) for some cases there are already good Python
implementations, and (iii) the work can easily be parallelized.
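
To make the "standard and modern Python-Numpy" step more concrete, here is a
hypothetical sketch (not taken from any benchmark) of what a fully vectorized
baseline could look like for the same N-body kernel; it trades memory (an
n x n x 3 temporary array) for the absence of explicit Python loops:

    import numpy as np

    def compute_accelerations_numpy(masses, positions):
        """Pairwise gravitational accelerations with broadcasting (G = 1)."""
        deltas = positions[:, np.newaxis, :] - positions[np.newaxis, :, :]
        distances = np.sqrt((deltas**2).sum(axis=2))
        # avoid dividing by zero for the self-interaction terms
        np.fill_diagonal(distances, np.inf)
        coefficients = masses[np.newaxis, :] / distances**3
        return -(coefficients[:, :, np.newaxis] * deltas).sum(axis=1)

Such a version is already much better than pure Python loops, but for this
kind of O(n^2) kernel the accelerated loop-based versions are usually faster
and use much less memory, which is exactly the comparison worth documenting.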

This is not a criticism, but the (beautiful and very nice) new Numpy website
https://numpy.org/ is not very convincing in terms of performance. It says:
"Performant: The core of NumPy is well-optimized C code. Enjoy the flexibility
of Python with the speed of compiled code." It is true that the core of Numpy
is well-optimized C code, but to seriously compete with C++, Fortran or Julia
in terms of numerical performance, one needs to use other tools to move the
compiled-interpreted boundary outside the hot loops. So it could be reasonable
to mention such tools there (in particular Numba, Pythran, Cython and
Transonic).
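
To illustrate what "moving the compiled-interpreted boundary outside the hot
loops" means in practice, here is the same kind of kernel as above decorated
with Numba, as an example of one of these tools (the function name and the
kernel itself are only illustrative):

    import numpy as np
    from numba import njit

    @njit
    def compute_accelerations_numba(masses, positions):
        """Same pairwise kernel, JIT-compiled so that the double loop never
        runs in the interpreter."""
        nb_particles = masses.size
        accelerations = np.zeros_like(positions)
        for i in range(nb_particles - 1):
            for j in range(i + 1, nb_particles):
                delta = positions[i] - positions[j]
                distance_sq = (delta**2).sum()
                distance_cube = distance_sq * np.sqrt(distance_sq)
                accelerations[i] -= masses[j] * delta / distance_cube
                accelerations[j] += masses[i] * delta / distance_cube
        return accelerations

Whether Pythran, Numba or Cython wins depends on the case, which is one more
reason to do these comparisons carefully and document them.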

Is there already something planned to reply to Zwart (2020)?

Any opinions or suggestions on this potential project?

Pierre

PS: Of course, alternative Python interpreters (PyPy, GraalPython, Pyjion,
Pyston, etc.) could also be used, especially if HPy
(https://github.com/hpyproject/hpy) is successful (the C core of Numpy written
in HPy, Cython able to produce HPy code, etc.). However, I tend to be a bit
skeptical about the ability of such technologies to reach very high
performance for low-level Numpy code (performance that can be reached by
replacing whole Python functions with optimized compiled code). Of course, I
hope I'm wrong! In any case, IMHO, this does not remove the need for a
successful HPy!

--
Pierre Augier - CR CNRS                 http://www.legi.grenoble-inp.fr
LEGI (UMR 5519) Laboratoire des Ecoulements Geophysiques et Industriels
BP53, 38041 Grenoble Cedex, France                tel:+33.4.56.52.86.16