Measuring the running time of a program written in an arbitrary programming
language is not an objective metric on its own. Otherwise we would force
everyone to code in assembler and be done with it as quickly as possible.
Hire 5 people to commute to the workplace for 6 months to optimize the
code, and the emissions from their transportation alone would eat the
savings. There is a reason we don't do that. Alternatively, any time shaved
off the running code will be spent by developers debugging type issues on
their extremely inefficient i9 laptops. As the author themselves admits,
the gain in development speed would justify the loss incurred by the actual
code running.

So this study is suggestive at best and, like my rebuttal, very difficult
to verify. I do Industrial IoT for a living, and while I wholeheartedly
agree with the intentions, I would seriously question the power metrics
given here, because with similar carelessness I could easily show a steel
factory to be very efficient. Tying code quality to the programming
language in particular is a very slippery slope, one I have been hearing
about for the last 20 years from Fortran people.

> I think we, the community, does have to take it seriously. NumPy and the
> rest of the ecosystem is trying to raise money to hire developers. This
> sentiment, which is much wider than a single paper, is a prevalent
> roadblock.

I don't get this sentence.



On Tue, Nov 24, 2020 at 7:29 PM Hameer Abbasi <einstein.edi...@gmail.com>
wrote:

> Hello,
>
> We’re trying to do a part of this in the TACO team, with a Python
> wrapper in the form of PyData/Sparse. It will allow abstract array
> computation and scheduling to take place, but there are a bunch of
> constraints, the most important one being that a C compiler cannot be
> required at runtime.
>
> However, this may take a while to materialize, as we need an LLVM backend,
> and a Python wrapper (matching the NumPy API), and support for arbitrary
> functions (like universal functions).
>
> https://github.com/tensor-compiler/taco
> http://fredrikbk.com/publications/kjolstad-thesis.pdf
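>
> As an illustration, the front end already looks like this today (a
> minimal sketch using only the current PyData/Sparse API; the TACO
> backend would later sit behind these same calls):
>
>     import numpy as np
>     import sparse
>
>     # Random sparse 2-D array in COO format, 1% nonzero entries.
>     x = sparse.random((1000, 1000), density=0.01)
>
>     # NumPy-style expressions work on the sparse object. Today each
>     # operation is evaluated eagerly; a TACO backend could fuse the
>     # whole expression into a single generated kernel.
>     y = (x + x.T) @ x.sum(axis=1)
>
>     # Convert to a dense ndarray only when needed.
>     z = y.todense()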
>
> --
> Sent from Canary <https://canarymail.io>
>
> On Tuesday, Nov. 24, 2020 at 7:22 PM, YueCompl <compl....@icloud.com>
> wrote:
> Is there some community interest in developing fusion-based
> high-performance array programming? Something like
> https://github.com/AccelerateHS/accelerate#an-embedded-language-for-accelerated-array-computations,
> but that embedded DSL is far less pleasing than Python as the surface
> language for optimized Numpy code in C.
>
> I imagine that we might be able to transpile a Numpy program into fused
> LLVM IR, then deploy part as host code on CPUs and part as CUDA code on
> GPUs?
>
> I know Numba is already doing the array part, but it is too limited in
> addressing more complex non-array data structures. I was processing ~20K
> separate data series, with some intermediate variables for each; it took
> up to 30+ GB of RAM, kept compiling, and gave no result after 10+ hours.
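>
> (To be concrete, the array part that does work well is sketched below;
> a minimal example assuming nothing beyond numba.vectorize, with an
> illustrative kernel name:)
>
>     import numpy as np
>     from numba import vectorize
>
>     # A compiled ufunc for multicore CPUs; swapping target="parallel"
>     # for target="cuda" deploys the same kernel as CUDA code on GPUs.
>     @vectorize(["float64(float64, float64)"], target="parallel")
>     def hypot2(a, b):
>         return (a * a + b * b) ** 0.5
>
>     a = np.random.rand(1_000_000)
>     b = np.random.rand(1_000_000)
>     c = hypot2(a, b)  # one fused, parallel loop over both inputs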
>
> Compl
>
>
> On 2020-11-24, at 23:47, PIERRE AUGIER <
> pierre.aug...@univ-grenoble-alpes.fr> wrote:
>
> Hi,
>
> I recently took a bit of time to study the comment "The ecological impact
> of high-performance computing in astrophysics" published in Nature
> Astronomy (Zwart, 2020, https://www.nature.com/articles/s41550-020-1208-y,
> https://arxiv.org/pdf/2009.11295.pdf), where it is stated that "Best
> however, for the environment is to abandon Python for a more
> environmentally friendly (compiled) programming language.".
>
> I wrote a simple Python-Numpy implementation of the problem used for this
> study (https://www.nbabel.org) and, accelerated by Transonic-Pythran,
> it's very efficient. Here are some numbers (elapsed times in s, smaller is
> better):
>
> | # particles |  Py | C++ | Fortran | Julia |
> |-------------|-----|-----|---------|-------|
> |     1024    |  29 |  55 |   41    |   45  |
> |     2048    | 123 | 231 |  166    |  173  |
>
> The code and a modified figure are here: https://github.com/paugier/nbabel
> (There is no check on the results for https://www.nbabel.org, so one
> still has to be very careful.)
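>
> (For readers who have not used Transonic: the style is roughly as
> follows. This is a minimal sketch, not the actual nbabel code; the
> function name and type annotations are only illustrative:)
>
>     import numpy as np
>     from transonic import boost
>
>     @boost
>     def accelerations(positions: "float[:,:]", masses: "float[:]"):
>         # Plain Python-Numpy fallback; with Pythran installed, Transonic
>         # swaps in a compiled extension for this whole function.
>         n = positions.shape[0]
>         accs = np.zeros_like(positions)
>         for i in range(n):
>             for j in range(n):
>                 if i != j:
>                     d = positions[j] - positions[i]
>                     accs[i] += masses[j] * d / np.sum(d**2) ** 1.5
>         return accs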
>
> I think that the Numpy community should spend a bit of energy to show what
> can be done with the existing tools to get very high performance (and low
> CO2 production) with Python. This work could be the basis of a serious
> reply to the comment by Zwart (2020).
>
> Unfortunately the Python solution in https://www.nbabel.org is very bad
> in terms of performance (and therefore CO2 production). It is also true for
> most of the Python solutions for the Computer Language Benchmarks Game in
> https://benchmarksgame-team.pages.debian.net/benchmarksgame/ (codes here
> https://salsa.debian.org/benchmarksgame-team/benchmarksgame#what-else).
>
> We could try to fix this so that people see that in many cases, it is not
> necessary to "abandon Python for a more environmentally friendly (compiled)
> programming language". One of the longest and hardest task would be to
> implement the different cases of the Computer Language Benchmarks Game in
> standard and modern Python-Numpy. Then, optimizing and accelerating such
> code should be doable and we should be able to get very good performance at
> least for some cases. Good news for this project, (i) the first point can
> be done by anyone with good knowledge in Python-Numpy (many potential
> workers), (ii) for some cases, there are already good Python
> implementations and (iii) the work can easily be parallelized.
>
> It is not a criticism, but the (beautiful and very nice) new Numpy website
> https://numpy.org/ is not very convincing in terms of performance. It's
> written: "Performant / The core of NumPy is well-optimized C code. Enjoy
> the flexibility of Python with the speed of compiled code." It's true that
> the core of Numpy is well-optimized C code, but to seriously compete with
> C++, Fortran or Julia in terms of numerical performance, one needs other
> tools to move the compiled-interpreted boundary outside the hot loops. So
> it could be reasonable to mention such tools (in particular Numba, Pythran,
> Cython and Transonic).
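>
> (With Pythran alone, for example, moving that boundary is a one-line
> comment plus a compile step; a minimal sketch, with illustrative file
> and function names:)
>
>     # File mymod.py, compiled once with:  pythran mymod.py
>     # pythran export pairwise_distances(float64[:,:])
>     import numpy as np
>
>     def pairwise_distances(X):
>         # Plain Numpy code; Pythran compiles the whole function, so
>         # the hot loops never cross back into the interpreter.
>         diff = X[:, np.newaxis, :] - X[np.newaxis, :, :]
>         return np.sqrt((diff ** 2).sum(axis=2))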
>
> Is there already something planned to reply to Zwart (2020)?
>
> Any opinions or suggestions on this potential project?
>
> Pierre
>
> PS: Of course, alternative Python interpreters (PyPy, GraalPython, Pyjion,
> Pyston, etc.) could also be used, especially if HPy (
> https://github.com/hpyproject/hpy) is successful (C core of Numpy written
> in HPy, Cython able to produce HPy code, etc.). However, I tend to be a bit
> skeptical about the ability of such technologies to reach very high
> performance for low-level Numpy code (performance that can be reached by
> replacing whole Python functions with optimized compiled code). Of course,
> I hope I'm wrong! IMHO, it does not remove the need for a successful HPy!
>
> --
> Pierre Augier - CR CNRS                 http://www.legi.grenoble-inp.fr
> LEGI (UMR 5519) Laboratoire des Ecoulements Geophysiques et Industriels
> BP53, 38041 Grenoble Cedex, France                tel:+33.4.56.52.86.16
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion
