Yeah, I get it: llvmlite would only do composition, while TACO does fusion. This is more promising!
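To make sure I've got the distinction right, a minimal sketch (toy code, my own naming):

    import numpy as np
    from numba import njit

    def composed(a, b, c):
        # Each NumPy op runs as its own compiled loop and
        # materializes a temporary array: two passes over memory.
        t = a * b
        return t + c

    @njit
    def fused(a, b, c):
        # Numba compiles this into a single loop over 1-D arrays:
        # no temporaries, one pass over memory.
        out = np.empty_like(a)
        for i in range(a.size):
            out[i] = a[i] * b[i] + c[i]
        return out

For dense arrays that is "only" a constant-factor win, but as you note below, with sparse operands avoiding the materialized intermediates can change the big-O cost.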
Best regards,
Compl

> On 2020-11-25, at 17:17, Hameer Abbasi <einstein.edi...@gmail.com> wrote:
>
> Hello,
>
> TACO consists of three things:
>
> - an array API,
> - a scheduling language,
> - a language for describing the sparse modes of a tensor.
>
> So it combines arrays with scheduling, plus sparse tensors, for a lot of
> different applications. It also includes an auto-scheduler. The generated
> code is on par with, or faster than, e.g. MKL and other equivalent
> libraries, with the ability to fuse arbitrary expressions. For more
> complicated expressions involving sparse operands, this is big-O superior
> to composing the operations.
>
> The limitations are:
>
> - Right now it can only compute Einstein-summation-type expressions; we
>   (along with Rawn, another member of the TACO team) are trying to extend
>   that to any kind of point-wise expressions and reductions (such as
>   exp(tensor), sum(tensor), ...).
> - It requires a C compiler at runtime. We're writing an LLVM backend for
>   it that will hopefully remove that requirement.
> - It can't do arbitrary non-point-wise functions, e.g. SVD or inverse.
>   This is a long way from being completely solved.
>
> As for why not Numba/llvmlite: rewriting TACO would be a large task and
> hard to do; wrapping/extending it is much easier.
>
> Best regards,
> Hameer Abbasi
>
> On Mittwoch, Nov. 25, 2020 at 9:07 AM, YueCompl <compl....@icloud.com> wrote:
>
> Great to know.
>
> I skimmed through the project readme: TACO currently generates C code as
> its intermediate language. If the purpose is tensors, why not Numba's
> llvmlite for that?
>
> I'm aware that scheduling code tends not to be an array program, and
> llvmlite may be tailored too much toward arrays to optimize more general
> programs well. How is TACO doing in this regard?
>
> Compl
>
>> On 2020-11-25, at 02:27, Hameer Abbasi <einstein.edi...@gmail.com> wrote:
>>
>> Hello,
>>
>> We're trying to do a part of this in the TACO team, with a Python
>> wrapper in the form of PyData/Sparse. It will allow abstract array
>> computation/scheduling to take place, but there are a bunch of
>> constraints, the most important one being that a C compiler cannot be
>> required at runtime.
>>
>> However, this may take a while to materialize, as we need an LLVM
>> backend, a Python wrapper (matching the NumPy API), and support for
>> arbitrary functions (like universal functions).
>>
>> https://github.com/tensor-compiler/taco
>> http://fredrikbk.com/publications/kjolstad-thesis.pdf
>>
>> On Dienstag, Nov. 24, 2020 at 7:22 PM, YueCompl <compl....@icloud.com> wrote:
>>
>> Is there some community interest in developing fusion-based
>> high-performance array programming? Something like
>> https://github.com/AccelerateHS/accelerate#an-embedded-language-for-accelerated-array-computations
>> but that embedded DSL is far less pleasant than Python as the surface
>> language for optimized NumPy code in C.
>>
>> I imagine we might be able to transpile a NumPy program into fused LLVM
>> IR, then deploy part of it as host code on CPUs and part as CUDA code on
>> GPUs?
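>> Roughly what I have in mind, sketched here with today's Numba as an
>> approximation (a toy kernel, my own naming; the 'cuda' target needs a
>> GPU and the CUDA toolkit installed):
>>
>>     import math
>>     from numba import vectorize
>>
>>     def kernel(x, y):
>>         # One scalar expression; the whole expression compiles to a
>>         # single fused kernel instead of one pass per NumPy operation.
>>         return math.exp(x) * y + 1.0
>>
>>     # Same source, two deployment targets.
>>     on_cpu = vectorize(['float64(float64, float64)'], target='parallel')(kernel)
>>     on_gpu = vectorize(['float64(float64, float64)'], target='cuda')(kernel)
>>
>> Calling on_cpu(a, b) or on_gpu(a, b) with NumPy arrays then runs the same
>> fused expression on either device. I'd want this splitting to be automatic
>> rather than hand-annotated.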
>> I know Numba is already doing the array part, but it is too limited for
>> more complex, non-array data structures. I once pointed it at ~20K
>> separate data series, with some intermediate variables for each; it
>> consumed 30+ GB of RAM compiling, and still gave no result after 10+
>> hours.
>>
>> Compl
>>
>>> On 2020-11-24, at 23:47, PIERRE AUGIER
>>> <pierre.aug...@univ-grenoble-alpes.fr> wrote:
>>>
>>> Hi,
>>>
>>> I recently took a bit of time to study the comment "The ecological
>>> impact of high-performance computing in astrophysics" published in
>>> Nature Astronomy (Zwart, 2020,
>>> https://www.nature.com/articles/s41550-020-1208-y,
>>> https://arxiv.org/pdf/2009.11295.pdf), where it is stated that "Best
>>> however, for the environment is to abandon Python for a more
>>> environmentally friendly (compiled) programming language."
>>>
>>> I wrote a simple Python-NumPy implementation of the problem used for
>>> this study (https://www.nbabel.org) and, accelerated by
>>> Transonic-Pythran, it is very efficient. Here are some numbers
>>> (elapsed times in s, smaller is better):
>>>
>>> | # particles |  Py | C++ | Fortran | Julia |
>>> |-------------|-----|-----|---------|-------|
>>> |        1024 |  29 |  55 |      41 |    45 |
>>> |        2048 | 123 | 231 |     166 |   173 |
>>>
>>> The code and a modified figure are here:
>>> https://github.com/paugier/nbabel (there is no check on the results
>>> for https://www.nbabel.org, so one still has to be very careful). A
>>> sketch of the Transonic usage pattern follows below.
>>>
>>> I think the NumPy community should spend a bit of energy to show what
>>> can be done with the existing tools to get very high performance (and
>>> low CO2 production) with Python. This work could be the basis of a
>>> serious reply to the comment by Zwart (2020).
>>>
>>> Unfortunately, the Python solution on https://www.nbabel.org is very
>>> bad in terms of performance (and therefore CO2 production). The same
>>> is true for most of the Python solutions to the Computer Language
>>> Benchmarks Game
>>> (https://benchmarksgame-team.pages.debian.net/benchmarksgame/; codes
>>> here:
>>> https://salsa.debian.org/benchmarksgame-team/benchmarksgame#what-else).
>>>
>>> We could try to fix this so that people see that in many cases it is
>>> not necessary to "abandon Python for a more environmentally friendly
>>> (compiled) programming language". One of the longest and hardest tasks
>>> would be to implement the different cases of the Computer Language
>>> Benchmarks Game in standard, modern Python-NumPy. Then, optimizing and
>>> accelerating such code should be doable, and we should be able to get
>>> very good performance at least for some cases. Good news for this
>>> project: (i) the first point can be done by anyone with good knowledge
>>> of Python-NumPy (many potential workers), (ii) for some cases there
>>> are already good Python implementations, and (iii) the work can easily
>>> be parallelized.
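>>> To give an idea of the pattern (a sketch only, not the actual nbabel
>>> code): the hot loop is written in plain Python-NumPy and handed to
>>> Pythran through Transonic's boost decorator:
>>>
>>>     import numpy as np
>>>     from transonic import boost
>>>
>>>     @boost
>>>     def compute_accelerations(positions: "float[:,:]", masses: "float[:]"):
>>>         # O(N^2) pairwise gravitational accelerations: this loop is
>>>         # the hot spot that Pythran compiles to native code.
>>>         n = positions.shape[0]
>>>         accelerations = np.zeros_like(positions)
>>>         for i in range(n):
>>>             for j in range(i + 1, n):
>>>                 d = positions[j] - positions[i]
>>>                 coef = (d[0]**2 + d[1]**2 + d[2]**2) ** -1.5
>>>                 accelerations[i] += coef * masses[j] * d
>>>                 accelerations[j] -= coef * masses[i] * d
>>>         return accelerations
>>>
>>> One compiles ahead of time with "transonic the_module.py"; until the
>>> extension is built, the function simply runs as plain Python.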
>>> It is not a criticism, but the (beautiful and very nice) new NumPy
>>> website https://numpy.org/ is not very convincing in terms of
>>> performance. It says: "Performant: The core of NumPy is well-optimized
>>> C code. Enjoy the flexibility of Python with the speed of compiled
>>> code." It is true that the core of NumPy is well-optimized C code, but
>>> to seriously compete with C++, Fortran or Julia in terms of numerical
>>> performance, one needs to use other tools to move the
>>> compiled-interpreted boundary outside the hot loops. So it could be
>>> reasonable to mention such tools (in particular Numba, Pythran, Cython
>>> and Transonic); a sketch of the plain-Pythran workflow follows below.
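>>> For instance, with Pythran alone (a toy kernel for illustration, not
>>> code from the benchmarks), one annotates a pure-Python module with an
>>> export comment and compiles it to a native extension:
>>>
>>>     # kernels.py
>>>     # pythran export smooth(float64[:])
>>>     import numpy as np
>>>
>>>     def smooth(a):
>>>         # Three-point moving average: a typical hot loop that is slow
>>>         # in the interpreter but compiles to tight native code.
>>>         out = np.empty_like(a)
>>>         out[0] = a[0]
>>>         out[-1] = a[-1]
>>>         for i in range(1, len(a) - 1):
>>>             out[i] = 0.25 * a[i - 1] + 0.5 * a[i] + 0.25 * a[i + 1]
>>>         return out
>>>
>>> After running "pythran kernels.py", "import kernels" picks up the
>>> compiled extension, and the interpreter never enters the loop: the
>>> compiled-interpreted boundary becomes the module import, not each
>>> array operation.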
>>> Is there already something planned to answer Zwart (2020)?
>>>
>>> Any opinions or suggestions on this potential project?
>>>
>>> Pierre
>>>
>>> PS: Of course, alternative Python interpreters (PyPy, GraalPython,
>>> Pyjion, Pyston, etc.) could also be used, especially if HPy
>>> (https://github.com/hpyproject/hpy) is successful (C core of NumPy
>>> written in HPy, Cython able to produce HPy code, etc.). However, I
>>> tend to be a bit skeptical about the ability of such technologies to
>>> reach very high performance for low-level NumPy code (performance that
>>> can be reached by replacing whole Python functions with optimized
>>> compiled code). Of course, I hope I'm wrong! IMHO, it does not remove
>>> the need for a successful HPy!
>>>
>>> --
>>> Pierre Augier - CR CNRS   http://www.legi.grenoble-inp.fr
>>> LEGI (UMR 5519) Laboratoire des Ecoulements Geophysiques et Industriels
>>> BP53, 38041 Grenoble Cedex, France   tel: +33.4.56.52.86.16

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion