On Mon, Feb 15, 2016 at 10:43 AM, Gregor Thalhammer <
gregor.thalham...@gmail.com> wrote:

> Dear Robert,
> thanks for your effort on improving numexpr. Indeed, vectorized math
> libraries (VML) can give a large boost in performance (~5x), except for a
> couple of basic operations (add, mul, div), which current compilers are
> able to vectorize automatically. With recent gcc even more functions are
> vectorized, see https://sourceware.org/glibc/wiki/libmvec But you need
> special flags depending on the platform (SSE, AVX present?), runtime
> detection of processor capabilities would be nice for distributing
> binaries. Some time ago, since I lost access to Intels MKL, I patched
> numexpr to use Accelerate/Veclib on os x, which is preinstalled on each
> mac, see https://github.com/geggo/numexpr.git veclib_support branch.
> As you increased the opcode size, I could imagine providing a bit to
> switch (during runtime) between internal functions and vectorized ones,
> that would be handy for tests and benchmarks.

Dear Gregor,

Your suggestion to separate the opcode signature from the library used to
execute it is very clever. Based on your suggestion, I think that the
natural evolution of the opcodes is to specify them by function signature
and library, using a two-level dict, i.e.

numexpr.interpreter.opcodes['exp_f8f8f8'][gnu]   = some_enum
numexpr.interpreter.opcodes['exp_f8f8f8'][msvc]  = some_enum + 1
numexpr.interpreter.opcodes['exp_f8f8f8'][vml]   = some_enum + 2
numexpr.interpreter.opcodes['exp_f8f8f8'][yeppp] = some_enum + 3
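
As a plain-Python sketch of that two-level table (the library names and enum
values here are purely illustrative, not numexpr's actual layout):

```python
from itertools import count

# Hypothetical two-level opcode table: the outer key is the function
# signature, the inner key is the backing math library. Values are
# consecutive enum integers, as in the example above.
_enum = count()

opcodes = {
    'exp_f8f8f8': {
        'gnu':   next(_enum),   # some_enum
        'msvc':  next(_enum),   # some_enum + 1
        'vml':   next(_enum),   # some_enum + 2
        'yeppp': next(_enum),   # some_enum + 3
    },
}

# Resolving the enum for a (signature, library) pair is a plain lookup:
code = opcodes['exp_f8f8f8']['vml']
```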

I want to procedurally generate opcodes.cpp and interpreter_body.cpp.  If it
is done the way you suggest, funccodes.hpp and the many #defines for
function codes in the interpreter can hopefully be removed, which would
simplify the overall codebase. One could potentially take it a step further
and plan (optimize) each expression, similar to what FFTW does with regard
to transform shape. That is, the basic way to control the library would be
a single library argument, i.e.:

result = ne.evaluate("A*log(foo**2 / bar**2)", lib=vml)
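
To give a feel for the procedural generation mentioned above, here is a
minimal sketch (the function name and emitted identifiers are my invention,
not numexpr's actual generator) that emits C enum entries for opcodes.cpp
from a signature -> library -> enum table:

```python
# Hypothetical sketch: emit C enum entries from a nested
# {signature: {library: enum_value}} table. The naming scheme
# OP_<SIGNATURE>_<LIBRARY> is illustrative only.
def emit_opcode_enum(opcodes):
    lines = ["enum OpCodes {"]
    for sig, libs in sorted(opcodes.items()):
        # Emit entries in enum-value order so the C side stays stable.
        for lib, value in sorted(libs.items(), key=lambda kv: kv[1]):
            lines.append("    OP_%s_%s = %d," % (sig.upper(), lib.upper(), value))
    lines.append("};")
    return "\n".join(lines)

table = {'exp_f8f8f8': {'gnu': 0, 'vml': 2}}
print(emit_opcode_enum(table))
```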

However, we could also permit a tuple to be passed in, where each element
of the tuple specifies the library to use for the corresponding operation
in the AST:

result = ne.evaluate("A*log(foo**2 / bar**2)", lib=(gnu, gnu, gnu, yeppp, gnu))

In this case the ops are (mul, mul, div, log, mul).  The opcode selection
is done on the Python side, and this tuple could potentially be optimized
by numexpr itself rather than by hand, by timing various permutations of
the linked C math libraries. The wisdom from such planning could be pickled
and saved to a wisdom file.  Currently numexpr has cacheDict in util.py,
and there's no reason it can't be pickled and saved to disk. I've already
done a similar thing with wrappers for PyFFTW.
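
A minimal sketch of what persisting such wisdom might look like (the
`record`/`save_wisdom`/`load_wisdom` names and file layout are my
invention for illustration, not an existing numexpr API), in the spirit
of FFTW's wisdom files:

```python
import os
import pickle
import tempfile

# Hypothetical wisdom cache: maps an expression string to the fastest
# per-op library tuple measured for it so far.
wisdom = {}

def record(expr, libs):
    """Remember the best library tuple found for this expression."""
    wisdom[expr] = libs

def save_wisdom(path):
    """Pickle the wisdom cache to disk."""
    with open(path, 'wb') as f:
        pickle.dump(wisdom, f)

def load_wisdom(path):
    """Merge previously saved wisdom back into the in-memory cache."""
    if os.path.exists(path):
        with open(path, 'rb') as f:
            wisdom.update(pickle.load(f))

# Round-trip: record, save, clear, reload.
record("A*log(foo**2 / bar**2)", ('gnu', 'gnu', 'gnu', 'yeppp', 'gnu'))
path = os.path.join(tempfile.gettempdir(), 'numexpr_wisdom.pkl')
save_wisdom(path)
wisdom.clear()
load_wisdom(path)
```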


Robert McLeod, Ph.D.
Center for Cellular Imaging and Nano Analytics (C-CINA)
Biozentrum der Universität Basel
Mattenstrasse 26, 4058 Basel
Work: +41.061.387.3225
robert.mcl...@bsse.ethz.ch <robert.mcl...@ethz.ch>
NumPy-Discussion mailing list