On Mon, Feb 15, 2016 at 10:43 AM, Gregor Thalhammer <
gregor.thalham...@gmail.com> wrote:

> Dear Robert,
> thanks for your effort on improving numexpr. Indeed, vectorized math
> libraries (VML) can give a large boost in performance (~5x), except for a
> couple of basic operations (add, mul, div), which current compilers are
> able to vectorize automatically. With recent gcc even more functions are
> vectorized, see https://sourceware.org/glibc/wiki/libmvec But you need
> special flags depending on the platform (SSE, AVX present?), runtime
> detection of processor capabilities would be nice for distributing
> binaries. Some time ago, since I lost access to Intels MKL, I patched
> numexpr to use Accelerate/Veclib on os x, which is preinstalled on each
> mac, see https://github.com/geggo/numexpr.git veclib_support branch.
> As you increased the opcode size, I could imagine providing a bit to
> switch (during runtime) between internal functions and vectorized ones,
> that would be handy for tests and benchmarks.

Dear Gregor,

Your suggestion to separate the opcode signature from the library used to
execute it is very clever. Based on your suggestion, I think that the
natural evolution of the opcodes is to specify them by function signature
and library, using a two-level dict, i.e.

numexpr.interpreter.opcodes['exp_f8f8f8'][gnu]   = some_enum
numexpr.interpreter.opcodes['exp_f8f8f8'][msvc]  = some_enum + 1
numexpr.interpreter.opcodes['exp_f8f8f8'][vml]   = some_enum + 2
numexpr.interpreter.opcodes['exp_f8f8f8'][yeppp] = some_enum + 3
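
As a plain-Python sketch of that two-level table (the library names and enum
values here are purely illustrative, not numexpr's actual layout):

```python
from itertools import count

# Hypothetical two-level opcode table: the outer key is the function
# signature, the inner key is the backing math library. Values are
# consecutive enum integers, as in the example above.
_enum = count()

opcodes = {
    'exp_f8f8f8': {
        'gnu':   next(_enum),   # some_enum
        'msvc':  next(_enum),   # some_enum + 1
        'vml':   next(_enum),   # some_enum + 2
        'yeppp': next(_enum),   # some_enum + 3
    },
}

# Resolving the enum for a (signature, library) pair is a plain lookup:
code = opcodes['exp_f8f8f8']['vml']
```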

I want to procedurally generate opcodes.cpp and interpreter_body.cpp.  If it
is done the way you suggest, funccodes.hpp and the many #defines for
function codes in the interpreter can hopefully be removed, which would
simplify the overall codebase. One could potentially take it a step further
and plan (optimize) each expression, similar to what FFTW does with regard
to transform shape. That is, the basic way to control the library would be
a single library argument, i.e.:

result = ne.evaluate("A*log(foo**2 / bar**2)", lib=vml)
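
To give a feel for the procedural generation mentioned above, here is a
minimal sketch (the function name and emitted identifiers are my invention,
not numexpr's actual generator) that emits C enum entries for opcodes.cpp
from a signature -> library -> enum table:

```python
# Hypothetical sketch: emit C enum entries from a nested
# {signature: {library: enum_value}} table. The naming scheme
# OP_<SIGNATURE>_<LIBRARY> is illustrative only.
def emit_opcode_enum(opcodes):
    lines = ["enum OpCodes {"]
    for sig, libs in sorted(opcodes.items()):
        # Emit entries in enum-value order so the C side stays stable.
        for lib, value in sorted(libs.items(), key=lambda kv: kv[1]):
            lines.append("    OP_%s_%s = %d," % (sig.upper(), lib.upper(), value))
    lines.append("};")
    return "\n".join(lines)

table = {'exp_f8f8f8': {'gnu': 0, 'vml': 2}}
print(emit_opcode_enum(table))
```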

However, we could also permit a tuple to be passed in, where each element
of the tuple specifies the library to use for the corresponding operation
in the AST:

result = ne.evaluate("A*log(foo**2 / bar**2)", lib=(gnu, gnu, gnu, yeppp, gnu))

In this case the ops are (mul, mul, div, log, mul).  The opcode selection
is done on the Python side, and this tuple could potentially be optimized
by numexpr itself rather than by hand, by timing various permutations of
the linked C math libraries. The wisdom from such planning could be pickled
and saved to a wisdom file.  Currently numexpr has cacheDict in util.py,
and there's no reason it can't be pickled and saved to disk. I've already
done a similar thing with wrappers for PyFFTW.
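
A minimal sketch of what persisting such wisdom might look like (the
`record`/`save_wisdom`/`load_wisdom` names and file layout are my
invention for illustration, not an existing numexpr API), in the spirit
of FFTW's wisdom files:

```python
import os
import pickle
import tempfile

# Hypothetical wisdom cache: maps an expression string to the fastest
# per-op library tuple measured for it so far.
wisdom = {}

def record(expr, libs):
    """Remember the best library tuple found for this expression."""
    wisdom[expr] = libs

def save_wisdom(path):
    """Pickle the wisdom cache to disk."""
    with open(path, 'wb') as f:
        pickle.dump(wisdom, f)

def load_wisdom(path):
    """Merge previously saved wisdom back into the in-memory cache."""
    if os.path.exists(path):
        with open(path, 'rb') as f:
            wisdom.update(pickle.load(f))

# Round-trip: record, save, clear, reload.
record("A*log(foo**2 / bar**2)", ('gnu', 'gnu', 'gnu', 'yeppp', 'gnu'))
path = os.path.join(tempfile.gettempdir(), 'numexpr_wisdom.pkl')
save_wisdom(path)
wisdom.clear()
load_wisdom(path)
```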


Robert McLeod, Ph.D.
Center for Cellular Imaging and Nano Analytics (C-CINA)
Biozentrum der Universität Basel
Mattenstrasse 26, 4058 Basel
Work: +41.061.387.3225
robert.mcl...@bsse.ethz.ch <robert.mcl...@ethz.ch>
NumPy-Discussion mailing list