Hi David,

Thanks for your comments; my reply is below the fold.

On Fri, Feb 17, 2017 at 4:34 PM, Daπid <davidmen...@gmail.com> wrote:

> This is very nice indeed!
>
> On 17 February 2017 at 12:15, Robert McLeod <robbmcl...@gmail.com> wrote:
> > * bytes and unicode support
> > * reductions (mean, sum, prod, std)
>
> I use both a lot, maybe I can help you get them working.
>
> Also, regarding "Vectorization hasn't been done yet with cmath
> functions for real numbers (such as sqrt(), exp(), etc.), only for
> complex functions". What is the bottleneck? Is it in GCC or just
> someone has to sit down and adapt it?

I just haven't done it yet. Basically, I'm moving from Switzerland to Canada in a week, so this was the gap to push out something that's usable, if not perfect. For now I just import the cmath functions, which are inlined, but I suspect what's needed is to break them down into their components. For example, the complex arccos function looks like this:

    static void
    nc_acos( npy_intp n, npy_complex64 *x, npy_complex64 *r)
    {
        npy_complex64 a;
        for( npy_intp I = 0; I < n; I++ ) {
            a = x[I];
            _inline_mul( x[I], x[I], r[I] );
            _inline_sub( Z_1, r[I], r[I] );
            _inline_sqrt( r[I], r[I] );
            _inline_muli( r[I], r[I] );
            _inline_add( a, r[I], r[I] );
            _inline_log( r[I] , r[I] );
            _inline_muli( r[I], r[I] );
            _inline_neg( r[I], r[I]);
        }
    }

I haven't sat down and inspected whether the cmath versions get vectorized, but there's not a huge speed difference between NE2 and NE3 for such a function on float (but there is for complex), so my suspicion is that they aren't.

Another option would be to add a library such as Yeppp! as LIB_YEPPP, or some other library that's faster than glibc. For example, the glibc function "fma(a,b,c)" is slower than doing "a*b+c" in NE3, and that's not how it should be. Yeppp! is also built with Python generating C code, so integrating it could be either very easy or very hard.

On bytes and unicode, I haven't seen examples of how people use them, so I'm not sure where to start.
Since there's practically no limit on the number of operations now (the library is 1.3 MB, compared to 1.2 MB for NE2 with gcc 5.4), the string functions could grow significantly from what we have in NE2.

With regard to reductions, NumExpr never multi-threaded them and could only do outer reductions, so in the end there was no speed advantage to be had compared to having NumPy do them on the result. I suspect the primary value there was in PyTables and Pandas, where the expression had to do everything. One of the things I've moved away from in NE3 is output buffering (instead it pre-allocates the output array), so for reductions NumExpr's understanding of broadcasting would have to be deeper.

In any event, contributions would certainly be welcome.

Robert

--
Robert McLeod, Ph.D.
Center for Cellular Imaging and Nano Analytics (C-CINA)
Biozentrum der Universität Basel
Mattenstrasse 26, 4058 Basel
Work: +41.061.387.3225
robert.mcl...@unibas.ch
robert.mcl...@bsse.ethz.ch <robert.mcl...@ethz.ch>
robbmcl...@gmail.com

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion