Re: [Numpy-discussion] Speed bottlenecks on simple tasks - suggested improvement

Travis Oliphant Sun, 02 Dec 2012 19:31:34 -0800

Raul, 

This is *fantastic work*. Many optimizations were done 6 years ago as 
people started to convert their code, but that kind of report has trailed off 
in the last few years. I have not seen this kind of speed comparison for some 
time, and I think it's definitely beneficial.


NumPy still has quite a bit that can be optimized. I think your example is 
really great. Perhaps it's worth making a C-API macro out of the short-cut 
to the attribute string so it can be used by others. It would be interesting 
to see where your other slow-downs are. I would be interested to see if the 
slow math of float64 is hurting you. It would be possible, for example, to 
write a simple subclass of ndarray that overloads a[<integer>] to be the same 
as a.item(<integer>). The latter syntax returns Python objects (i.e. 
floats) instead of array scalars. 
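
A minimal sketch of that subclass idea, assuming a 1-D array and a hypothetical 
class name ItemArray (it is not part of NumPy). Only plain integer indices are 
redirected to item(); slices and everything else fall through to the normal 
ndarray indexing. It is written for Python 3.

```python
import numpy as np


class ItemArray(np.ndarray):
    """Hypothetical ndarray subclass: a[<integer>] behaves like a.item(<integer>)."""

    def __getitem__(self, index):
        # Redirect only a plain Python int on a 1-D array. bool is excluded
        # because it is a subclass of int but means something else to NumPy.
        if (self.ndim == 1 and isinstance(index, int)
                and not isinstance(index, bool)):
            return self.item(index)
        # Slices, tuples, masks, etc. keep the standard ndarray behavior.
        return super().__getitem__(index)


# view() reinterprets an existing array as the subclass without copying.
a = np.array([1.0, 2.0, 3.0]).view(ItemArray)

s = a[0]        # a native Python float, not a float64 array scalar
tail = a[1:]    # still an ItemArray
```

Because this __getitem__ is written in Python, it adds its own call overhead, 
so the sketch shows the behavior rather than a measured speed-up. Doing the 
same redirect in C would likely be needed to see a real gain.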

Also, it would not be too difficult to add fast-math paths for int64, float32, 
and float64 scalars (so they don't go through ufuncs but do scalar math like 
the float and int objects in Python).


A related thing we've been working on lately is Numba, which might help speed 
up functions that have code like "a[0] < 4": 
http://numba.pydata.org.

Numba will translate the expression a[0] < 4 to a machine-code address-lookup 
and math operation which is *much* faster when a is a NumPy array. Presently 
this requires you to put a decorator on your function: 

from numba import autojit

@autojit
def function_to_speed_up(...):  # "..." stands for your own argument list
    pass

In the near future (2-4 weeks), Numba will grow the experimental ability to 
replace all the function calls in a Python function with @autojit versions. 
I would love to see something like this work: 

python -m numba filename.py

That would give you an effective autojit on all the functions in filename.py 
(and optionally on all Python modules it imports). The autojit works out of 
the box today: you can get Numba from PyPI (or inside the completely free 
Anaconda CE) to try it out.

Best, 

-Travis




On Dec 2, 2012, at 7:28 PM, Raul Cota wrote:

> Hello,
> 
> First a quick summary of my problem and at the end I include the basic 
> changes I am suggesting to the source (they may benefit others)
> 
> I am ages behind the times and I am still using Numeric in Python 2.2.3. 
> The main reason why it has taken so long to upgrade is that NumPy 
> kills performance on several of my tests.
> 
> I am sorry if this topic has been discussed before. I tried parsing the 
> mailing list and also google and all I found were comments related to 
> the fact that such is life when you use NumPy for small arrays.
> 
> In my case I have several thousands of lines of code where data 
> structures rely heavily on Numeric arrays but it is unpredictable if the 
> problem at hand will result in large or small arrays. Furthermore, once 
> the vectorized operations complete, the values could be assigned into 
> scalars and just do simple math or loops. I am fairly sure the core of 
> my problems is that the 'float64' objects start propagating all over the 
> program data structures (not in arrays) and they are considerably slower 
> for just about everything when compared to the native python float.
> 
> In conclusion, it is not practical for me to do a massive re-structuring of 
> code to improve speed on simple things like "a[0] < 4" (assuming "a" is 
> an array) which is about 10 times slower than "b < 4" (assuming "b" is a 
> float).
> 
> 
> I finally decided to track down the problem and I started by getting 
> Python 2.6 from source and profiling it in one of my cases. By far the 
> biggest bottleneck came out to be PyString_FromFormatV which is a 
> function to assemble a string for a Python error caused by a failure to 
> find an attribute when "multiarray" calls PyObject_GetAttrString. This 
> function seems to get called way too often from NumPy. The real 
> bottleneck of trying to find the attribute when it does not exist is not 
> that it fails to find it, but that it builds a string to set a Python 
> error. In other words, something as simple as "a[0] < 3.5" internally 
> results in a call to set a Python error.
> 
> I downloaded NumPy code (for Python 2.6) and tracked down all the calls 
> like this,
> 
>  ret = PyObject_GetAttrString(obj, "__array_priority__");
> 
> and changed to
>     if (PyList_CheckExact(obj) || (Py_None == obj) ||
>         PyTuple_CheckExact(obj) ||
>         PyFloat_CheckExact(obj) ||
>         PyInt_CheckExact(obj) ||
>         PyString_CheckExact(obj) ||
>         PyUnicode_CheckExact(obj)) {
>         /* Avoid expensive calls when I am sure the attribute
>            does not exist */
>         ret = NULL;
>     }
>     else {
>         ret = PyObject_GetAttrString(obj, "__array_priority__");
>     }
> 
> 
> 
> ( I think I found about 7 spots )
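
The logic of that guard can be sketched at the Python level. This is only an 
illustration of the idea, not the patch itself (which lives in the C code of 
multiarray). The names get_priority, _PLAIN_TYPES and Weighted are 
hypothetical, and the type list uses Python 3 names in place of the Python 2 
PyInt/PyString checks.

```python
# Exact built-in types assumed never to define __array_priority__.
_PLAIN_TYPES = frozenset({list, tuple, float, int, str, bytes, type(None)})


def get_priority(obj, default=0.0):
    """Return obj.__array_priority__, skipping the lookup for plain built-ins."""
    # "type(obj) in ..." is an exact-type test, like the *_CheckExact macros:
    # subclasses of these types still go through the attribute lookup below.
    if type(obj) in _PLAIN_TYPES:
        return default
    return getattr(obj, "__array_priority__", default)


class Weighted(list):
    """Hypothetical list subclass that does define the attribute."""
    __array_priority__ = 15.0


plain = get_priority([1, 2, 3])       # short-circuited, no attribute lookup
scalar = get_priority(3.5)            # short-circuited, no attribute lookup
custom = get_priority(Weighted())     # subclass, so the attribute is looked up
```

The exact-type test matters: a subclass such as Weighted must not be 
short-circuited, because it is free to define the attribute.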
> 
> 
> I also noticed (not as bad in my case) that calls to PyObject_GetBuffer 
> also resulted in Python errors being set thus unnecessarily slower code.
> 
> 
> With this change, something like this,
>     for i in xrange(1000000):
>         if a[1] < 35.0:
>             pass
> 
> went down from 0.8 seconds to 0.38 seconds.
> 
> A bogus test like this,
>     for i in xrange(1000000):
>         a = array([1., 2., 3.])
> 
> went down from 8.5 seconds to 2.5 seconds.
> 
> 
> 
> Altogether, these simple changes got me half way to the speed I used to 
> get in Numeric and I could not see any slow down in any of my cases that 
> benefit from heavy array manipulation. I am out of ideas on how to 
> improve further though.
> 
> A few questions:
> - Is there any interest for me to provide the exact details of the code 
> I changed ?
> 
> - I managed to compile NumPy through setup.py but I am not sure how to 
> force it to generate pdb files from my Visual Studio compiler. I need 
> the pdb files so that I can run my profiler on NumPy. Does anybody have 
> any experience with this? (Visual Studio)
> 
> - The core of my problems, I think, boils down to things like this
>     s = a[0]
> assigning a float64 into s as opposed to a native float.
> Is there any way to hack the code so that it extracts a native float 
> instead? (probably crazy talk, but I thought I'd ask :) ).
> I'd prefer to not use s = a.item(0) because I would have to change too 
> much code and it is not even that much faster. For example,
>     for i in xrange(1000000):
>         if a.item(1) < 35.0:
>             pass
> is 0.23 seconds (as opposed to 0.38 seconds with my suggested changes)
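
For anyone who wants to repeat this comparison, here is a small timing harness 
as a sketch. It assumes Python 3 and a current NumPy rather than the Python 2.6 
setup used above, and the helper name time_expr and the iteration count are 
arbitrary choices. No timings are claimed here, since they depend on the 
machine and the NumPy version.

```python
import timeit

import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = 2.0


def time_expr(expr, number=100_000):
    """Return the seconds taken to evaluate expr `number` times."""
    # a and b are passed in explicitly so the timed expression can see them.
    return timeit.timeit(expr, globals={"a": a, "b": b}, number=number)


# The three variants discussed in this thread.
expressions = ["a[1] < 35.0", "a.item(1) < 35.0", "b < 35.0"]
timings = {expr: time_expr(expr) for expr in expressions}

for expr, seconds in timings.items():
    print("%-20s %.4f s" % (expr, seconds))
```

Running it before and after a change to NumPy gives a like-for-like comparison 
of array-scalar indexing, item(), and a native float.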
> 
> 
> I apologize again if this topic has already been discussed.
> 
> 
> Regards,
> 
> Raul
> 
> 
> _______________________________________________
> NumPy-Discussion mailing list
> [email protected]
> http://mail.scipy.org/mailman/listinfo/numpy-discussion

