On 02/12/2012 8:31 PM, Travis Oliphant wrote:
> Raul,
>
> This is *fantastic work*. While many optimizations were done 6 years ago
> as people started to convert their code, that kind of report has trailed off
> in the last few years. I have not seen this kind of speed-comparison for
> some time --- but I think it's definitely beneficial.

I'll clean up a bit as a Macro and comment.

> NumPy still has quite a bit that can be optimized. I think your example is
> really great. Perhaps it's worth making a C-API macro out of the short-cut
> to the attribute string so it can be used by others. It would be
> interesting to see where your other slow-downs are. I would be interested
> to see if the slow-math of float64 is hurting you. It would be possible,
> for example, to do a simple subclass of the ndarray that overloads
> a[<integer>] to be the same as array.item(<integer>). The latter syntax
> returns python objects (i.e. floats) instead of array scalars.
>
> Also, it would not be too difficult to add fast-math paths for int64,
> float32, and float64 scalars (so they don't go through ufuncs but do
> scalar-math like the float and int objects in Python).

Thanks. I'll dig a bit more into the code.

>
> A related thing we've been working on lately which might help you is Numba
> which might help speed up functions that have code like: "a[0] < 4" :
> http://numba.pydata.org.
>
> Numba will translate the expression a[0] < 4 to a machine-code address-lookup
> and math operation which is *much* faster when a is a NumPy array.
> Presently this requires you to wrap your function call in a decorator:
>
> from numba import autojit
>
> @autojit
> def function_to_speed_up(...):
>     pass
>
> In the near future (2-4 weeks), numba will grow the experimental ability to
> basically replace all your function calls with @autojit versions in a Python
> function. I would love to see something like this work:
>
> python -m numba filename.py
>
> To get an effective autojit on all the filename.py functions (and optionally
> on all python modules it imports). The autojit works out of the box today
> --- you can get Numba from PyPI (or inside of the completely free Anaconda
> CE) to try it out.

This looks very interesting. Will check it out.
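
For reference, the ndarray subclass suggested in the quoted reply could look
roughly like the sketch below. It is only an illustration under stated
assumptions: it targets Python 3 and a current NumPy rather than the
Python 2.6 setup discussed in this thread, the class name ScalarItemArray is
hypothetical (not part of NumPy), and only plain integer indexing of 1-D
arrays is redirected to .item().

```python
import numpy as np


class ScalarItemArray(np.ndarray):
    """Hypothetical subclass: a[<integer>] behaves like a.item(<integer>)."""

    def __getitem__(self, index):
        # Redirect only plain integer indexing of a 1-D array. Slices,
        # tuples, boolean masks and N-D arrays keep the normal behaviour.
        if (self.ndim == 1
                and isinstance(index, (int, np.integer))
                and not isinstance(index, bool)):
            return self.item(index)
        return super().__getitem__(index)


# view() reinterprets an existing array as the subclass without copying.
a = np.array([1.0, 2.0, 3.0]).view(ScalarItemArray)

s = a[0]
print(type(s))       # native Python float rather than numpy.float64
print(type(a[0:2]))  # slicing still returns a ScalarItemArray
```

Whether this helps in practice depends on how much code creates its arrays
through a single place where the view() call can be added.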
> Best,
>
> -Travis
>
>
> On Dec 2, 2012, at 7:28 PM, Raul Cota wrote:
>
>> Hello,
>>
>> First a quick summary of my problem and at the end I include the basic
>> changes I am suggesting to the source (they may benefit others)
>>
>> I am ages behind in times and I am still using Numeric in Python 2.2.3.
>> The main reason why it has taken so long to upgrade is because NumPy
>> kills performance on several of my tests.
>>
>> I am sorry if this topic has been discussed before. I tried parsing the
>> mailing list and also google and all I found were comments related to
>> the fact that such is life when you use NumPy for small arrays.
>>
>> In my case I have several thousands of lines of code where data
>> structures rely heavily on Numeric arrays but it is unpredictable if the
>> problem at hand will result in large or small arrays. Furthermore, once
>> the vectorized operations complete, the values could be assigned into
>> scalars and just do simple math or loops. I am fairly sure the core of
>> my problems is that the 'float64' objects start propagating all over the
>> program data structures (not in arrays) and they are considerably slower
>> for just about everything when compared to the native python float.
>>
>> Conclusion, it is not practical for me to do a massive re-structuring of
>> code to improve speed on simple things like "a[0] < 4" (assuming "a" is
>> an array) which is about 10 times slower than "b < 4" (assuming "b" is a
>> float)
>>
>>
>> I finally decided to track down the problem and I started by getting
>> Python 2.6 from source and profiling it in one of my cases. By far the
>> biggest bottleneck came out to be PyString_FromFormatV which is a
>> function to assemble a string for a Python error caused by a failure to
>> find an attribute when "multiarray" calls PyObject_GetAttrString. This
>> function seems to get called way too often from NumPy.
>> The real bottleneck of trying to find the attribute when it does not exist
>> is not that it fails to find it, but that it builds a string to set a
>> Python error. In other words, something as simple as "a[0] < 3.5"
>> internally results in a call to set a Python error.
>>
>> I downloaded NumPy code (for Python 2.6) and tracked down all the calls
>> like this,
>>
>>     ret = PyObject_GetAttrString(obj, "__array_priority__");
>>
>> and changed to
>>
>>     if (PyList_CheckExact(obj) || (Py_None == obj) ||
>>         PyTuple_CheckExact(obj) ||
>>         PyFloat_CheckExact(obj) ||
>>         PyInt_CheckExact(obj) ||
>>         PyString_CheckExact(obj) ||
>>         PyUnicode_CheckExact(obj)) {
>>         // Avoid expensive calls when I am sure the attribute
>>         // does not exist
>>         ret = NULL;
>>     }
>>     else {
>>         ret = PyObject_GetAttrString(obj, "__array_priority__");
>>     }
>>
>> ( I think I found about 7 spots )
>>
>>
>> I also noticed (not as bad in my case) that calls to PyObject_GetBuffer
>> also resulted in Python errors being set thus unnecessarily slower code.
>>
>>
>> With this change, something like this,
>>
>>     for i in xrange(1000000):
>>         if a[1] < 35.0:
>>             pass
>>
>> went down from 0.8 seconds to 0.38 seconds.
>>
>> A bogus test like this,
>>
>>     for i in xrange(1000000):
>>         a = array([1., 2., 3.])
>>
>> went down from 8.5 seconds to 2.5 seconds.
>>
>>
>> Altogether, these simple changes got me half way to the speed I used to
>> get in Numeric and I could not see any slow down in any of my cases that
>> benefit from heavy array manipulation. I am out of ideas on how to
>> improve further though.
>>
>> Few questions:
>> - Is there any interest for me to provide the exact details of the code
>> I changed ?
>>
>> - I managed to compile NumPy through setup.py but I am not sure how to
>> force it to generate pdb files from my Visual Studio Compiler. I need
>> the pdb files such that I can run my profiler on NumPy. Anybody has any
>> experience with this ?
>> (Visual Studio)
>>
>> - The core of my problems I think boils down to things like this
>>     s = a[0]
>> assigning a float64 into s as opposed to a native float ?
>> Is there any way to hack code to change it to extract a native float
>> instead ? (probably crazy talk, but I thought I'd ask :) ).
>> I'd prefer to not use s = a.item(0) because I would have to change too
>> much code and it is not even that much faster. For example,
>>     for i in xrange(1000000):
>>         if a.item(1) < 35.0:
>>             pass
>> is 0.23 seconds (as opposed to 0.38 seconds with my suggested changes)
>>
>>
>> I apologize again if this topic has already been discussed.
>>
>>
>> Regards,
>>
>> Raul
>>
>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> [email protected]
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
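
The short-circuit described in the quoted message (skip the attribute lookup
when the object's exact type is known not to define __array_priority__) can
be illustrated at the Python level. This is only a sketch of the idea, not
NumPy's code: the function names, the default value of 0.0 and the
WithPriority class are hypothetical, and mapping the Python 2 checks
(PyInt, PyString, PyUnicode) onto the Python 3 types int, bytes and str is
an assumption.

```python
import timeit

# Exact builtin types that never define __array_priority__, loosely mirroring
# the PyFloat_CheckExact / PyList_CheckExact ... tests in the C change.
_PLAIN_TYPES = frozenset({list, tuple, float, int, str, bytes, type(None)})


def priority_slow(obj, default=0.0):
    """Always attempt the lookup, paying for the exception on failure."""
    try:
        return obj.__array_priority__
    except AttributeError:
        # Constructing and raising the error is the costly part.
        return default


def priority_fast(obj, default=0.0):
    """Skip the lookup when the exact type is known to lack the attribute."""
    if type(obj) in _PLAIN_TYPES:
        return default
    return priority_slow(obj, default)


class WithPriority:
    """Hypothetical object that does define the attribute."""
    __array_priority__ = 15.0


if __name__ == "__main__":
    # Rough timing of both paths for a plain float; numbers vary by machine.
    for fn in (priority_slow, priority_fast):
        elapsed = timeit.timeit(lambda: fn(3.5), number=200000)
        print(fn.__name__, round(elapsed, 3), "seconds")
```

Subclasses of the builtin types deliberately fall through to the full
lookup, since type(obj) is compared exactly, which matches the intent of the
*_CheckExact calls.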
