On 12/2/2012 5:28 PM, Raul Cota wrote:
> Hello,
>
> First a quick summary of my problem and at the end I include the basic
> changes I am suggesting to the source (they may benefit others)
>
> I am ages behind in times and I am still using Numeric in Python 2.2.3.
> The main reason why it has taken so long to upgrade is because NumPy
> kills performance on several of my tests.
>
> I am sorry if this topic has been discussed before. I tried parsing the
> mailing list and also google and all I found were comments related to
> the fact that such is life when you use NumPy for small arrays.
>
> In my case I have several thousands of lines of code where data
> structures rely heavily on Numeric arrays but it is unpredictable if the
> problem at hand will result in large or small arrays. Furthermore, once
> the vectorized operations complete, the values could be assigned into
> scalars and just do simple math or loops. I am fairly sure the core of
> my problems is that the 'float64' objects start propagating all over the
> program data structures (not in arrays) and they are considerably slower
> for just about everything when compared to the native python float.
>
> Conclusion, it is not practical for me to do a massive re-structuring of
> code to improve speed on simple things like "a[0] < 4" (assuming "a" is
> an array) which is about 10 times slower than "b < 4" (assuming "b" is a
> float)
>
>
> I finally decided to track down the problem and I started by getting
> Python 2.6 from source and profiling it in one of my cases. By far the
> biggest bottleneck came out to be PyString_FromFormatV which is a
> function to assemble a string for a Python error caused by a failure to
> find an attribute when "multiarray" calls PyObject_GetAttrString. This
> function seems to get called way too often from NumPy. The real
> bottleneck of trying to find the attribute when it does not exist is not
> that it fails to find it, but that it builds a string to set a Python
> error.
> In other words, something as simple as "a[0] < 3.5" internally
> results in a call to set a Python error.
>
> I downloaded NumPy code (for Python 2.6) and tracked down all the calls
> like this,
>
>     ret = PyObject_GetAttrString(obj, "__array_priority__");
>
> and changed to
>
>     if (PyList_CheckExact(obj) || (Py_None == obj) ||
>         PyTuple_CheckExact(obj) ||
>         PyFloat_CheckExact(obj) ||
>         PyInt_CheckExact(obj) ||
>         PyString_CheckExact(obj) ||
>         PyUnicode_CheckExact(obj)){
>         //Avoid expensive calls when I am sure the attribute
>         //does not exist
>         ret = NULL;
>     }
>     else{
>         ret = PyObject_GetAttrString(obj, "__array_priority__");
>     }
>
> ( I think I found about 7 spots )
>
>
> I also noticed (not as bad in my case) that calls to PyObject_GetBuffer
> also resulted in Python errors being set thus unnecessarily slower code.
>
>
> With this change, something like this,
>
>     for i in xrange(1000000):
>         if a[1] < 35.0:
>             pass
>
> went down from 0.8 seconds to 0.38 seconds.
>
> A bogus test like this,
>
>     for i in xrange(1000000):
>         a = array([1., 2., 3.])
>
> went down from 8.5 seconds to 2.5 seconds.
>
>
> Altogether, these simple changes got me half way to the speed I used to
> get in Numeric and I could not see any slow down in any of my cases that
> benefit from heavy array manipulation. I am out of ideas on how to
> improve further though.
>
> Few questions:
> - Is there any interest for me to provide the exact details of the code
> I changed ?
>
> - I managed to compile NumPy through setup.py but I am not sure how to
> force it to generate pdb files from my Visual Studio Compiler. I need
> the pdb files such that I can run my profiler on NumPy. Anybody has any
> experience with this ? (Visual Studio)

Change the compiler and linker flags in
Python\Lib\distutils\msvc9compiler.py to:

    self.compile_options = ['/nologo', '/Ox', '/MD', '/W3', '/DNDEBUG', '/Zi']
    self.ldflags_shared = ['/DLL', '/nologo', '/INCREMENTAL:YES', '/DEBUG']

Then rebuild numpy.

Christoph

>
> - The core of my problems I think boil down to things like this
>
>     s = a[0]
>
> assigning a float64 into s as opposed to a native float ?
> Is there any way to hack code to change it to extract a native float
> instead ? (probably crazy talk, but I thought I'd ask :) ).
> I'd prefer to not use s = a.item(0) because I would have to change too
> much code and it is not even that much faster. For example,
>
>     for i in xrange(1000000):
>         if a.item(1) < 35.0:
>             pass
>
> is 0.23 seconds (as opposed to 0.38 seconds with my suggested changes)
>
>
> I apologize again if this topic has already been discussed.
>
>
> Regards,
>
> Raul
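
For anyone following the thread, here is a rough Python sketch of the fast-path idea in Raul's patch: test for exact builtin types first and only attempt the `__array_priority__` lookup on everything else, which is what the `Py*_CheckExact` tests do in the C code. The names `array_priority`, `_PLAIN_TYPES`, and `WithPriority` are made up for illustration, the code assumes Python 3 rather than the Python 2.x discussed above, and it is not NumPy's actual implementation.

```python
# Illustrative only: hypothetical names, Python 3, not NumPy's real code.
# Builtin types that are known not to define __array_priority__.
_PLAIN_TYPES = (list, tuple, float, int, str, bytes, type(None))


def array_priority(obj, default=0.0):
    """Return obj.__array_priority__, skipping the lookup for plain builtins."""
    # Exact type match, like the Py*_CheckExact macros in the C patch.
    # Subclasses fall through to the lookup because they may define the attribute.
    if type(obj) in _PLAIN_TYPES:
        return default
    return getattr(obj, "__array_priority__", default)


class WithPriority(float):
    """A float subclass that does carry the attribute (hypothetical)."""
    __array_priority__ = 10.0


print(array_priority(3.5))                # 0.0 (fast path, no attribute lookup)
print(array_priority(WithPriority(3.5)))  # 10.0 (subclass takes the lookup path)
```

The check uses `type(obj) in ...` rather than `isinstance` on purpose: a subclass of `float` or `list` may well define the attribute, so only exact builtin types can skip the lookup safely.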