On Mon, Dec 3, 2012 at 6:14 AM, Nathaniel Smith <n...@pobox.com> wrote:
> On Mon, Dec 3, 2012 at 1:28 AM, Raul Cota <r...@virtualmaterials.com> wrote:
>> I finally decided to track down the problem and I started by getting
>> Python 2.6 from source and profiling it in one of my cases. By far the
>> biggest bottleneck came out to be PyString_FromFormatV, which is a
>> function to assemble a string for a Python error caused by a failure to
>> find an attribute when "multiarray" calls PyObject_GetAttrString. This
>> function seems to get called way too often from NumPy. The real
>> bottleneck of trying to find the attribute when it does not exist is not
>> that it fails to find it, but that it builds a string to set a Python
>> error. In other words, something as simple as "a[0] < 3.5" internally
>> results in a call to set a Python error.
>>
>> I downloaded the NumPy code (for Python 2.6) and tracked down all the
>> calls like this,
>>
>>     ret = PyObject_GetAttrString(obj, "__array_priority__");
>>
>> and changed them to
>>
>>     if (PyList_CheckExact(obj) || (Py_None == obj) ||
>>         PyTuple_CheckExact(obj) ||
>>         PyFloat_CheckExact(obj) ||
>>         PyInt_CheckExact(obj) ||
>>         PyString_CheckExact(obj) ||
>>         PyUnicode_CheckExact(obj)) {
>>         // Avoid expensive calls when I am sure the attribute
>>         // does not exist
>>         ret = NULL;
>>     }
>>     else {
>>         ret = PyObject_GetAttrString(obj, "__array_priority__");
>>     }
>>
>> (I think I found about 7 spots.)
>
> If the problem is the exception construction, then maybe this would
> work about as well?
>
>     if (PyObject_HasAttrString(obj, "__array_priority__")) {
>         ret = PyObject_GetAttrString(obj, "__array_priority__");
>     } else {
>         ret = NULL;
>     }
>
> If so, then it would be an easier and more reliable way to accomplish this.
>
>> I also noticed (not as bad in my case) that calls to PyObject_GetBuffer
>> also resulted in Python errors being set, thus unnecessarily slowing the
>> code.
>>
>> With this change, something like this,
>>
>>     for i in xrange(1000000):
>>         if a[1] < 35.0:
>>             pass
>>
>> went down from 0.8 seconds to 0.38 seconds.
>
> Huh, why is PyObject_GetBuffer even getting called in this case?
>
>> A bogus test like this,
>>
>>     for i in xrange(1000000):
>>         a = array([1., 2., 3.])
>>
>> went down from 8.5 seconds to 2.5 seconds.
>
> I can see why we'd call PyObject_GetBuffer in this case, but not why
> it would take 2/3rds of the total run-time...
>
>> - The core of my problems, I think, boils down to things like this
>>
>>     s = a[0]
>>
>> assigning a float64 into s as opposed to a native float.
>> Is there any way to hack the code to change it to extract a native
>> float instead? (Probably crazy talk, but I thought I'd ask :) .)
>> I'd prefer not to use s = a.item(0) because I would have to change too
>> much code and it is not even that much faster. For example,
>>
>>     for i in xrange(1000000):
>>         if a.item(1) < 35.0:
>>             pass
>>
>> is 0.23 seconds (as opposed to 0.38 seconds with my suggested changes).
>
> I'm confused here -- first you say that your problems would be fixed
> if a[0] gave you a native float, but then you say that a.item(0)
> (which is basically a[0] that gives a native float) is still too slow?
> (OTOH a 40% speedup is pretty good, even if it is just a
> microbenchmark :-).)
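[For reference, and not part of the original exchange: the a[1] vs.
a.item(1) microbenchmark discussed above can be reproduced with the
stdlib timeit module. A minimal sketch follows; the thread's numbers
were measured on Python 2.6 (hence xrange), and absolute timings will
vary by machine and NumPy build.]

    # Sketch of the a[1] vs. a.item(1) comparison microbenchmark.
    # Illustrative only; timings are machine-dependent.
    import timeit

    setup = "from numpy import array; a = array([1., 2., 3.])"

    # a[1] returns a NumPy scalar (float64); the comparison goes
    # through the NumPy machinery discussed above.
    t_scalar = timeit.timeit("a[1] < 35.0", setup=setup, number=1000000)

    # a.item(1) returns a native Python float, so the comparison is a
    # plain float comparison.
    t_native = timeit.timeit("a.item(1) < 35.0", setup=setup, number=1000000)

    print("a[1] < 35.0      :", t_scalar)
    print("a.item(1) < 35.0 :", t_native)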
> Array scalars are definitely pretty slow:
>
> In [9]: timeit a[0]
> 1000000 loops, best of 3: 151 ns per loop
>
> In [10]: timeit a.item(0)
> 10000000 loops, best of 3: 169 ns per loop
>
> In [11]: timeit a[0] < 35.0
> 1000000 loops, best of 3: 989 ns per loop
>
> In [12]: timeit a.item(0) < 35.0
> 1000000 loops, best of 3: 233 ns per loop
>
> It is probably possible to make numpy scalars faster... I'm not even
> sure why they go through the ufunc machinery, like Travis said, since
> they don't even follow the ufunc rules:
>
> In [3]: np.array(2) * [1, 2, 3]  # 0-dim array coerces and broadcasts
> Out[3]: array([2, 4, 6])
>
> In [4]: np.array(2)[()] * [1, 2, 3]  # scalar acts like a Python integer
> Out[4]: [1, 2, 3, 1, 2, 3]
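[To make the distinction above concrete -- this snippet is not from the
original exchange -- the three objects in play (the 0-d array, the array
scalar produced by indexing it with (), and the native Python scalar
from .item()) have different types; the exact scalar type name, e.g.
numpy.int64, depends on platform and NumPy version.]

    import numpy as np

    a0 = np.array(2)        # 0-dimensional ndarray
    s = np.array(2)[()]     # NumPy array scalar (e.g. numpy.int64)
    p = np.array(2).item()  # native Python int

    print(type(a0))  # <class 'numpy.ndarray'>
    print(type(s))   # a NumPy scalar type, e.g. <class 'numpy.int64'>
    print(type(p))   # <class 'int'>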
I thought it still behaves like a numpy "animal"

>>> np.array(-2)[()] ** [1, 2, 3]
array([-2, 4, -8])
>>> np.array(-2)[()] ** 0.5
nan
>>> np.array(-2).item() ** [1, 2, 3]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for ** or pow(): 'int' and 'list'
>>> np.array(-2).item() ** 0.5
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: negative number cannot be raised to a fractional power
>>> np.array(0)[()] ** (-1)
inf
>>> np.array(0).item() ** (-1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ZeroDivisionError: 0.0 cannot be raised to a negative power

and similar.

I often try to avoid Python scalars to avoid "surprising" behavior, and
try to work defensively or fix bugs by switching to np.power(...)
(for example in the distributions); see the sketch at the end of this
message.

Josef

> But you may want to experiment a bit more to make sure this is
> actually the problem. IME guesses about speed problems are almost
> always wrong (even when I take this rule into account and only guess
> when I'm *really* sure).
>
> -n
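[A minimal sketch, not from the original thread, of the defensive
np.power(...) style mentioned above: np.power() propagates the invalid
result as nan (under NumPy's default errstate it only warns), whereas
math.pow() -- like Python 2's float ** shown above -- raises an
exception.]

    import math
    import numpy as np

    # NumPy propagates the invalid result as nan (with a RuntimeWarning
    # under the default errstate), so callers can handle it uniformly.
    print(np.power(-2.0, 0.5))  # nan

    # The plain Python equivalent raises instead.
    try:
        math.pow(-2.0, 0.5)
    except ValueError as exc:
        print("math.pow raised:", exc)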