On Tue, Jan 12, 2016 at 9:18 PM, Charles R Harris
<charlesr.har...@gmail.com> wrote:
>
> Hi All,
>
> I've opened issue #7002, reproduced below, for discussion.
>>
>> Numpy umath has a file scalarmath.c.src that implements scalar arithmetic 
>> using special functions that are about 10x faster than the equivalent ufuncs.
>>
>> In [1]: a = np.float64(1)
>>
>> In [2]: timeit a*a
>> 10000000 loops, best of 3: 69.5 ns per loop
>>
>> In [3]: timeit np.multiply(a, a)
>> 1000000 loops, best of 3: 722 ns per loop
>>
>> I contend that in large programs this improvement in execution time is not 
>> worth the complexity and maintenance overhead; it is unlikely that 
>> scalar-scalar arithmetic is a significant part of their execution time. 
>> Therefore I propose to use ufuncs for all of the scalar-scalar arithmetic. 
>> This would also bring the benefits of __numpy_ufunc__ to scalars with 
>> minimal effort.
>
> Thoughts?

+1e6, scalars are a maintenance disaster in so many ways.

But can we actually pull it off? IIRC there were complaints about
scalars getting slower at some point (and by much less than 10x),
because it's not actually too hard to have code that is heavy on
scalar arithmetic. (Indexing an array returns a numpy scalar rather
than a Python object, even though the two look similar, so any code
that, say, does a Python loop over the elements of an array may well
be bottlenecked by scalar arithmetic. Obviously it's better not to
write such loops, but...)
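To make the indexing point concrete, here's a small sketch (plain Python plus numpy, nothing specific to the patch under discussion):

```python
import numpy as np

a = np.arange(10.0)

# Indexing an array yields a numpy scalar, not a plain Python float:
elem = a[0]
print(type(elem))  # <class 'numpy.float64'>

# So a pure-Python loop over the elements does numpy scalar
# arithmetic on every iteration -- this is exactly the path where
# the scalarmath.c.src fast paths matter:
total = np.float64(0.0)
for x in a:
    total = total + x * x  # each + and * goes through scalar arithmetic
print(total)  # 285.0
```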

It still seems to me that surely we can speed up ufuncs on
scalars / small arrays? Also I am somewhat encouraged that, like you, I
get ~700 ns for multiply(scalar, scalar) versus ~70 ns for scalar *
scalar, but I also get ~380 ns for 0d-array * 0d-array. (I guess that
for multiply(scalar, scalar) we're first calling asarray on both
scalar objects, which is certainly avoidable.)
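For anyone who wants to reproduce the comparison, here's a quick timeit sketch of the three paths (the ~70/380/700 ns figures above are from my machine; absolute numbers will vary, but the ordering should hold):

```python
import numpy as np
from timeit import timeit

s = np.float64(1)              # numpy scalar: hits the scalarmath fast path
z = np.array(1, dtype=float)   # 0d array: goes through the ufunc machinery

n = 100000
t_scalar = timeit(lambda: s * s, number=n) / n             # scalar * scalar
t_zerod = timeit(lambda: z * z, number=n) / n              # 0d * 0d
t_ufunc = timeit(lambda: np.multiply(s, s), number=n) / n  # explicit ufunc

# Expect roughly: t_scalar < t_zerod < t_ufunc
print(t_scalar, t_zerod, t_ufunc)
```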

Here's a profile of zerod * zerod [0]: http://vorpus.org/~njs/tmp/zerod.svg
(Click on PyNumber_Multiply to zoom in on the relevant part)

And here's multiply(scalar, scalar) [1]: http://vorpus.org/~njs/tmp/scalar.svg

In principle it feels like tons of this stuff is fat that can be
trimmed -- even in the first, faster, profile, we're allocating a 0d
array and then converting it to a scalar, and that conversion in
PyArray_Return takes 12% of the time on its own; something like 14% of
the time is spent working out from scratch the complicated type
resolution and casting procedure needed to multiply two float64s, ...

[0]
  a = np.array(1, dtype=float)
  for i in range(...):
     a * a

[1]
  s = np.float64(1)
  m = np.multiply
  for i in range(...):
    m(s, s)

-n

-- 
Nathaniel J. Smith -- http://vorpus.org
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion