[Numpy-discussion] Re: overhead of numpy scalars

Sebastian Berg Wed, 04 May 2022 06:49:13 -0700

On Tue, 2022-05-03 at 15:06 +0200, Pieter Eendebak wrote:
> Hi everyone,
> 
> Operations such as np.sqrt or np.cos on numpy scalars are much slower than
> operations on an array of size 1 (benchmark below). The reason seems to be
> that a numpy scalar (e.g. np.float64(1.1)) is converted internally to an
> array, an array is created for the result of the operation and the result
> is converted back to a scalar (so three object constructions in total).
> Operating directly on an array creates an array for the result, which is
> returned (so one object construction in total).
> 
> Has anyone looked into improving this performance before or has ideas how
> to tackle this?


There have certainly been improvements here and there, but nothing in
terms of a larger, coordinated effort to bring scalar performance
closer to array performance.

It is a bit tricky, since at least I have little appetite to duplicate
a big chunk of the ufunc machinery.

I could think of a few things:

1. We need to special case things so the `__array_ufunc__` overhead
   goes away.  That needs a good idea... or just a fast check for
   NumPy scalars.

2. If we stick with arrays internally, improve the scalar -> array
   conversion.  Things that could help here:
   * A free-list for 0-D arrays (with small itemsize)?
   * Some fast paths to skip dtype/shape discovery here?

The alternative would be to look into writing a "scalar ufunc"
machinery (we almost have a start for that in the scalar math, but
ufuncs are much more flexible than simple math).
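
To make the scalar -> array -> scalar round trip from point 2 concrete,
here is a minimal Python-level sketch. It only mimics the three steps by
hand; the real conversions happen in C inside the ufunc machinery, and
the helper name `time_sqrt` is made up for this illustration. Timings
depend on the machine and the NumPy version, so none are shown.

```python
import timeit
import numpy as np

scalar = np.float64(1.1)

# Step 1: scalar -> 0-D array (shape (), dtype float64).
as_array = np.asarray(scalar)

# Step 2: compute into a freshly allocated 0-D result array.
result_array = np.empty((), dtype=np.float64)
np.sqrt(as_array, out=result_array)

# Step 3: 0-D array -> scalar again.
result_scalar = result_array[()]


def time_sqrt(x, number=100_000):
    """Return the average time in seconds of one np.sqrt(x) call."""
    return timeit.timeit(lambda: np.sqrt(x), number=number) / number


# Compare the three kinds of input discussed in this thread.
inputs = {
    "np.float64 scalar": scalar,
    "0-D array": as_array,
    "1-element array": np.array([1.1]),
}
for label, value in inputs.items():
    print(f"{label:>18}: {time_sqrt(value) * 1e9:8.1f} ns per call")
```

Comparing the scalar and 0-D rows against the 1-element row gives a
rough feel for how much of the gap comes from the conversions rather
than from the square root itself.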

It would be nice if we could chip away at improving things, but right
now I do not have a good plan for how to close the performance gap.
A super-fast scalar check (we have one that might just work) may help
with both of the points above, I guess.
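
As a rough illustration of what a fast scalar check buys, the sketch
below does an exact type test in pure Python and skips the ufunc call
for `np.float64` inputs. This is not how NumPy implements or would
implement it (that check would live in C); the function name
`sqrt_with_scalar_fast_path` and the use of `math.sqrt` are assumptions
made only for this example.

```python
import math
import numpy as np


def sqrt_with_scalar_fast_path(x):
    """Hypothetical wrapper: take a shortcut for exact np.float64 inputs."""
    # Exact type identity is a cheap test: no attribute lookup on the
    # object and no dtype/shape discovery.
    if type(x) is np.float64:
        try:
            # np.float64 subclasses Python float, so math.sqrt accepts it.
            return np.float64(math.sqrt(x))
        except ValueError:
            # math.sqrt rejects negative input; fall through so that
            # NumPy's usual nan-plus-warning behavior applies.
            pass
    # Everything else (arrays, other scalar types) takes the normal path.
    return np.sqrt(x)
```

Calling `sqrt_with_scalar_fast_path(np.float64(2.0))` returns an
`np.float64`, while array inputs still go through `np.sqrt` unchanged.
Whether this is actually faster depends on the NumPy version, so it is
worth timing before relying on it.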

Cheers,

Sebastian



[1] To be fair, I may have removed some fast paths there, but I don't
recall slowing things down, so either I sped things up elsewhere or the
fast paths were probably not thorough enough anyway.




> 
> With kind regards,
> Pieter Eendebak
> 
> Example benchmark:
> 
> import numpy as np
> from numpy import sqrt
> import time
> 
> v=np.array([1.1])
> t0=time.perf_counter()
> x=np.float64(v)
> for kk in range(1_000_000):
>     w1=sqrt(x)
> dt1=time.perf_counter()-t0
> print(dt1)
> 
> t0=time.perf_counter()
> x=v
> for kk in range(1_000_000):
>     w2=sqrt(x)
> dt2=time.perf_counter()-t0
> print(dt2)
> 
> print(f'factor {dt1/dt2:.2f} faster')
> assert(float(w1)==float(w2))
> 
> Results in
> 
> 1.0878291579974757
> 0.5115369699997245
> factor 2.13 faster
> _______________________________________________
> NumPy-Discussion mailing list -- numpy-discussion@python.org
> To unsubscribe send an email to numpy-discussion-le...@python.org
> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> Member address: sebast...@sipsolutions.net

