On the VPython list, Scott Daniels suggested using try/except to deal
with the problem that sqrt(5.5) is a numpy.float64, which makes
sqrt(5.5)*(VPython vector) not a (VPython vector) and ends up as a
big performance hit on existing programs. I tried his suggestion and did
some timing using the program shown below.
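
To see the type difference concretely (the point is just the return
types, not the values):

import math, numpy
print type(math.sqrt(5.5))    # <type 'float'>
print type(numpy.sqrt(5.5))   # <type 'numpy.float64'>
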
Using "from numpy import *", the numpy sqrt(5.5) gives 5.7 microsec per
sqrt, whereas using "from math import *" a sqrt is only 0.8 microsec.
Why is numpy so much slower than math on this simple case? For
completeness I also timed the old Numeric sqrt, which was 14 microsec,
so numpy is a big improvement, but still very slow compared to math.
Using Daniels's suggestion of trying the math sqrt first and falling
through to the numpy sqrt only when the argument isn't a simple scalar
gives 1.3 microsec per sqrt in the simple case of a scalar argument.
Shouldn't/couldn't numpy do something like this internally? (An explicit
type-check variant is sketched after the program below.)
Bruce Sherwood
----------------------------
from math import *
mathsqrt = sqrt
from numpy import *
numpysqrt = sqrt
from time import clock
# 0.8 microsec for "raw" math sqrt
# 5.7 microsec for "raw" numpy sqrt
# 1.3 microsec if we try math sqrt first
def sqrt(x):
    try: return mathsqrt(x)
    except TypeError: return numpysqrt(x)
# Check that numpy sqrt is invoked on an array:
nums = array([1,2,3])
print sqrt(nums)
x = 5.5
N = 500000
t1 = clock()
for n in range(N):
    y = sqrt(x)
    y = sqrt(x)
    y = sqrt(x)
    y = sqrt(x)
    y = sqrt(x)
    y = sqrt(x)
    y = sqrt(x)
    y = sqrt(x)
    y = sqrt(x)
    y = sqrt(x)
t2 = clock()
for n in range(N):
    pass
t3 = clock()
# t3-t2 is the loop overhead (turns out negligible)
print "%i loops over 10 sqrt's takes %.1f seconds" % (N,t2-t1)
print "Total loop overhead = %.2f seconds (negligible)" % (t3-t2)
print "One sqrt takes %.1f microseconds" % (1e6*((t2-t1)-(t3-t2))/(10*N))