Hi,
this is a problem that came up when trying to replace a hand-written
array concatenation with a call to numpy.vstack: for some array sizes,

    numpy.vstack(data)

runs more than 20% longer than a loop like
    alldata = numpy.empty((tlen, dim))
    pos = 0
    for x in data:
        step = x.shape[0]
        alldata[pos:pos+step] = x
        pos += step
(example script attached)
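For what it's worth, numpy.vstack appears to be a thin wrapper that promotes
each input to at least 2-D and then calls numpy.concatenate along axis 0, so
the two should behave (and time) almost identically. A minimal sketch checking
that equivalence; the array sizes here are made up for illustration:

```python
import numpy

# vstack promotes each input with atleast_2d and then concatenates along
# axis 0, so timing vstack effectively times concatenate plus a small
# per-array wrapping overhead.
data = [numpy.random.rand(3, 4) for _ in range(5)]

via_vstack = numpy.vstack(data)
via_concat = numpy.concatenate([numpy.atleast_2d(x) for x in data], axis=0)

assert (via_vstack == via_concat).all()
assert via_vstack.shape == (15, 4)
```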
$ python del_cum3.py numpy 10000 10000 1 10
problem size: (10000x10000) x 1 = 10^8
0.816s <------------------------------- numpy.concatenate of 10 arrays
10000x10000
$ python del_cum3.py concat 10000 10000 1 10
problem size: (10000x10000) x 1 = 10^8
0.642s <------------------------------- slice manipulation giving the same result
When the array size is reduced to about 100x100, the measured time drops
to nearly zero in both cases, so the dtype and dimension checking seems
to be negligible.
Does numpy.concatenate do some extra work?
Thanks for any pointers,
Zbyszek
PS. Architecture is amd64.
python2.6, numpy 1.3.0
or
python3.1, numpy 2.0.0.dev / tr...@8510
give the same result.
import sys, math
import numpy
import time

def concat(data):
    # Preallocate the result, then copy each block in with slice assignment.
    dim = data[0].shape[1]
    tlen = sum(x.shape[0] for x in data)
    alldata = numpy.empty((tlen, dim))
    pos = 0
    for x in data:
        step = x.shape[0]
        alldata[pos:pos+step] = x
        pos += step
    return alldata

style = sys.argv[1]
N, M, K, T = [int(arg) for arg in sys.argv[2:]]
a = [numpy.random.rand(N, M) for _ in range(K)]
print("problem size: (%dx%d) x %d = 10^%g" % (N, M, K, math.log10(N*M*K)))
t = time.time()
if style == 'numpy':
    for _ in range(T):  # range, not xrange, so the script also runs on Python 3
        numpy.concatenate(a, 0)
elif style == 'vstack':
    for _ in range(T):
        numpy.vstack(a)
elif style == 'concat':
    for _ in range(T):
        concat(a)
else:
    # Sanity check: all three methods must produce identical results.
    A = numpy.concatenate(a, 0)
    B = numpy.vstack(a)
    C = concat(a)
    assert ((A == B) & (B == C)).all()
t = time.time() - t
print('%.3fs' % (t / T))
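For numbers more robust than a single time.time() delta, the same comparison
can be sketched with the timeit module. The sizes below are assumptions chosen
so it runs quickly, not the 10000x10000 case from the report; adjust N, M, K
to reproduce:

```python
import timeit
import numpy

# Smaller problem than in the original report, purely for a quick check.
N, M, K = 1000, 1000, 4
data = [numpy.random.rand(N, M) for _ in range(K)]

def manual_concat(arrays):
    # Preallocate the output and fill it slice by slice.
    dim = arrays[0].shape[1]
    tlen = sum(x.shape[0] for x in arrays)
    out = numpy.empty((tlen, dim))
    pos = 0
    for x in arrays:
        step = x.shape[0]
        out[pos:pos + step] = x
        pos += step
    return out

# Take the minimum over several repeats to reduce timing noise.
t_np = min(timeit.repeat(lambda: numpy.concatenate(data, 0), number=3, repeat=3))
t_manual = min(timeit.repeat(lambda: manual_concat(data), number=3, repeat=3))
print('concatenate: %.3fs  manual: %.3fs' % (t_np, t_manual))
```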
_______________________________________________
NumPy-Discussion mailing list
[email protected]
http://mail.scipy.org/mailman/listinfo/numpy-discussion