Hi,
this is a problem which came up when trying to replace a hand-written
array concatenation with a call to numpy.vstack:
for some array sizes, 

   numpy.vstack(data)

runs > 20% longer than a loop like

   alldata = numpy.empty((tlen, dim))
   pos = 0
   for x in data:
       step = x.shape[0]
       alldata[pos:pos+step] = x
       pos += step

(example script attached)

$ python del_cum3.py numpy 10000 10000 1 10
problem size: (10000x10000) x 1 = 10^8
0.816s <------------------------------- numpy.concatenate of 10 arrays 10000x10000

$ python del_cum3.py concat 10000 10000 1 10
problem size: (10000x10000) x 1 = 10^8
0.642s <------------------------------- slice manipulation giving the same result

When the array size is reduced to 100x100 or so, the computation time goes to 0,
so the overhead of dtype and dimension checking seems to be negligible.
Does numpy.concatenate do some extra work?
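
To check the small-array case more carefully than a single time.time() difference allows, something like the sketch below could be used. It times each variant with the standard timeit module on ten 100x100 arrays. The helper name best_time and the repeat/number values are illustrative choices of mine and are not part of the attached script.

```python
import timeit
import numpy

def concat(data):
    """Preallocate the result, then copy each block in by slice assignment."""
    dim = data[0].shape[1]
    tlen = sum(x.shape[0] for x in data)
    alldata = numpy.empty((tlen, dim))
    pos = 0
    for x in data:
        step = x.shape[0]
        alldata[pos:pos+step] = x
        pos += step
    return alldata

def best_time(func, data, repeat=5, number=100):
    """Best per-call time in seconds over `repeat` runs of `number` calls.

    Hypothetical helper; the defaults are arbitrary and only for illustration.
    """
    timer = timeit.Timer(lambda: func(data))
    return min(timer.repeat(repeat=repeat, number=number)) / number

if __name__ == "__main__":
    # Small problem size, where fixed per-call overhead should dominate.
    small = [numpy.random.rand(100, 100) for _ in range(10)]
    variants = [
        ("numpy.concatenate", lambda d: numpy.concatenate(d, 0)),
        ("numpy.vstack", numpy.vstack),
        ("concat", concat),
    ]
    for name, func in variants:
        print("%-18s %.3g s per call" % (name, best_time(func, small)))
```

Taking the minimum over several repeats reduces the influence of other activity on the machine, which matters when the per-call time is this small.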

Thanks for any pointers,
Zbyszek

PS. Architecture is amd64.
    python2.6, numpy 1.3.0
    or
    python3.1, numpy 2.0.0.dev / tr...@8510
    give the same result.

# del_cum3.py (attached example script)
import sys, math
import numpy
import time

def concat(data):
    dim = data[0].shape[1]
    tlen = sum(x.shape[0] for x in data)
    alldata = numpy.empty((tlen, dim))
    pos = 0
    for x in data:
        step = x.shape[0]
        alldata[pos:pos+step] = x
        pos += step

    return alldata

style = sys.argv[1]
N,M,K,T = [int(arg) for arg in sys.argv[2:]]
a = [numpy.random.rand(N,M) for _ in range(K)]
print("problem size: (%dx%d) x %d = 10^%g" % (N, M, K, math.log10(N*M*K)))

t = time.time()
# range() rather than xrange() so the script runs on both Python 2 and 3
if style == 'numpy':
    for _ in range(T):
        numpy.concatenate(a, 0)
elif style == 'vstack':
    for _ in range(T):
        numpy.vstack(a)
elif style == 'concat':
    for _ in range(T):
        concat(a)
else:
    A = numpy.concatenate(a, 0)
    B = numpy.vstack(a)
    C = concat(a)
    assert ((A == B) & (B == C)).all()

t = time.time() - t
print('%.3fs' % (t / T))
_______________________________________________
NumPy-Discussion mailing list
[email protected]
http://mail.scipy.org/mailman/listinfo/numpy-discussion
