On Wed, Dec 30, 2009 at 12:19 PM, Keith Goodman <kwgood...@gmail.com> wrote:
> On Wed, Dec 30, 2009 at 12:08 AM, Eric Emsellem <eemse...@eso.org> wrote:
>> Hi
>>
>> thanks for the tips. Unfortunately this is not what I am after.
>>
>>>> >   import numpy as num
>>>> >   startarray = random((1000,100))
>>>> >   take_sample = [1,2,5,6,1,2]
>>>> >   temp = num.take(startarray,take_sample,axis=1)
>>>
>>> Would it help to make temp a 1000x4 array instead of 1000x6? Could you
>>> do that by changing take_sample to [1,2,5,6] and multiplying columns 1
>>> and 2 by a factor of 2? That would slow down the construction of temp
>>> but speed up the addition (and slicing?) in the loop below.
>>
>> No it wouldn't help unfortunately, because the second instance of "1,2"
>> would have different shifts. So I cannot just count the number of occurrence
>> of each line.
>>
>> From the initial 2D array, 1D lines could be extracted several times, with
>> each time a different shift.
>>
>>>> >   shift = [10,20,34,-10,22,-20]
>>>> >   result = num.zeros(900)  # shorter than initial because of the shift
>>>> >   for i in range(len(shift)) :
>>>> >      result += temp[100+shift[i]:-100+shift[1]]
>>>
>>> This looks fast to me. The slicing doesn't make a copy nor does the
>>> addition. I've read that cython does fast indexing but I don't know if
>>> that applies to slicing as well. I assume that shift[1] is a typo and
>>> should be shift[i].
>>
>> (yes of course the shift[1] should be shift[i])
>> Well this may be fast, but not fast enough. And also, starting from my 2D
>> startarray again, it looks odd that I cannot do something like:
>>
>> startarray = random((1000,100))
>> take_sample = [1,2,5,6,1,2]
>> shift = [10,20,34,-10,22,-20]
>> result =
>> num.sum(num.take(startarray,take_sample,axis=1)[100+shift:100-shift])
>>
>> but of course this is nonsense because I cannot address the data this way
>> (with "shift").
>>
>> In fact I realise now that my question is simpler: how do I extract and sum
>> 1d lines from a 2D array if I want first each line to be "shifted". So
>> starting again now, I want a quick way to write:
>>
>> startarray = random((1000,6))
>> shift = [10,20,34,-10,22,-20]
>> result = num.zeros(1000, dtype=float)
>> for i in len(shift) :
>>    result += startarray[100+shift[i]:900+shift[i]]
>>
>>
>> Can I write this directly with some numpy indexing without the loop in
>> python?
>>
>> thanks for any tip.
>>
>> Eric
>
> Where's the bottleneck? There's the loop, there's constructing the
> indices (which could be done outside the loop), slicing, adding. The
> location of the bottleneck probably depends on the relative sizes of
> the arrays. If the bottleneck is the loop, i.e. shift has a LOT of
> elements, then it might speed things up to break shift into chunks and
> use python's multiprocessing module to solve this in parallel.
> Something like cython would also speed up the loop.
>
> I haven't tried running your code, but if anyone does, I think
>
> result += startarray[100+shift[i]:900+shift[i]]
>
> should be
>
> result += startarray[100+shift[i]:900+shift[i], i]
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>

Something like this? I'm just trying it out and haven't checked carefully
whether it actually replicates your snippets. Constructing big intermediate
arrays might not improve performance compared to a loop.

>>> np.arange(30).reshape(6,5)
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24],
       [25, 26, 27, 28, 29]])
>>> np.arange(30).reshape(6,5)[np.array([[1,2,2,1]]).T,np.arange(0,3)+np.array([[0,1,2,1]]).T]
array([[ 5,  6,  7],
       [11, 12, 13],
       [12, 13, 14],
       [ 6,  7,  8]])
>>> np.arange(30).reshape(6,5)[np.array([[1,2,2,1]]).T,np.arange(0,3)+np.array([[0,1,2,1]]).T].sum(0)
array([34, 38, 42])

Here the first index array picks the row for each extracted line and the
second one holds the shifted column positions, so each output row is one
shifted slice; sum(0) then adds the slices together.

Josef
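
For what it's worth, the same index-array idea can be applied to the
startarray/shift example from the thread. The following is only a sketch
under a few assumptions of mine: the lines are taken along axis 0 (as in
Keith's corrected loop with the extra ", i" index), the result has 800
elements (rows 100+s up to 900+s) rather than the 1000 allocated in the
original snippet, and the names rows, cols and expected are made up for
illustration. The explicit loop is kept as a reference to check against.

```python
import numpy as np

# Assumed test data mirroring the sizes in the thread (seeded for repeatability).
rng = np.random.default_rng(0)
startarray = rng.random((1000, 6))
shift = np.array([10, 20, 34, -10, 22, -20])

# Reference: the explicit loop, with the column index i added and an
# 800-element result so that the shapes agree (rows 100+s .. 900+s).
expected = np.zeros(800)
for i, s in enumerate(shift):
    expected += startarray[100 + s:900 + s, i]

# Index-array version: rows[k, i] = 100 + k + shift[i], cols[i] = i.
rows = np.arange(100, 900)[:, None] + shift   # shape (800, 6)
cols = np.arange(shift.size)                  # shape (6,), broadcasts against rows
result = startarray[rows, cols].sum(axis=1)   # shape (800,)

assert np.allclose(result, expected)
```

Note that startarray[rows, cols] builds an (800, 6) temporary before the
sum, so the caveat about big intermediate arrays applies; whether this
beats the loop probably depends on the array sizes and is worth timing.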