On Fri, Jun 5, 2009 at 11:07 AM, Brian Blais <bbl...@bryant.edu> wrote:
> Hello,
> I have a vectorizing problem that I don't see an obvious way to solve. What
> I have is a vector like:
> obs=array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2])
> and a matrix
> T=zeros((6,6))
> and what I want in T is a count of all of the transitions in obs, e.g.
> T[1,2]=3 because the sequence 1-2 happens 3 times, T[3,4]=1 because the
> sequence 3-4 only happens once, etc... I can do it unvectorized like:
> for o1,o2 in zip(obs[:-1],obs[1:]):
>     T[o1,o2]+=1
>
> which gives the correct answer from above, which is:
> array([[ 0.,  0.,  0.,  0.,  0.,  0.],
>        [ 0.,  0.,  3.,  0.,  0.,  1.],
>        [ 0.,  3.,  0.,  1.,  0.,  0.],
>        [ 0.,  0.,  2.,  0.,  1.,  0.],
>        [ 0.,  0.,  0.,  2.,  0.,  0.],
>        [ 0.,  0.,  0.,  0.,  1.,  0.]])
>
>
> but I thought there would be a better way. I tried:
> o1=obs[:-1]
> o2=obs[1:]
> T[o1,o2]+=1
> but this doesn't give a count, it just yields 1's at the transition points,
> like:
> array([[ 0.,  0.,  0.,  0.,  0.,  0.],
>        [ 0.,  0.,  1.,  0.,  0.,  1.],
>        [ 0.,  1.,  0.,  1.,  0.,  0.],
>        [ 0.,  0.,  1.,  0.,  1.,  0.],
>        [ 0.,  0.,  0.,  1.,  0.,  0.],
>        [ 0.,  0.,  0.,  0.,  1.,  0.]])
>
> Is there a clever way to do this? I could write a quick Cython solution,
> but I wanted to keep this as an all-numpy implementation if I can.

It's a little faster (8.5% for me when obs is length 10000) if you do

T = np.zeros((6,6), dtype=int)

But it is more than 5 times faster if you use lists for T and obs. You're
just storing information here, so there is no reason to pay for the
overhead of arrays.

import random
import numpy as np

# 6x6 table of transition counts, stored as a plain list of lists
T = [[0,0,0,0,0,0],
     [0,0,0,0,0,0],
     [0,0,0,0,0,0],
     [0,0,0,0,0,0],
     [0,0,0,0,0,0],
     [0,0,0,0,0,0]]

obs = [random.randint(0, 5) for z in range(10000)]

def test(obs, T):
    # count each consecutive pair (o1, o2) in obs
    for o1,o2 in zip(obs[:-1],obs[1:]):
        T[o1][o2] += 1
    return T

_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
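
For anyone who still wants an all-numpy version of the count, here is a minimal sketch of two vectorized options. Neither is from the thread above: it assumes a more recent NumPy (np.add.at needs 1.8 or later) and that the states are integers in range(6), as in the example. The names n_states, T_add_at, flat and T_bincount are made up for illustration.

```python
import numpy as np

obs = np.array([1, 2, 3, 4, 3, 2, 1, 2, 1, 2, 1, 5, 4, 3, 2])
n_states = 6  # assumption: states are integers in range(6), matching the 6x6 T

# Option 1: unbuffered in-place add. Each repeated (o1, o2) index pair
# contributes 1, unlike T[o1, o2] += 1, which applies a repeated pair only once.
T_add_at = np.zeros((n_states, n_states), dtype=int)
np.add.at(T_add_at, (obs[:-1], obs[1:]), 1)

# Option 2: encode each pair as a single flat index, count the occurrences,
# then reshape the counts back into a square matrix.
flat = obs[:-1] * n_states + obs[1:]
T_bincount = np.bincount(flat, minlength=n_states * n_states)
T_bincount = T_bincount.reshape(n_states, n_states)
```

Both should reproduce the loop's counts for the example obs (T[1,2] is 3, T[3,4] is 1). Which one is faster likely depends on the NumPy version and the length of obs, so it is worth timing against the list version.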