On Fri, Jun 5, 2009 at 12:53 PM, <josef.p...@gmail.com> wrote: > On Fri, Jun 5, 2009 at 2:07 PM, Brian Blais <bbl...@bryant.edu> wrote: >> Hello, >> I have a vectorizing problem that I don't see an obvious way to solve. What >> I have is a vector like: >> obs=array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2]) >> and a matrix >> T=zeros((6,6)) >> and what I want in T is a count of all of the transitions in obs, e.g. >> T[1,2]=3 because the sequence 1-2 happens 3 times, T[3,4]=1 because the >> sequence 3-4 only happens once, etc... I can do it unvectorized like: >> for o1,o2 in zip(obs[:-1],obs[1:]): >> T[o1,o2]+=1 >> >> which gives the correct answer from above, which is: >> array([[ 0., 0., 0., 0., 0., 0.], >> [ 0., 0., 3., 0., 0., 1.], >> [ 0., 3., 0., 1., 0., 0.], >> [ 0., 0., 2., 0., 1., 0.], >> [ 0., 0., 0., 2., 0., 0.], >> [ 0., 0., 0., 0., 1., 0.]]) >> >> >> but I thought there would be a better way. I tried: >> o1=obs[:-1] >> o2=obs[1:] >> T[o1,o2]+=1 >> but this doesn't give a count, it just yields 1's at the transition points, >> like: >> array([[ 0., 0., 0., 0., 0., 0.], >> [ 0., 0., 1., 0., 0., 1.], >> [ 0., 1., 0., 1., 0., 0.], >> [ 0., 0., 1., 0., 1., 0.], >> [ 0., 0., 0., 1., 0., 0.], >> [ 0., 0., 0., 0., 1., 0.]]) >> >> Is there a clever way to do this? I could write a quick Cython solution, >> but I wanted to keep this as an all-numpy implementation if I can. >> > > histogram2d or its imitation, there was a discussion on histogram2d a > short time ago > >>>> obs=np.array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2]) >>>> obs2 = obs - 1 >>>> trans = np.hstack((0,np.bincount(obs2[:-1]*6+6+obs2[1:]),0)).reshape(6,6) >>>> re = np.array([[ 0., 0., 0., 0., 0., 0.], > ... [ 0., 0., 3., 0., 0., 1.], > ... [ 0., 3., 0., 1., 0., 0.], > ... [ 0., 0., 2., 0., 1., 0.], > ... [ 0., 0., 0., 2., 0., 0.], > ... [ 0., 0., 0., 0., 1., 0.]]) >>>> np.all(re == trans) > True > >>>> trans > array([[0, 0, 0, 0, 0, 0], > [0, 0, 3, 0, 0, 1], > [0, 3, 0, 1, 0, 0], > [0, 0, 2, 0, 1, 0], > [0, 0, 0, 2, 0, 0], > [0, 0, 0, 0, 1, 0]]) > > > or > >>>> h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, range=[[0,5],[0,5]]) >>>> re > array([[ 0., 0., 0., 0., 0., 0.], > [ 0., 0., 3., 0., 0., 1.], > [ 0., 3., 0., 1., 0., 0.], > [ 0., 0., 2., 0., 1., 0.], > [ 0., 0., 0., 2., 0., 0.], > [ 0., 0., 0., 0., 1., 0.]]) >>>> h > array([[ 0., 0., 0., 0., 0., 0.], > [ 0., 0., 3., 0., 0., 1.], > [ 0., 3., 0., 1., 0., 0.], > [ 0., 0., 2., 0., 1., 0.], > [ 0., 0., 0., 2., 0., 0.], > [ 0., 0., 0., 0., 1., 0.]]) > >>>> np.all(re == h) > True
There's no way my list method can beat that. But by adding import psyco psyco.full() I get a total speed up of a factor of 15 when obs is length 10000. _______________________________________________ Numpy-discussion mailing list Numpy-discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion