On Fri, Jun 5, 2009 at 1:01 PM, Keith Goodman <kwgood...@gmail.com> wrote: > On Fri, Jun 5, 2009 at 12:53 PM, <josef.p...@gmail.com> wrote: >> On Fri, Jun 5, 2009 at 2:07 PM, Brian Blais <bbl...@bryant.edu> wrote: >>> Hello, >>> I have a vectorizing problem that I don't see an obvious way to solve. What >>> I have is a vector like: >>> obs=array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2]) >>> and a matrix >>> T=zeros((6,6)) >>> and what I want in T is a count of all of the transitions in obs, e.g. >>> T[1,2]=3 because the sequence 1-2 happens 3 times, T[3,4]=1 because the >>> sequence 3-4 only happens once, etc... I can do it unvectorized like: >>> for o1,o2 in zip(obs[:-1],obs[1:]): >>> T[o1,o2]+=1 >>> >>> which gives the correct answer from above, which is: >>> array([[ 0., 0., 0., 0., 0., 0.], >>> [ 0., 0., 3., 0., 0., 1.], >>> [ 0., 3., 0., 1., 0., 0.], >>> [ 0., 0., 2., 0., 1., 0.], >>> [ 0., 0., 0., 2., 0., 0.], >>> [ 0., 0., 0., 0., 1., 0.]]) >>> >>> >>> but I thought there would be a better way. I tried: >>> o1=obs[:-1] >>> o2=obs[1:] >>> T[o1,o2]+=1 >>> but this doesn't give a count, it just yields 1's at the transition points, >>> like: >>> array([[ 0., 0., 0., 0., 0., 0.], >>> [ 0., 0., 1., 0., 0., 1.], >>> [ 0., 1., 0., 1., 0., 0.], >>> [ 0., 0., 1., 0., 1., 0.], >>> [ 0., 0., 0., 1., 0., 0.], >>> [ 0., 0., 0., 0., 1., 0.]]) >>> >>> Is there a clever way to do this? I could write a quick Cython solution, >>> but I wanted to keep this as an all-numpy implementation if I can. >>> >> >> histogram2d or its imitation, there was a discussion on histogram2d a >> short time ago >> >>>>> obs=np.array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2]) >>>>> obs2 = obs - 1 >>>>> trans = np.hstack((0,np.bincount(obs2[:-1]*6+6+obs2[1:]),0)).reshape(6,6) >>>>> re = np.array([[ 0., 0., 0., 0., 0., 0.], >> ... [ 0., 0., 3., 0., 0., 1.], >> ... [ 0., 3., 0., 1., 0., 0.], >> ... [ 0., 0., 2., 0., 1., 0.], >> ... [ 0., 0., 0., 2., 0., 0.], >> ... [ 0., 0., 0., 0., 1., 0.]]) >>>>> np.all(re == trans) >> True >> >>>>> trans >> array([[0, 0, 0, 0, 0, 0], >> [0, 0, 3, 0, 0, 1], >> [0, 3, 0, 1, 0, 0], >> [0, 0, 2, 0, 1, 0], >> [0, 0, 0, 2, 0, 0], >> [0, 0, 0, 0, 1, 0]]) >> >> >> or >> >>>>> h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, range=[[0,5],[0,5]]) >>>>> re >> array([[ 0., 0., 0., 0., 0., 0.], >> [ 0., 0., 3., 0., 0., 1.], >> [ 0., 3., 0., 1., 0., 0.], >> [ 0., 0., 2., 0., 1., 0.], >> [ 0., 0., 0., 2., 0., 0.], >> [ 0., 0., 0., 0., 1., 0.]]) >>>>> h >> array([[ 0., 0., 0., 0., 0., 0.], >> [ 0., 0., 3., 0., 0., 1.], >> [ 0., 3., 0., 1., 0., 0.], >> [ 0., 0., 2., 0., 1., 0.], >> [ 0., 0., 0., 2., 0., 0.], >> [ 0., 0., 0., 0., 1., 0.]]) >> >>>>> np.all(re == h) >> True > > There's no way my list method can beat that. But by adding > > import psyco > psyco.full() > > I get a total speed up of a factor of 15 when obs is length 10000.
Actually, it is faster: histogram: >> h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, range=[[0,5],[0,5]]) >> timeit h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, >> range=[[0,5],[0,5]]) 100 loops, best of 3: 4.14 ms per loop lists: >> timeit test(obs3, T3) 1000 loops, best of 3: 1.32 ms per loop _______________________________________________ Numpy-discussion mailing list Numpy-discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion