On Fri, Jun 5, 2009 at 1:05 PM, Keith Goodman<kwgood...@gmail.com> wrote: > On Fri, Jun 5, 2009 at 1:01 PM, Keith Goodman <kwgood...@gmail.com> wrote: >> On Fri, Jun 5, 2009 at 12:53 PM, <josef.p...@gmail.com> wrote: >>> On Fri, Jun 5, 2009 at 2:07 PM, Brian Blais <bbl...@bryant.edu> wrote: >>>> Hello, >>>> I have a vectorizing problem that I don't see an obvious way to solve. >>>> What >>>> I have is a vector like: >>>> obs=array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2]) >>>> and a matrix >>>> T=zeros((6,6)) >>>> and what I want in T is a count of all of the transitions in obs, e.g. >>>> T[1,2]=3 because the sequence 1-2 happens 3 times, T[3,4]=1 because the >>>> sequence 3-4 only happens once, etc... I can do it unvectorized like: >>>> for o1,o2 in zip(obs[:-1],obs[1:]): >>>> T[o1,o2]+=1 >>>> >>>> which gives the correct answer from above, which is: >>>> array([[ 0., 0., 0., 0., 0., 0.], >>>> [ 0., 0., 3., 0., 0., 1.], >>>> [ 0., 3., 0., 1., 0., 0.], >>>> [ 0., 0., 2., 0., 1., 0.], >>>> [ 0., 0., 0., 2., 0., 0.], >>>> [ 0., 0., 0., 0., 1., 0.]]) >>>> >>>> >>>> but I thought there would be a better way. I tried: >>>> o1=obs[:-1] >>>> o2=obs[1:] >>>> T[o1,o2]+=1 >>>> but this doesn't give a count, it just yields 1's at the transition points, >>>> like: >>>> array([[ 0., 0., 0., 0., 0., 0.], >>>> [ 0., 0., 1., 0., 0., 1.], >>>> [ 0., 1., 0., 1., 0., 0.], >>>> [ 0., 0., 1., 0., 1., 0.], >>>> [ 0., 0., 0., 1., 0., 0.], >>>> [ 0., 0., 0., 0., 1., 0.]]) >>>> >>>> Is there a clever way to do this? I could write a quick Cython solution, >>>> but I wanted to keep this as an all-numpy implementation if I can. >>>> >>> >>> histogram2d or its imitation, there was a discussion on histogram2d a >>> short time ago >>> >>>>>> obs=np.array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2]) >>>>>> obs2 = obs - 1 >>>>>> trans = np.hstack((0,np.bincount(obs2[:-1]*6+6+obs2[1:]),0)).reshape(6,6) >>>>>> re = np.array([[ 0., 0., 0., 0., 0., 0.], >>> ... [ 0., 0., 3., 0., 0., 1.], >>> ... [ 0., 3., 0., 1., 0., 0.], >>> ... [ 0., 0., 2., 0., 1., 0.], >>> ... [ 0., 0., 0., 2., 0., 0.], >>> ... [ 0., 0., 0., 0., 1., 0.]]) >>>>>> np.all(re == trans) >>> True >>> >>>>>> trans >>> array([[0, 0, 0, 0, 0, 0], >>> [0, 0, 3, 0, 0, 1], >>> [0, 3, 0, 1, 0, 0], >>> [0, 0, 2, 0, 1, 0], >>> [0, 0, 0, 2, 0, 0], >>> [0, 0, 0, 0, 1, 0]]) >>> >>> >>> or >>> >>>>>> h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, >>>>>> range=[[0,5],[0,5]]) >>>>>> re >>> array([[ 0., 0., 0., 0., 0., 0.], >>> [ 0., 0., 3., 0., 0., 1.], >>> [ 0., 3., 0., 1., 0., 0.], >>> [ 0., 0., 2., 0., 1., 0.], >>> [ 0., 0., 0., 2., 0., 0.], >>> [ 0., 0., 0., 0., 1., 0.]]) >>>>>> h >>> array([[ 0., 0., 0., 0., 0., 0.], >>> [ 0., 0., 3., 0., 0., 1.], >>> [ 0., 3., 0., 1., 0., 0.], >>> [ 0., 0., 2., 0., 1., 0.], >>> [ 0., 0., 0., 2., 0., 0.], >>> [ 0., 0., 0., 0., 1., 0.]]) >>> >>>>>> np.all(re == h) >>> True >> >> There's no way my list method can beat that. But by adding >> >> import psyco >> psyco.full() >> >> I get a total speed up of a factor of 15 when obs is length 10000. > > Actually, it is faster: > > histogram: > >>> h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, range=[[0,5],[0,5]]) >>> timeit h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, >>> range=[[0,5],[0,5]]) > 100 loops, best of 3: 4.14 ms per loop > > lists: > >>> timeit test(obs3, T3) > 1000 loops, best of 3: 1.32 ms per loop > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion@scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion >
here's a go: import numpy as np import random from itertools import groupby def test1(obs, T): for o1,o2 in zip(obs[:-1],obs[1:]): T[o1][o2] += 1 return T def test2(obs, T): s = zip(obs[:-1], obs[1:]) for idx, g in groupby(sorted(s)): T[idx] = len(list(g)) return T obs = [random.randint(0, 5) for z in range(10000)] print test2(obs, np.zeros((6, 6))) print test1(obs, np.zeros((6, 6))) ############## In [10]: timeit test1(obs, np.zeros((6, 6))) 100 loops, best of 3: 18.8 ms per loop In [11]: timeit test2(obs, np.zeros((6, 6))) 100 loops, best of 3: 6.91 ms per loop _______________________________________________ Numpy-discussion mailing list Numpy-discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion