On Fri, Jun 5, 2009 at 1:22 PM, Brent Pedersen <bpede...@gmail.com> wrote: > On Fri, Jun 5, 2009 at 1:05 PM, Keith Goodman<kwgood...@gmail.com> wrote: >> On Fri, Jun 5, 2009 at 1:01 PM, Keith Goodman <kwgood...@gmail.com> wrote: >>> On Fri, Jun 5, 2009 at 12:53 PM, <josef.p...@gmail.com> wrote: >>>> On Fri, Jun 5, 2009 at 2:07 PM, Brian Blais <bbl...@bryant.edu> wrote: >>>>> Hello, >>>>> I have a vectorizing problem that I don't see an obvious way to solve. >>>>> What >>>>> I have is a vector like: >>>>> obs=array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2]) >>>>> and a matrix >>>>> T=zeros((6,6)) >>>>> and what I want in T is a count of all of the transitions in obs, e.g. >>>>> T[1,2]=3 because the sequence 1-2 happens 3 times, T[3,4]=1 because the >>>>> sequence 3-4 only happens once, etc... I can do it unvectorized like: >>>>> for o1,o2 in zip(obs[:-1],obs[1:]): >>>>> T[o1,o2]+=1 >>>>> >>>>> which gives the correct answer from above, which is: >>>>> array([[ 0., 0., 0., 0., 0., 0.], >>>>> [ 0., 0., 3., 0., 0., 1.], >>>>> [ 0., 3., 0., 1., 0., 0.], >>>>> [ 0., 0., 2., 0., 1., 0.], >>>>> [ 0., 0., 0., 2., 0., 0.], >>>>> [ 0., 0., 0., 0., 1., 0.]]) >>>>> >>>>> >>>>> but I thought there would be a better way. I tried: >>>>> o1=obs[:-1] >>>>> o2=obs[1:] >>>>> T[o1,o2]+=1 >>>>> but this doesn't give a count, it just yields 1's at the transition >>>>> points, >>>>> like: >>>>> array([[ 0., 0., 0., 0., 0., 0.], >>>>> [ 0., 0., 1., 0., 0., 1.], >>>>> [ 0., 1., 0., 1., 0., 0.], >>>>> [ 0., 0., 1., 0., 1., 0.], >>>>> [ 0., 0., 0., 1., 0., 0.], >>>>> [ 0., 0., 0., 0., 1., 0.]]) >>>>> >>>>> Is there a clever way to do this? I could write a quick Cython solution, >>>>> but I wanted to keep this as an all-numpy implementation if I can. >>>>> >>>> >>>> histogram2d or its imitation, there was a discussion on histogram2d a >>>> short time ago >>>> >>>>>>> obs=np.array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2]) >>>>>>> obs2 = obs - 1 >>>>>>> trans = >>>>>>> np.hstack((0,np.bincount(obs2[:-1]*6+6+obs2[1:]),0)).reshape(6,6) >>>>>>> re = np.array([[ 0., 0., 0., 0., 0., 0.], >>>> ... [ 0., 0., 3., 0., 0., 1.], >>>> ... [ 0., 3., 0., 1., 0., 0.], >>>> ... [ 0., 0., 2., 0., 1., 0.], >>>> ... [ 0., 0., 0., 2., 0., 0.], >>>> ... [ 0., 0., 0., 0., 1., 0.]]) >>>>>>> np.all(re == trans) >>>> True >>>> >>>>>>> trans >>>> array([[0, 0, 0, 0, 0, 0], >>>> [0, 0, 3, 0, 0, 1], >>>> [0, 3, 0, 1, 0, 0], >>>> [0, 0, 2, 0, 1, 0], >>>> [0, 0, 0, 2, 0, 0], >>>> [0, 0, 0, 0, 1, 0]]) >>>> >>>> >>>> or >>>> >>>>>>> h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, >>>>>>> range=[[0,5],[0,5]]) >>>>>>> re >>>> array([[ 0., 0., 0., 0., 0., 0.], >>>> [ 0., 0., 3., 0., 0., 1.], >>>> [ 0., 3., 0., 1., 0., 0.], >>>> [ 0., 0., 2., 0., 1., 0.], >>>> [ 0., 0., 0., 2., 0., 0.], >>>> [ 0., 0., 0., 0., 1., 0.]]) >>>>>>> h >>>> array([[ 0., 0., 0., 0., 0., 0.], >>>> [ 0., 0., 3., 0., 0., 1.], >>>> [ 0., 3., 0., 1., 0., 0.], >>>> [ 0., 0., 2., 0., 1., 0.], >>>> [ 0., 0., 0., 2., 0., 0.], >>>> [ 0., 0., 0., 0., 1., 0.]]) >>>> >>>>>>> np.all(re == h) >>>> True >>> >>> There's no way my list method can beat that. But by adding >>> >>> import psyco >>> psyco.full() >>> >>> I get a total speed up of a factor of 15 when obs is length 10000. >> >> Actually, it is faster: >> >> histogram: >> >>>> h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, range=[[0,5],[0,5]]) >>>> timeit h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, >>>> range=[[0,5],[0,5]]) >> 100 loops, best of 3: 4.14 ms per loop >> >> lists: >> >>>> timeit test(obs3, T3) >> 1000 loops, best of 3: 1.32 ms per loop >> _______________________________________________ >> Numpy-discussion mailing list >> Numpy-discussion@scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > > here's a go: > > import numpy as np > import random > from itertools import groupby > > def test1(obs, T): > for o1,o2 in zip(obs[:-1],obs[1:]): > T[o1][o2] += 1 > return T > > > def test2(obs, T): > s = zip(obs[:-1], obs[1:]) > for idx, g in groupby(sorted(s)): > T[idx] = len(list(g)) > return T > > obs = [random.randint(0, 5) for z in range(10000)] > > print test2(obs, np.zeros((6, 6))) > print test1(obs, np.zeros((6, 6))) > > > ############## > > In [10]: timeit test1(obs, np.zeros((6, 6))) > 100 loops, best of 3: 18.8 ms per loop > > In [11]: timeit test2(obs, np.zeros((6, 6))) > 100 loops, best of 3: 6.91 ms per loop
Wait, you tested the list method with an array. Try timeit test1(obs, np.zeros((6, 6)).tolist()) Probably best to move the array/list creation out of the timeit loop. Then my method won't have to pay the cost of converting to a list :) _______________________________________________ Numpy-discussion mailing list Numpy-discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion