On Fri, Jun 5, 2009 at 5:59 PM, Brent Pedersen <bpede...@gmail.com> wrote: > On Fri, Jun 5, 2009 at 2:01 PM, Keith Goodman<kwgood...@gmail.com> wrote: >> On Fri, Jun 5, 2009 at 1:22 PM, Brent Pedersen <bpede...@gmail.com> wrote: >>> On Fri, Jun 5, 2009 at 1:05 PM, Keith Goodman<kwgood...@gmail.com> wrote: >>>> On Fri, Jun 5, 2009 at 1:01 PM, Keith Goodman <kwgood...@gmail.com> wrote: >>>>> On Fri, Jun 5, 2009 at 12:53 PM, <josef.p...@gmail.com> wrote: >>>>>> On Fri, Jun 5, 2009 at 2:07 PM, Brian Blais <bbl...@bryant.edu> wrote: >>>>>>> Hello, >>>>>>> I have a vectorizing problem that I don't see an obvious way to solve. >>>>>>> What >>>>>>> I have is a vector like: >>>>>>> obs=array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2]) >>>>>>> and a matrix >>>>>>> T=zeros((6,6)) >>>>>>> and what I want in T is a count of all of the transitions in obs, e.g. >>>>>>> T[1,2]=3 because the sequence 1-2 happens 3 times, T[3,4]=1 because the >>>>>>> sequence 3-4 only happens once, etc... I can do it unvectorized like: >>>>>>> for o1,o2 in zip(obs[:-1],obs[1:]): >>>>>>> T[o1,o2]+=1 >>>>>>> >>>>>>> which gives the correct answer from above, which is: >>>>>>> array([[ 0., 0., 0., 0., 0., 0.], >>>>>>> [ 0., 0., 3., 0., 0., 1.], >>>>>>> [ 0., 3., 0., 1., 0., 0.], >>>>>>> [ 0., 0., 2., 0., 1., 0.], >>>>>>> [ 0., 0., 0., 2., 0., 0.], >>>>>>> [ 0., 0., 0., 0., 1., 0.]]) >>>>>>> >>>>>>> >>>>>>> but I thought there would be a better way. I tried: >>>>>>> o1=obs[:-1] >>>>>>> o2=obs[1:] >>>>>>> T[o1,o2]+=1 >>>>>>> but this doesn't give a count, it just yields 1's at the transition >>>>>>> points, >>>>>>> like: >>>>>>> array([[ 0., 0., 0., 0., 0., 0.], >>>>>>> [ 0., 0., 1., 0., 0., 1.], >>>>>>> [ 0., 1., 0., 1., 0., 0.], >>>>>>> [ 0., 0., 1., 0., 1., 0.], >>>>>>> [ 0., 0., 0., 1., 0., 0.], >>>>>>> [ 0., 0., 0., 0., 1., 0.]]) >>>>>>> >>>>>>> Is there a clever way to do this? I could write a quick Cython >>>>>>> solution, >>>>>>> but I wanted to keep this as an all-numpy implementation if I can. >>>>>>> >>>>>> >>>>>> histogram2d or its imitation, there was a discussion on histogram2d a >>>>>> short time ago >>>>>> >>>>>>>>> obs=np.array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2]) >>>>>>>>> obs2 = obs - 1 >>>>>>>>> trans = >>>>>>>>> np.hstack((0,np.bincount(obs2[:-1]*6+6+obs2[1:]),0)).reshape(6,6) >>>>>>>>> re = np.array([[ 0., 0., 0., 0., 0., 0.], >>>>>> ... [ 0., 0., 3., 0., 0., 1.], >>>>>> ... [ 0., 3., 0., 1., 0., 0.], >>>>>> ... [ 0., 0., 2., 0., 1., 0.], >>>>>> ... [ 0., 0., 0., 2., 0., 0.], >>>>>> ... [ 0., 0., 0., 0., 1., 0.]]) >>>>>>>>> np.all(re == trans) >>>>>> True >>>>>> >>>>>>>>> trans >>>>>> array([[0, 0, 0, 0, 0, 0], >>>>>> [0, 0, 3, 0, 0, 1], >>>>>> [0, 3, 0, 1, 0, 0], >>>>>> [0, 0, 2, 0, 1, 0], >>>>>> [0, 0, 0, 2, 0, 0], >>>>>> [0, 0, 0, 0, 1, 0]]) >>>>>> >>>>>> >>>>>> or >>>>>> >>>>>>>>> h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, >>>>>>>>> range=[[0,5],[0,5]]) >>>>>>>>> re >>>>>> array([[ 0., 0., 0., 0., 0., 0.], >>>>>> [ 0., 0., 3., 0., 0., 1.], >>>>>> [ 0., 3., 0., 1., 0., 0.], >>>>>> [ 0., 0., 2., 0., 1., 0.], >>>>>> [ 0., 0., 0., 2., 0., 0.], >>>>>> [ 0., 0., 0., 0., 1., 0.]]) >>>>>>>>> h >>>>>> array([[ 0., 0., 0., 0., 0., 0.], >>>>>> [ 0., 0., 3., 0., 0., 1.], >>>>>> [ 0., 3., 0., 1., 0., 0.], >>>>>> [ 0., 0., 2., 0., 1., 0.], >>>>>> [ 0., 0., 0., 2., 0., 0.], >>>>>> [ 0., 0., 0., 0., 1., 0.]]) >>>>>> >>>>>>>>> np.all(re == h) >>>>>> True >>>>> >>>>> There's no way my list method can beat that. But by adding >>>>> >>>>> import psyco >>>>> psyco.full() >>>>> >>>>> I get a total speed up of a factor of 15 when obs is length 10000. >>>> >>>> Actually, it is faster: >>>> >>>> histogram: >>>> >>>>>> h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, >>>>>> range=[[0,5],[0,5]]) >>>>>> timeit h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, >>>>>> range=[[0,5],[0,5]]) >>>> 100 loops, best of 3: 4.14 ms per loop >>>> >>>> lists: >>>> >>>>>> timeit test(obs3, T3) >>>> 1000 loops, best of 3: 1.32 ms per loop >>>> _______________________________________________ >>>> Numpy-discussion mailing list >>>> Numpy-discussion@scipy.org >>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>> >>> >>> >>> here's a go: >>> >>> import numpy as np >>> import random >>> from itertools import groupby >>> >>> def test1(obs, T): >>> for o1,o2 in zip(obs[:-1],obs[1:]): >>> T[o1][o2] += 1 >>> return T >>> >>> >>> def test2(obs, T): >>> s = zip(obs[:-1], obs[1:]) >>> for idx, g in groupby(sorted(s)): >>> T[idx] = len(list(g)) >>> return T >>> >>> obs = [random.randint(0, 5) for z in range(10000)] >>> >>> print test2(obs, np.zeros((6, 6))) >>> print test1(obs, np.zeros((6, 6))) >>> >>> >>> ############## >>> >>> In [10]: timeit test1(obs, np.zeros((6, 6))) >>> 100 loops, best of 3: 18.8 ms per loop >>> >>> In [11]: timeit test2(obs, np.zeros((6, 6))) >>> 100 loops, best of 3: 6.91 ms per loop >> >> Wait, you tested the list method with an array. Try >> >> timeit test1(obs, np.zeros((6, 6)).tolist()) >> >> Probably best to move the array/list creation out of the timeit loop. >> Then my method won't have to pay the cost of converting to a list :) >> > > ah right, your test1 is faster than test2 that way. > i'm just stoked to find out about ndimage.historgram. > > using this: >>>> histogram((obs[:-1])*6 +obs[1:], 0, 36, 36).reshape(6,6) > > it gives the exact result as test1/test2. > > -b >
the cleaned up bincount version looks competitive, this time with tests to avoid index errors Josef import numpy as np from scipy import ndimage def countndi(obs): return ndimage.histogram((obs[:-1])*6 +obs[1:], 0, 36, 36 ).reshape(6,6) def countbc(obs): tr = np.zeros(len(obs)+1,int) tr[2:] = (obs[:-1])*6 + obs[1:] tr[1] = 35 trans = np.bincount(tr) trans[0] -= 1 trans[-1] -= 1 return trans.reshape(6,6) obs=np.array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2]) re = np.array([[ 0., 0., 0., 0., 0., 0.], [ 0., 0., 3., 0., 0., 1.], [ 0., 3., 0., 1., 0., 0.], [ 0., 0., 2., 0., 1., 0.], [ 0., 0., 0., 2., 0., 0.], [ 0., 0., 0., 0., 1., 0.]]) trans = countndi(obs) print np.all(re == trans) trans = countbc(obs) print np.all(re == trans) obs2=np.array([1,1,3,4,3,2,1,2,1,2,1,5,4,3,2]) transnd = countndi(obs2) transbc = countbc(obs2) print np.all(transnd == transbc) obs3=np.array([1,2,3,4,3,2,1,2,1,2,1,5,5,3,2]) transnd = countndi(obs3) transbc = countbc(obs3) print np.all(transnd == transbc) import time nit = 20000 t = time.time() for i in xrange(nit): countndi(obs) print time.time() - t t2 = time.time() for i in xrange(nit): countbc(obs) print time.time() - t2 _______________________________________________ Numpy-discussion mailing list Numpy-discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion