On Fri, Jun 5, 2009 at 4:35 PM, Keith Goodman <kwgood...@gmail.com> wrote:
> On Fri, Jun 5, 2009 at 1:31 PM, <josef.p...@gmail.com> wrote:
>> On Fri, Jun 5, 2009 at 4:27 PM, Keith Goodman <kwgood...@gmail.com> wrote:
>>> On Fri, Jun 5, 2009 at 1:22 PM, Brent Pedersen <bpede...@gmail.com> wrote:
>>>> On Fri, Jun 5, 2009 at 1:05 PM, Keith Goodman <kwgood...@gmail.com> wrote:
>>>>> On Fri, Jun 5, 2009 at 1:01 PM, Keith Goodman <kwgood...@gmail.com> wrote:
>>>>>> On Fri, Jun 5, 2009 at 12:53 PM, <josef.p...@gmail.com> wrote:
>>>>>>> On Fri, Jun 5, 2009 at 2:07 PM, Brian Blais <bbl...@bryant.edu> wrote:
>>>>>>>> Hello,
>>>>>>>> I have a vectorizing problem that I don't see an obvious way to solve. What
>>>>>>>> I have is a vector like:
>>>>>>>> obs=array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2])
>>>>>>>> and a matrix
>>>>>>>> T=zeros((6,6))
>>>>>>>> and what I want in T is a count of all of the transitions in obs, e.g.
>>>>>>>> T[1,2]=3 because the sequence 1-2 happens 3 times, T[3,4]=1 because the
>>>>>>>> sequence 3-4 only happens once, etc... I can do it unvectorized like:
>>>>>>>> for o1,o2 in zip(obs[:-1],obs[1:]):
>>>>>>>>     T[o1,o2]+=1
>>>>>>>>
>>>>>>>> which gives the correct answer from above, which is:
>>>>>>>> array([[ 0.,  0.,  0.,  0.,  0.,  0.],
>>>>>>>>        [ 0.,  0.,  3.,  0.,  0.,  1.],
>>>>>>>>        [ 0.,  3.,  0.,  1.,  0.,  0.],
>>>>>>>>        [ 0.,  0.,  2.,  0.,  1.,  0.],
>>>>>>>>        [ 0.,  0.,  0.,  2.,  0.,  0.],
>>>>>>>>        [ 0.,  0.,  0.,  0.,  1.,  0.]])
>>>>>>>>
>>>>>>>>
>>>>>>>> but I thought there would be a better way. I tried:
>>>>>>>> o1=obs[:-1]
>>>>>>>> o2=obs[1:]
>>>>>>>> T[o1,o2]+=1
>>>>>>>> but this doesn't give a count, it just yields 1's at the transition points,
>>>>>>>> like:
>>>>>>>> array([[ 0.,  0.,  0.,  0.,  0.,  0.],
>>>>>>>>        [ 0.,  0.,  1.,  0.,  0.,  1.],
>>>>>>>>        [ 0.,  1.,  0.,  1.,  0.,  0.],
>>>>>>>>        [ 0.,  0.,  1.,  0.,  1.,  0.],
>>>>>>>>        [ 0.,  0.,  0.,  1.,  0.,  0.],
>>>>>>>>        [ 0.,  0.,  0.,  0.,  1.,  0.]])
>>>>>>>>
>>>>>>>> Is there a clever way to do this? I could write a quick Cython solution,
>>>>>>>> but I wanted to keep this as an all-numpy implementation if I can.
>>>>>>>>
>>>>>>>
>>>>>>> histogram2d or its imitation, there was a discussion on histogram2d a
>>>>>>> short time ago
>>>>>>>
>>>>>>>>>> obs=np.array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2])
>>>>>>>>>> obs2 = obs - 1
>>>>>>>>>> trans = np.hstack((0,np.bincount(obs2[:-1]*6+6+obs2[1:]),0)).reshape(6,6)
>>>>>>>>>> re = np.array([[ 0.,  0.,  0.,  0.,  0.,  0.],
>>>>>>> ...     [ 0.,  0.,  3.,  0.,  0.,  1.],
>>>>>>> ...     [ 0.,  3.,  0.,  1.,  0.,  0.],
>>>>>>> ...     [ 0.,  0.,  2.,  0.,  1.,  0.],
>>>>>>> ...     [ 0.,  0.,  0.,  2.,  0.,  0.],
>>>>>>> ...     [ 0.,  0.,  0.,  0.,  1.,  0.]])
>>>>>>>>>> np.all(re == trans)
>>>>>>> True
>>>>>>>
>>>>>>>>>> trans
>>>>>>> array([[0, 0, 0, 0, 0, 0],
>>>>>>>        [0, 0, 3, 0, 0, 1],
>>>>>>>        [0, 3, 0, 1, 0, 0],
>>>>>>>        [0, 0, 2, 0, 1, 0],
>>>>>>>        [0, 0, 0, 2, 0, 0],
>>>>>>>        [0, 0, 0, 0, 1, 0]])
>>>>>>>
>>>>>>>
>>>>>>> or
>>>>>>>
>>>>>>>>>> h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, range=[[0,5],[0,5]])
>>>>>>>>>> re
>>>>>>> array([[ 0.,  0.,  0.,  0.,  0.,  0.],
>>>>>>>        [ 0.,  0.,  3.,  0.,  0.,  1.],
>>>>>>>        [ 0.,  3.,  0.,  1.,  0.,  0.],
>>>>>>>        [ 0.,  0.,  2.,  0.,  1.,  0.],
>>>>>>>        [ 0.,  0.,  0.,  2.,  0.,  0.],
>>>>>>>        [ 0.,  0.,  0.,  0.,  1.,  0.]])
>>>>>>>>>> h
>>>>>>> array([[ 0.,  0.,  0.,  0.,  0.,  0.],
>>>>>>>        [ 0.,  0.,  3.,  0.,  0.,  1.],
>>>>>>>        [ 0.,  3.,  0.,  1.,  0.,  0.],
>>>>>>>        [ 0.,  0.,  2.,  0.,  1.,  0.],
>>>>>>>        [ 0.,  0.,  0.,  2.,  0.,  0.],
>>>>>>>        [ 0.,  0.,  0.,  0.,  1.,  0.]])
>>>>>>>
>>>>>>>>>> np.all(re == h)
>>>>>>> True
>>>>>>
>>>>>> There's no way my list method can beat that. But by adding
>>>>>>
>>>>>> import psyco
>>>>>> psyco.full()
>>>>>>
>>>>>> I get a total speed up of a factor of 15 when obs is length 10000.
>>>>>
>>>>> Actually, it is faster:
>>>>>
>>>>> histogram:
>>>>>
>>>>>>> h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, range=[[0,5],[0,5]])
>>>>>>> timeit h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, range=[[0,5],[0,5]])
>>>>> 100 loops, best of 3: 4.14 ms per loop
>>>>>
>>>>> lists:
>>>>>
>>>>>>> timeit test(obs3, T3)
>>>>> 1000 loops, best of 3: 1.32 ms per loop
>>>>>
>>>>
>>>>
>>>> here's a go:
>>>>
>>>> import numpy as np
>>>> import random
>>>> from itertools import groupby
>>>>
>>>> def test1(obs, T):
>>>>     for o1,o2 in zip(obs[:-1],obs[1:]):
>>>>         T[o1][o2] += 1
>>>>     return T
>>>>
>>>>
>>>> def test2(obs, T):
>>>>     s = zip(obs[:-1], obs[1:])
>>>>     for idx, g in groupby(sorted(s)):
>>>>         T[idx] = len(list(g))
>>>>     return T
>>>>
>>>> obs = [random.randint(0, 5) for z in range(10000)]
>>>>
>>>> print test2(obs, np.zeros((6, 6)))
>>>> print test1(obs, np.zeros((6, 6)))
>>>>
>>>>
>>>> ##############
>>>>
>>>> In [10]: timeit test1(obs, np.zeros((6, 6)))
>>>> 100 loops, best of 3: 18.8 ms per loop
>>>>
>>>> In [11]: timeit test2(obs, np.zeros((6, 6)))
>>>> 100 loops, best of 3: 6.91 ms per loop
>>>
>>> Nice!
>>>
>>> Try adding
>>>
>>> import psyco
>>> psyco.full()
>>>
>>> to test1. Or is that cheating?
>>>
>>
>> how about
>> from scipy import ndimage
>> ndimage.histogram((obs[:-1])*6 + obs[1:], 0, 35, 36).reshape(6,6)
>>
>> (bincount doesn't take the limits into account if they don't show up
>> in the data, so first and last position would need special casing)
>
> Game over:
>
>>> from scipy.ndimage import histogram
>>> timeit histogram((obs[:-1])*6 + obs[1:], 0, 35, 36).reshape(6,6)
> 1000 loops, best of 3: 366 µs per loop
> Game over:

maybe not, ndimage is very fast but crash prone (try feeding the wrong type)
and will eventually be rewritten.

Josef
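
For reference: the vectorized attempt in the original post produces only 0's
and 1's because fancy-indexing assignment buffers repeated index pairs, so
T[o1,o2] += 1 writes each distinct transition once no matter how often it
occurs. NumPy 1.8 and later (well after this thread) expose the unbuffered
in-place addition directly as np.add.at; a minimal sketch of the same count:

import numpy as np

obs = np.array([1, 2, 3, 4, 3, 2, 1, 2, 1, 2, 1, 5, 4, 3, 2])
T = np.zeros((6, 6))

# Unlike T[obs[:-1], obs[1:]] += 1, ufunc.at does not buffer duplicate
# index pairs, so every occurrence of a transition is counted.
np.add.at(T, (obs[:-1], obs[1:]), 1)

print(T)   # matches the loop result: T[1, 2] == 3, T[3, 4] == 1, ...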
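
Josef's caveat about bincount (it only counts up to the largest value that
actually occurs in the data, hence the hstack padding and the obs - 1 shift
in his one-liner) is addressed by the minlength argument added in NumPy 1.6.
A sketch, assuming the states are the integers 0 through 5:

import numpy as np

obs = np.array([1, 2, 3, 4, 3, 2, 1, 2, 1, 2, 1, 5, 4, 3, 2])

# Encode each transition (i, j) as the flat index i*6 + j, count the codes,
# and reshape; minlength=36 guarantees a full 6x6 table even if the largest
# possible code never appears in the data.
T = np.bincount(obs[:-1] * 6 + obs[1:], minlength=36).reshape(6, 6)

print(T)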
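
For rerunning the speed comparison on a current NumPy, a small self-contained
harness in the spirit of the timeit calls quoted above (absolute numbers will
of course differ from the 2009 figures, and psyco is not revisited here):

import timeit

setup = "import numpy as np; rng = np.random.RandomState(0); obs = rng.randint(0, 6, 10000)"

stmts = {
    "python loop": ("T = np.zeros((6, 6))\n"
                    "for o1, o2 in zip(obs[:-1], obs[1:]):\n"
                    "    T[o1, o2] += 1"),
    "np.add.at": ("T = np.zeros((6, 6))\n"
                  "np.add.at(T, (obs[:-1], obs[1:]), 1)"),
    "bincount": "T = np.bincount(obs[:-1] * 6 + obs[1:], minlength=36).reshape(6, 6)",
    "histogram2d": ("T, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6,"
                    " range=[[0, 5], [0, 5]])"),
}

# Best of three repeats of 100 runs each, reported per call, like %timeit.
for name, stmt in stmts.items():
    best = min(timeit.repeat(stmt, setup=setup, number=100, repeat=3)) / 100
    print("%-12s %8.3f ms per loop" % (name, best * 1000.0))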