On 11 February 2011 09:01, FRENK Andreas <andreas.fr...@3ds.com> wrote:
> Hi,
>
> I need to create a construct that returns the index of entries of the first
> list, if values in the first and second list are equal.
>
> Take
>
> valA = [1,2,3,4,20,21,22,23,24]
> valB = [1,2,3,4, 5,21,22,23]
> The correct solution is: [0,1,2,3,5,6,7]
>
> A potential loop can be:
> takeList=[]
> for j,a in enumerate(valA):
>     if a in valB:
>         takeList.append(j)
>
> Please note, valA can have entries like [1,10000000,1000000001,…..], i.e. it
> can be very sparse.
> I also thought about using bincount, but due to the sparse nature the return
> values from bincount would allocate too much memory.
>
> Any idea how to do it fast using numpy?

This probably isn't optimal yet, but it performs much better than your for
loop at large array sizes (roughly 100 times faster at size 10000 in the
timings below). At very small sizes it is somewhat slower.

In [11]: def test(a, b):
   ....:     takeList = []
   ....:     for j, A in enumerate(a):
   ....:         if A in b:
   ....:             takeList.append(j)
   ....:     return takeList

In [24]: a = np.random.randint(10, size=10)

In [25]: b = np.random.randint(10, size=10)

In [26]: %timeit test(a,b)
10000 loops, best of 3: 55.4 µs per loop

In [27]: %timeit np.arange(a.size)[np.lib.setmember1d(a,b)]
10000 loops, best of 3: 92.9 µs per loop

In [19]: a = np.random.randint(10000, size=10000)

In [20]: b = np.random.randint(10000, size=10000)

In [21]: %timeit np.arange(a.size)[np.lib.setmember1d(a,b)]
100 loops, best of 3: 7.99 ms per loop

In [22]: %timeit test(a,b)
10 loops, best of 3: 787 ms per loop

Hope that's useful,

Angus
-- 
AJC McMorland
Post-doctoral research fellow
Neurobiology, University of Pittsburgh
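
For readers on a current NumPy, here is a minimal sketch of the same membership-mask idea. It assumes np.isin is available (NumPy 1.13 or later) and swaps it in for np.lib.setmember1d, which to my knowledge is no longer shipped in recent releases. The function name matching_indices is made up for illustration and does not appear in the thread.

```python
import numpy as np


def matching_indices(val_a, val_b):
    """Return the indices of entries in val_a whose value also occurs in val_b.

    Hypothetical helper: np.isin is assumed here as a modern stand-in for
    the np.lib.setmember1d call used in the timings above.
    """
    val_a = np.asarray(val_a)
    val_b = np.asarray(val_b)
    # Boolean mask, True where the element of val_a is present in val_b.
    mask = np.isin(val_a, val_b)
    # Convert the mask to integer positions, like np.arange(a.size)[mask].
    return np.flatnonzero(mask)


if __name__ == "__main__":
    valA = [1, 2, 3, 4, 20, 21, 22, 23, 24]
    valB = [1, 2, 3, 4, 5, 21, 22, 23]
    print(matching_indices(valA, valB).tolist())  # [0, 1, 2, 3, 5, 6, 7]
```

Unlike bincount, this only builds a boolean mask with one entry per element of the first array, so widely spaced values such as 1000000001 should not by themselves lead to a large allocation.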