Greg Novak wrote: > I have two arrays of integers, and would like to know _where_ they > have elements in common, not just _which_ elements are in common. > This is because the entries in the integer array are aligned with > other arrays. This seems very close to what member1d advertises as > its function. However, member1d says that it expects arrays with only > unique elements. > > First of all, my desired operation is well-posed: I'd like f(ar1, > ar2) to return something in the shape of ar1 with True if the value at > that position appears anywhere in ar2 (regardless of duplication) and > False otherwise. > > So I looked at the code and have two questions: > 1) What is this code trying to achieve? > aux = perm[ii+1] > perm[ii+1] = perm[ii] > perm[ii] = aux > > Here perm is the stable argsort of the two concatenated arguments: > perm = concatenate((ar1, ar2)).argsort(kind='mergesort'). > arr is the array of combined inputs in sorted order: > arr = concatenate((ar1, ar2))[perm] > and ii is a list of indices into arr where the value of arr is equal > to the next value in the array (arr[ii] == arr[ii+1]) _and_ arr[ii] > came from the _second_ input (ar2). > > Now, this last bit (looking for elements of arr that are equal and > both came from the second array) is clearly trying to deal with > duplication, which is why I'm interested... > > So, the code snippet is trying to swap perm[ii+1] with perm[ii], but I > don't see why. Furthermore, there are funny results if a value is > duplicated three times, not just twice -- perm is no longer a > permutation vector. Eg, member1d([1], [2,2,2]) results perm=[0,1,2,3] > and ii=[1,2] before the above snippet, and the above snippet makes > perm into [0,2,3,2] > > I've commented those three lines, and I've never seen any changes to > the output of member1d. The new value of perm is used to compute the > expression: perm.argsort(kind='mergesort')[:len( ar1 )], but the > changes to that expression as a result of the above three lines are > always at the high end of the array, which is sliced off by the last > [:len(ar1)]. > > Finally, my second question is: > 2) Does anyone have a test case where member1d fails as a result of > duplicates in the input? So far I haven't found any, with the above > lines commented or not. > > Upon reflection and review of the changelog, another theory occurs to > me: member1d did not originally use a stable sort. What I've written > above for interpretation of the value ii (indicates duplication within > ar2) is true for a stable sort, but for an unstable sort the same > condition has the interpretation that ii holds the values where the > sorting algorithm swapped the order of equal values unstably. Then > the code snippet in question 1) looks like an attempt to swap those > values in the permutation array to make the sort stable again. The > attempt would fail if there was duplication in either array. > > So, I would propose deleting those three lines (since they seem to be > a non-functional relic) and declaring in the docstring that member1d > doesn't require unique elements. > > Also, if this is correct, then the function simplifies considerably > since several values don't need to be computed anymore: > > def setmember1d( ar1, ar2 ): > ar = nm.concatenate( (ar1, ar2 ) ) > perm = ar.argsort(kind='mergesort') > aux = ar[perm] > flag = nm.concatenate( (aux[1:] == aux[:-1], [False] ) ) > indx = perm.argsort(kind='mergesort')[:len( ar1 )] > return flag[indx] > > Corrections to the above are welcome since I'm going to start using > member1d without regard for uniqueness, and I'd like to know if I'm > making a big mistake...
Hi Greg, I do not have much time to investigate it in detail right now, but it does not work for repeated entries in ar1: In [14]: nm.setmember1d( [1,2,3,2], [1, 3] ) Out[14]: array([ True, True, True, False], dtype=bool) thanks for trying to enhance arraysetops! r. _______________________________________________ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion