[Numpy-discussion] Re: Function that searches arrays for the first element that satisfies a condition

Dom Grigonis Mon, 30 Oct 2023 13:46:46 -0700

I experimented a bit and found a pretty nice solution using numba. It is 
probably not very robust, but it proves that such a thing can be optimised 
while retaining flexibility. Check whether it works for your use cases and let 
me know if anything fails or if it is slow compared to what you used.


import ast

import numba as nb
import numpy as np

first_true_str = """
def first_true(arr, n):
    result = np.full((n, arr.shape[1]), -1, dtype=np.int32)
    for j in range(arr.shape[1]):
        k = 0
        for i in range(arr.shape[0]):
            x = arr[i:i + 1, j]
            if cond(x):
                result[k, j] = i
                k += 1
                if k >= n:
                    break
    return result
"""


class FirstTrue:
    CONTEXT = {'np': np}

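    def __init__(self, expr):
        self.expr = expr
        # Parse the condition, e.g. 'np.sum(x) > 30', into an expression node.
        self.expr_ast = ast.parse(expr, mode='exec').body[0].value
        self.func_ast = ast.parse(first_true_str, mode='exec')
        # Replace the placeholder test `cond(x)` of the innermost `if`.
        self.func_ast.body[0].body[1].body[1].body[1].test = self.expr_ast
        self.func_cmp = compile(self.func_ast, filename="<ast>", mode="exec")
        # Define first_true in CONTEXT, then JIT-compile it with numba.
        exec(self.func_cmp, self.CONTEXT)
        self.func_nb = nb.njit(self.CONTEXT[self.func_ast.body[0].name])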

    def __call__(self, arr, n=1, axis=None):
        # PREPARE INPUTS
        in_1d = False
        if axis is None:
            arr = np.ravel(arr)[:, None]
            in_1d = True
        elif axis == 0:
            if arr.ndim == 1:
                in_1d = True
                arr = arr[:, None]
        else:
            raise ValueError('axis must be None or 0')
        res = self.func_nb(arr, n)
        if in_1d:
            res = res[:, 0]
        return res


if __name__ == '__main__':
    arr = np.arange(125).reshape((5, 5, 5))
    ft = FirstTrue('np.sum(x) > 30')
    print(ft(arr, n=2, axis=0))

[[1 0 0 0 0]
 [2 1 1 1 1]]

In [16]: %timeit ft(arr, 2, axis=0)
1.31 µs ± 3.94 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

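The core trick in the class above is swapping the test of an `if` node inside 
a parsed template. Here is a stripped-down sketch of just that step, using only 
the standard library. The names template, build_check and is_big are made up 
for illustration, and numba is left out entirely:

```python
import ast

# Hypothetical miniature template: `cond(x)` is only a placeholder test.
template = """
def check(x):
    if cond(x):
        return True
    return False
"""


def build_check(expr):
    """Return a function whose `if` test is the expression `expr` in `x`."""
    expr_ast = ast.parse(expr, mode="eval").body   # the expression node
    func_ast = ast.parse(template, mode="exec")    # Module -> FunctionDef
    func_ast.body[0].body[0].test = expr_ast       # swap the If node's test
    ast.fix_missing_locations(func_ast)
    namespace = {}
    exec(compile(func_ast, filename="<ast>", mode="exec"), namespace)
    return namespace["check"]


is_big = build_check("x * 2 > 30")
print(is_big(20), is_big(10))  # True False
```

Because the expression ends up inside the function body as ordinary code, a 
JIT compiler sees one plain loop rather than a call through a Python callback.
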
Regards,
DG

> On 29 Oct 2023, at 23:18, rosko37 <rosk...@gmail.com> wrote:
> 
> An example with a 1-D array (where it is easiest to see what I mean) is the 
> following. I will follow Dom Grigonis's suggestion that the range not be 
> provided as a separate argument, as it can be just as easily "folded into" 
> the array by passing a slice. So it becomes just:
> idx = first_true(arr, cond)
> 
> As Dom also points out, the "cond" would likely need to be a "function 
> pointer" (i.e., the name of a function defined elsewhere, turning first_true 
> into a higher-order function), unless there's some way to pass a parseable 
> expression for simple cases. A few special cases like the first zero/nonzero 
> element could be handled with dedicated options (sort of like matplotlib 
> colors), but for anything beyond that it gets unwieldy fast.
> 
> So let's say we have this:
> ******************
> def cond(x):
>     return x>50
> 
> search_arr = np.exp(np.arange(0,1000))
> 
> print(np.first_true(search_arr, cond))
> *******************
> 
> This should print 4, because the element of search_arr at index 4 (i.e. the 
> 5th element) is e^4, which is slightly greater than 50 (while e^3 is less 
> than 50). It should return this without testing the 6th through 1000th 
> elements of the array at all to see whether they exceed 50 or not. This 
> example is rather contrived, because simply taking the natural log of 50 and 
> rounding up is far superior, not even evaluating the array of exponentials 
> (which my example clearly still does--and in the use cases I've had for such 
> a function, I can't predict the array elements like this--they come from 
> loaded data, the output of a simulation, etc., and are all already in a numpy 
> array). And in this case, since the values are strictly increasing, 
> np.searchsorted() would work as well. But it illustrates the idea.
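
For reference, the semantics described above can be written as a plain Python 
loop. This is only a sketch of the intended behaviour, not a proposed 
implementation: the -1 return for "not found" is an assumption, the `tested` 
list exists just to show how many elements get evaluated, and an 
element-by-element Python loop like this is slow.

```python
import numpy as np


def first_true(arr, cond):
    """Index of the first element for which cond(element) is true, else -1."""
    for i, x in enumerate(arr):
        if cond(x):
            return i  # stop here; later elements are never tested
    return -1


tested = []  # records every element the condition was evaluated on


def cond(x):
    tested.append(x)
    return x > 50


# exp() overflows to inf for large arguments; silence that warning.
with np.errstate(over="ignore"):
    search_arr = np.exp(np.arange(0, 1000))

result = first_true(search_arr, cond)
print(result, len(tested))  # 4 5
```

Only the elements at indices 0 through 4 are passed to cond; the remaining 995 
are never looked at.
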
> 
> On Thu, Oct 26, 2023 at 5:54 AM Dom Grigonis <dom.grigo...@gmail.com> wrote:
> Could you please give a concise example? I know you have provided one, but it 
> is buried deep in verbose text and has some typos in it, which makes it hard 
> to understand exactly what inputs should result in what output.
> 
> Regards,
> DG
> 
> > On 25 Oct 2023, at 22:59, rosko37 <rosk...@gmail.com> wrote:
> > 
> > I know this question has been asked before, both on this list as well as 
> > several threads on Stack Overflow, etc. It's a common issue. I'm NOT asking 
> > for how to do this using existing Numpy functions (as that information can 
> > be found in any of those sources)--what I'm asking is whether Numpy would 
> > accept inclusion of a function that does this, or whether (possibly more 
> > likely) such a proposal has already been considered and rejected for some 
> > reason.
> > 
> > The task is this--there's a large array and you want to find the next 
> > element after some index that satisfies some condition. Such elements are 
> > common, and the typical number of elements to be searched through is small 
> > relative to the size of the array. Therefore, it would greatly improve 
> > performance to avoid testing ALL elements against the conditional once one 
> > is found that returns True. However, all built-in functions that I know of 
> > test the entire array. 
> > 
> > One can obviously jury-rig some ways, like for instance create a "for" loop 
> > over non-overlapping slices of length slice_length and call something like 
> > np.where(cond) on each--that outer "for" loop is much faster than a loop 
> > over individual elements, and the inner loop at most will go slice_length-1 
> > elements past the first "hit". However, needing to use such a convoluted 
> > piece of code for such a simple task seems to go against the Numpy spirit 
> > of having one operation be one function of the form "func(arr)".
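
The chunked workaround described above might look roughly like this. It is a 
sketch under a few assumptions: the function name and its parameters are 
hypothetical, cond is taken to be a vectorised predicate that returns a boolean 
array, np.flatnonzero is used in place of np.where, and -1 signals "not found".

```python
import numpy as np


def first_true_chunked(arr, cond, start_idx=0, slice_length=1024):
    """Search a 1-D array in blocks, stopping at the first block with a hit."""
    for lo in range(start_idx, len(arr), slice_length):
        # Evaluate the condition on one block only.
        hits = np.flatnonzero(cond(arr[lo:lo + slice_length]))
        if hits.size:
            return lo + int(hits[0])  # convert block offset to array index
    return -1


data = np.arange(10_000)
idx = first_true_chunked(data, lambda x: x > 2500, start_idx=100,
                         slice_length=256)
print(idx)  # 2501
```

At most slice_length - 1 elements past the first hit are evaluated, at the cost 
of one Python-level iteration per block.
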
> > 
> > A proposed function for this, let's call it "np.first_true(arr, start_idx, 
> > [stop_idx])" would be best implemented at the C code level, possibly in the 
> > same code file that defines np.where. I'm wondering if I, or someone else, 
> > were to write such a function, if the Numpy developers would consider 
> > merging it as a standard part of the codebase. It's possible that the idea 
> > of such a function is bad because it would violate some existing 
> > broadcasting or fancy indexing rules. Clearly one could make it possible to 
> > pass an "axis" argument to np.first_true() that would select an axis to 
> > search over in the case of multi-dimensional arrays, and then the result 
> > would be an array of indices of one fewer dimension than the original 
> > array. So np.first_true(np.array([[1,5],[2,7],[9,10]]), cond) would return 
> > [1,1,0] for cond(x): x>4. The case where no elements satisfy the condition 
> > would need to return a "signal value" like -1. But maybe there are some 
> > weird cases where there isn't a sensible return value, hence why such a 
> > function has not been added.
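
The axis and sentinel behaviour described above can be imitated with existing 
NumPy calls, as in the sketch below. The function name is hypothetical and cond 
is assumed to be vectorised. Note that this evaluates the condition on the 
whole array, so it shows only the shape of the result, not the short-circuiting 
that motivates the proposal.

```python
import numpy as np


def first_true_along_axis(arr, cond, axis=-1):
    """Index of the first True along `axis`, or -1 where there is none."""
    mask = cond(arr)
    # argmax returns the position of the first True, but also 0 when a lane
    # has no True at all, so those lanes are overwritten with -1.
    return np.where(mask.any(axis=axis), mask.argmax(axis=axis), -1)


arr = np.array([[1, 5], [2, 7], [9, 10], [0, 3]])
res = first_true_along_axis(arr, lambda x: x > 4, axis=1)
print(res)  # [ 1  1  0 -1]
```

The result has one fewer dimension than the input, and the last row, which has 
no element above 4, maps to -1.
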
> > 
> > -Andrew Rosko
> > _______________________________________________
> > NumPy-Discussion mailing list -- numpy-discussion@python.org
> > To unsubscribe send an email to numpy-discussion-le...@python.org
> > https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> > Member address: dom.grigo...@gmail.com

