Have you tried timing things? Thankfully this is easy to test because the Python source of numba-jitted functions is available at jitted_func.py_func.
In [23]: @numba.njit ...: def _first(arr, pred): ...: for i, elem in enumerate(arr): ...: if pred(elem): ...: return i ...: ...: def first(arr, pred): ...: _pred = numba.njit(pred) ...: return _first(arr, _pred) ...: In [24]: arr = np.random.random(100_000_000) In [25]: %timeit first(arr, lambda x: x > 5) 72 ms ± 1.36 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) In [26]: %timeit arr + 5 90.3 ms ± 762 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) In [27]: %timeit _first.py_func(arr, lambda x: x > 5) 7.8 s ± 46.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) So numba gives a >100x speedup. It's still not as fast as a NumPy function call that doesn't have an allocation overhead: In [30]: arr2 = np.empty_like(arr, dtype=bool) In [32]: %timeit np.greater(arr, 5, out=arr2) 13.9 ms ± 69.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) But it's certainly much better than pure Python! And it's not a huge cost for the flexibility. Juan. On Wed, 1 Nov 2023, at 10:42 AM, Dom Grigonis wrote: > This results in a very slow code. The function calls of > > > _pred = numba.njit(pred) > > are expensive and this sort of approach will be comparable to pure python > functions. > > This is only recommended for sourcing functions that are not called > frequently, but rather have a large computational content within them. In > other words not suitable for predicates. > > Regards, > DG > >> On 1 Nov 2023, at 01:05, Juan Nunez-Iglesias <j...@fastmail.com> wrote: >> >> If you add a layer of indirection with Numba you can get a *very* nice API: >> >> @numba.njit >> def _first(arr, pred): >> for i, elem in enumerate(arr): >> if pred(elem): >> return i >> >> def first(arr, pred): >> _pred = numba.njit(pred) >> return _first(arr, _pred) >> >> This even works with lambdas! (TIL, thanks Numba devs!) >> >> >>> first(np.random.random(10_000_000), lambda x: x > 0.99) >> 215 >> >> Since Numba has ufunc support I don't suppose it would be hard to make it >> work with an axis= argument, but I've never played with that API myself. >> >> On Tue, 31 Oct 2023, at 6:49 PM, Lev Maximov wrote: >>> I've implemented such functions in Cython and packaged them into a library >>> called numpy_illustrated <https://pypi.org/project/numpy-illustrated/> >>> >>> It exposes the following functions: >>> >>> find(a, v) # returns the index of the first occurrence of v in a >>> first_above(a, v) # returns the index of the first element in a that is >>> strictly above v >>> first_nonzero(a) # returns the index of the first nonzero element >>> >>> They scan the array and bail out immediately once the match is found. Have >>> a significant performance gain if the element to be >>> found is closer to the beginning of the array. Have roughly the same speed >>> as alternative methods if the value is missing. >>> >>> The complete signatures of the functions look like this: >>> >>> find(a, v, rtol=1e-05, atol=1e-08, sorted=False, default=-1, raises=False) >>> first_above(a, v, sorted=False, missing=-1, raises=False) >>> first_nonzero(a, missing=-1, raises=False) >>> >>> This covers the most common use cases and does not accept Python callbacks >>> because accepting them would nullify any speed gain >>> one would expect from such a function. A Python callback can be implemented >>> with Numba, but anyone who can write the callback >>> in Numba has no need for a library that wraps it into a dedicated function. >>> >>> The library has a 100% test coverage. Code style 'black'. It should be easy >>> to add functions like 'first_below' if necessary. >>> >>> A more detailed description of these functions can be found here >>> <https://betterprogramming.pub/the-numpy-illustrated-library-7531a7c43ffb?sk=8dd60bfafd6d49231ac76cb148a4d16f>. >>> >>> Best regards, >>> Lev Maximov >>> >>> On Tue, Oct 31, 2023 at 3:50 AM Dom Grigonis <dom.grigo...@gmail.com> wrote: >>>> I juggled a bit and found pretty nice solution using numba. Which is >>>> probably not very robust, but proves that such thing can be optimised >>>> while retaining flexibility. Check if it works for your use cases and let >>>> me know if anything fails or if it is slow compared to what you used. >>>> >>>> >>>> >>>> first_true_str = """ >>>> def first_true(arr, n): >>>> result = np.full((n, arr.shape[1]), -1, dtype=np.int32) >>>> for j in range(arr.shape[1]): >>>> k = 0 >>>> for i in range(arr.shape[0]): >>>> x = arr[i:i + 1, j] >>>> if cond(x): >>>> result[k, j] = i >>>> k += 1 >>>> if k >= n: >>>> break >>>> return result >>>> """ >>>> >>>> >>>> *class* *FirstTrue*: >>>> CONTEXT = {'np': np} >>>> >>>> *def* __init__(self, expr): >>>> self.expr = expr >>>> self.expr_ast = ast.parse(expr, mode='exec').body[0].value >>>> self.func_ast = ast.parse(first_true_str, mode='exec') >>>> self.func_ast.body[0].body[1].body[1].body[1].test = self.expr_ast >>>> self.func_cmp = compile(self.func_ast, filename="<ast>", >>>> mode="exec") >>>> *exec*(self.func_cmp, self.CONTEXT) >>>> self.func_nb = nb.njit(self.CONTEXT[self.func_ast.body[0].name]) >>>> >>>> *def* __call__(self, arr, n=1, axis=None): >>>> *# PREPARE INPUTS* >>>> in_1d = False >>>> *if* axis *is* None: >>>> arr = np.ravel(arr)[:, None] >>>> in_1d = True >>>> *elif* axis == 0: >>>> *if* arr.ndim == 1: >>>> in_1d = True >>>> arr = arr[:, None] >>>> *else*: >>>> *raise* *ValueError*('axis ~in (None, 0)') >>>> res = self.func_nb(arr, n) >>>> *if* in_1d: >>>> res = res[:, 0] >>>> *return* res >>>> >>>> >>>> *if* __name__ == '__main__': >>>> arr = np.arange(125).reshape((5, 5, 5)) >>>> ft = FirstTrue('np.sum(x) > 30') >>>> *print*(ft(arr, n=2, axis=0)) >>>> >>>> [[1 0 0 0 0] >>>> [2 1 1 1 1]] >>>> >>>> >>>> In [16]: %timeit ft(arr, 2, axis=0) >>>> 1.31 µs ± 3.94 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops >>>> each) >>>> >>>> Regards, >>>> DG >>>> >>>>> On 29 Oct 2023, at 23:18, rosko37 <rosk...@gmail.com> wrote: >>>>> >>>>> An example with a 1-D array (where it is easiest to see what I mean) is >>>>> the following. I will follow Dom Grigonis's suggestion that the range not >>>>> be provided as a separate argument, as it can be just as easily "folded >>>>> into" the array by passing a slice. So it becomes just: >>>>> idx = first_true(arr, cond) >>>>> >>>>> As Dom also points out, the "cond" would likely need to be a "function >>>>> pointer" (i.e., the name of a function defined elsewhere, turning >>>>> first_true into a higher-order function), unless there's some way to pass >>>>> a parseable expression for simple cases. A few special cases like the >>>>> first zero/nonzero element could be handled with dedicated options (sort >>>>> of like matplotlib colors), but for anything beyond that it gets unwieldy >>>>> fast. >>>>> >>>>> So let's say we have this: >>>>> ****************** >>>>> def cond(x): >>>>> return x>50 >>>>> >>>>> search_arr = np.exp(np.arange(0,1000)) >>>>> >>>>> print(np.first_true(search_arr, cond)) >>>>> ******************* >>>>> >>>>> This should print 4, because the element of search_arr at index 4 (i.e. >>>>> the 5th element) is e^4, which is slightly greater than 50 (while e^3 is >>>>> less than 50). It should return this *without testing the 6th through >>>>> 1000th elements of the array at all to see whether they exceed 50 or >>>>> not*. This example is rather contrived, because simply taking the natural >>>>> log of 50 and rounding up is far superior, not even *evaluating the array >>>>> of exponentials *(which my example clearly still does--and in the use >>>>> cases I've had for such a function, I can't predict the array elements >>>>> like this--they come from loaded data, the output of a simulation, etc., >>>>> and are all already in a numpy array). And in this case, since the values >>>>> are strictly increasing, search_sorted() would work as well. But it >>>>> illustrates the idea. >>>>> >>>>> >>>>> >>>>> >>>>> On Thu, Oct 26, 2023 at 5:54 AM Dom Grigonis <dom.grigo...@gmail.com> >>>>> wrote: >>>>>> Could you please give a concise example? I know you have provided one, >>>>>> but it is engrained deep in verbose text and has some typos in it, which >>>>>> makes hard to understand exactly what inputs should result in what >>>>>> output. >>>>>> >>>>>> Regards, >>>>>> DG >>>>>> >>>>>> > On 25 Oct 2023, at 22:59, rosko37 <rosk...@gmail.com> wrote: >>>>>> > >>>>>> > I know this question has been asked before, both on this list as well >>>>>> > as several threads on Stack Overflow, etc. It's a common issue. I'm >>>>>> > NOT asking for how to do this using existing Numpy functions (as that >>>>>> > information can be found in any of those sources)--what I'm asking is >>>>>> > whether Numpy would accept inclusion of a function that does this, or >>>>>> > whether (possibly more likely) such a proposal has already been >>>>>> > considered and rejected for some reason. >>>>>> > >>>>>> > The task is this--there's a large array and you want to find the next >>>>>> > element after some index that satisfies some condition. Such elements >>>>>> > are common, and the typical number of elements to be searched through >>>>>> > is small relative to the size of the array. Therefore, it would >>>>>> > greatly improve performance to avoid testing ALL elements against the >>>>>> > conditional once one is found that returns True. However, all built-in >>>>>> > functions that I know of test the entire array. >>>>>> > >>>>>> > One can obviously jury-rig some ways, like for instance create a "for" >>>>>> > loop over non-overlapping slices of length slice_length and call >>>>>> > something like np.where(cond) on each--that outer "for" loop is much >>>>>> > faster than a loop over individual elements, and the inner loop at >>>>>> > most will go slice_length-1 elements past the first "hit". However, >>>>>> > needing to use such a convoluted piece of code for such a simple task >>>>>> > seems to go against the Numpy spirit of having one operation being one >>>>>> > function of the form func(arr)". >>>>>> > >>>>>> > A proposed function for this, let's call it "np.first_true(arr, >>>>>> > start_idx, [stop_idx])" would be best implemented at the C code level, >>>>>> > possibly in the same code file that defines np.where. I'm wondering if >>>>>> > I, or someone else, were to write such a function, if the Numpy >>>>>> > developers would consider merging it as a standard part of the >>>>>> > codebase. It's possible that the idea of such a function is bad >>>>>> > because it would violate some existing broadcasting or fancy indexing >>>>>> > rules. Clearly one could make it possible to pass an "axis" argument >>>>>> > to np.first_true() that would select an axis to search over in the >>>>>> > case of multi-dimensional arrays, and then the result would be an >>>>>> > array of indices of one fewer dimension than the original array. So >>>>>> > np.first_true(np.array([1,5],[2,7],[9,10],cond) would return [1,1,0] >>>>>> > for cond(x): x>4. The case where no elements satisfy the condition >>>>>> > would need to return a "signal value" like -1. But maybe there are >>>>>> > some weird cases where there isn't a sensible return val >>>>>> ue, hence why such a function has not been added. >>>>>> > >>>>>> > -Andrew Rosko >>>>>> > _______________________________________________ >>>>>> > NumPy-Discussion mailing list -- numpy-discussion@python.org >>>>>> > To unsubscribe send an email to numpy-discussion-le...@python.org >>>>>> > https://mail.python.org/mailman3/lists/numpy-discussion.python.org/ >>>>>> > Member address: dom.grigo...@gmail.com >>>>>> >>>>>> _______________________________________________ >>>>>> NumPy-Discussion mailing list -- numpy-discussion@python.org >>>>>> To unsubscribe send an email to numpy-discussion-le...@python.org >>>>>> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/ >>>>>> Member address: rosk...@gmail.com >>>>> _______________________________________________ >>>>> NumPy-Discussion mailing list -- numpy-discussion@python.org >>>>> To unsubscribe send an email to numpy-discussion-le...@python.org >>>>> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/ >>>>> Member address: dom.grigo...@gmail.com >>>> >>>> _______________________________________________ >>>> NumPy-Discussion mailing list -- numpy-discussion@python.org >>>> To unsubscribe send an email to numpy-discussion-le...@python.org >>>> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/ >>>> Member address: lev.maxi...@gmail.com >>> _______________________________________________ >>> NumPy-Discussion mailing list -- numpy-discussion@python.org >>> To unsubscribe send an email to numpy-discussion-le...@python.org >>> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/ >>> Member address: j...@fastmail.com >>> >> >> _______________________________________________ >> NumPy-Discussion mailing list -- numpy-discussion@python.org >> To unsubscribe send an email to numpy-discussion-le...@python.org >> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/ >> Member address: dom.grigo...@gmail.com > _______________________________________________ > NumPy-Discussion mailing list -- numpy-discussion@python.org > To unsubscribe send an email to numpy-discussion-le...@python.org > https://mail.python.org/mailman3/lists/numpy-discussion.python.org/ > Member address: j...@fastmail.com >
_______________________________________________ NumPy-Discussion mailing list -- numpy-discussion@python.org To unsubscribe send an email to numpy-discussion-le...@python.org https://mail.python.org/mailman3/lists/numpy-discussion.python.org/ Member address: arch...@mail-archive.com