This results in a very slow code. The function calls of _pred = numba.njit(pred) are expensive and this sort of approach will be comparable to pure python functions.
This is only recommended for sourcing functions that are not called frequently, but rather have a large computational content within them. In other words not suitable for predicates. Regards, DG > On 1 Nov 2023, at 01:05, Juan Nunez-Iglesias <j...@fastmail.com> wrote: > > If you add a layer of indirection with Numba you can get a *very* nice API: > > @numba.njit > def _first(arr, pred): > for i, elem in enumerate(arr): > if pred(elem): > return i > > def first(arr, pred): > _pred = numba.njit(pred) > return _first(arr, _pred) > > This even works with lambdas! (TIL, thanks Numba devs!) > > >>> first(np.random.random(10_000_000), lambda x: x > 0.99) > 215 > > Since Numba has ufunc support I don't suppose it would be hard to make it > work with an axis= argument, but I've never played with that API myself. > > On Tue, 31 Oct 2023, at 6:49 PM, Lev Maximov wrote: >> I've implemented such functions in Cython and packaged them into a library >> called numpy_illustrated <https://pypi.org/project/numpy-illustrated/> >> >> It exposes the following functions: >> >> find(a, v) # returns the index of the first occurrence of v in a >> first_above(a, v) # returns the index of the first element in a that is >> strictly above v >> first_nonzero(a) # returns the index of the first nonzero element >> >> They scan the array and bail out immediately once the match is found. Have a >> significant performance gain if the element to be >> found is closer to the beginning of the array. Have roughly the same speed >> as alternative methods if the value is missing. >> >> The complete signatures of the functions look like this: >> >> find(a, v, rtol=1e-05, atol=1e-08, sorted=False, default=-1, raises=False) >> first_above(a, v, sorted=False, missing=-1, raises=False) >> first_nonzero(a, missing=-1, raises=False) >> >> This covers the most common use cases and does not accept Python callbacks >> because accepting them would nullify any speed gain >> one would expect from such a function. A Python callback can be implemented >> with Numba, but anyone who can write the callback >> in Numba has no need for a library that wraps it into a dedicated function. >> >> The library has a 100% test coverage. Code style 'black'. It should be easy >> to add functions like 'first_below' if necessary. >> >> A more detailed description of these functions can be found here >> <https://betterprogramming.pub/the-numpy-illustrated-library-7531a7c43ffb?sk=8dd60bfafd6d49231ac76cb148a4d16f>. >> >> Best regards, >> Lev Maximov >> >> On Tue, Oct 31, 2023 at 3:50 AM Dom Grigonis <dom.grigo...@gmail.com >> <mailto:dom.grigo...@gmail.com>> wrote: >> I juggled a bit and found pretty nice solution using numba. Which is >> probably not very robust, but proves that such thing can be optimised while >> retaining flexibility. Check if it works for your use cases and let me know >> if anything fails or if it is slow compared to what you used. >> >> >> >> first_true_str = """ >> def first_true(arr, n): >> result = np.full((n, arr.shape[1]), -1, dtype=np.int32) >> for j in range(arr.shape[1]): >> k = 0 >> for i in range(arr.shape[0]): >> x = arr[i:i + 1, j] >> if cond(x): >> result[k, j] = i >> k += 1 >> if k >= n: >> break >> return result >> """ >> >> >> class FirstTrue: >> CONTEXT = {'np': np} >> >> def __init__(self, expr): >> self.expr = expr >> self.expr_ast = ast.parse(expr, mode='exec').body[0].value >> self.func_ast = ast.parse(first_true_str, mode='exec') >> self.func_ast.body[0].body[1].body[1].body[1].test = self.expr_ast >> self.func_cmp = compile(self.func_ast, filename="<ast>", mode="exec") >> exec(self.func_cmp, self.CONTEXT) >> self.func_nb = nb.njit(self.CONTEXT[self.func_ast.body[0].name]) >> >> def __call__(self, arr, n=1, axis=None): >> # PREPARE INPUTS >> in_1d = False >> if axis is None: >> arr = np.ravel(arr)[:, None] >> in_1d = True >> elif axis == 0: >> if arr.ndim == 1: >> in_1d = True >> arr = arr[:, None] >> else: >> raise ValueError('axis ~in (None, 0)') >> res = self.func_nb(arr, n) >> if in_1d: >> res = res[:, 0] >> return res >> >> >> if __name__ == '__main__': >> arr = np.arange(125).reshape((5, 5, 5)) >> ft = FirstTrue('np.sum(x) > 30') >> print(ft(arr, n=2, axis=0)) >> >> [[1 0 0 0 0] >> [2 1 1 1 1]] >> >> >> In [16]: %timeit ft(arr, 2, axis=0) >> 1.31 µs ± 3.94 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each) >> >> Regards, >> DG >> >>> On 29 Oct 2023, at 23:18, rosko37 <rosk...@gmail.com >>> <mailto:rosk...@gmail.com>> wrote: >>> >>> An example with a 1-D array (where it is easiest to see what I mean) is the >>> following. I will follow Dom Grigonis's suggestion that the range not be >>> provided as a separate argument, as it can be just as easily "folded into" >>> the array by passing a slice. So it becomes just: >>> idx = first_true(arr, cond) >>> >>> As Dom also points out, the "cond" would likely need to be a "function >>> pointer" (i.e., the name of a function defined elsewhere, turning >>> first_true into a higher-order function), unless there's some way to pass a >>> parseable expression for simple cases. A few special cases like the first >>> zero/nonzero element could be handled with dedicated options (sort of like >>> matplotlib colors), but for anything beyond that it gets unwieldy fast. >>> >>> So let's say we have this: >>> ****************** >>> def cond(x): >>> return x>50 >>> >>> search_arr = np.exp(np.arange(0,1000)) >>> >>> print(np.first_true(search_arr, cond)) >>> ******************* >>> >>> This should print 4, because the element of search_arr at index 4 (i.e. the >>> 5th element) is e^4, which is slightly greater than 50 (while e^3 is less >>> than 50). It should return this without testing the 6th through 1000th >>> elements of the array at all to see whether they exceed 50 or not. This >>> example is rather contrived, because simply taking the natural log of 50 >>> and rounding up is far superior, not even evaluating the array of >>> exponentials (which my example clearly still does--and in the use cases >>> I've had for such a function, I can't predict the array elements like >>> this--they come from loaded data, the output of a simulation, etc., and are >>> all already in a numpy array). And in this case, since the values are >>> strictly increasing, search_sorted() would work as well. But it illustrates >>> the idea. >>> >>> >>> >>> >>> On Thu, Oct 26, 2023 at 5:54 AM Dom Grigonis <dom.grigo...@gmail.com >>> <mailto:dom.grigo...@gmail.com>> wrote: >>> Could you please give a concise example? I know you have provided one, but >>> it is engrained deep in verbose text and has some typos in it, which makes >>> hard to understand exactly what inputs should result in what output. >>> >>> Regards, >>> DG >>> >>> > On 25 Oct 2023, at 22:59, rosko37 <rosk...@gmail.com >>> > <mailto:rosk...@gmail.com>> wrote: >>> > >>> > I know this question has been asked before, both on this list as well as >>> > several threads on Stack Overflow, etc. It's a common issue. I'm NOT >>> > asking for how to do this using existing Numpy functions (as that >>> > information can be found in any of those sources)--what I'm asking is >>> > whether Numpy would accept inclusion of a function that does this, or >>> > whether (possibly more likely) such a proposal has already been >>> > considered and rejected for some reason. >>> > >>> > The task is this--there's a large array and you want to find the next >>> > element after some index that satisfies some condition. Such elements are >>> > common, and the typical number of elements to be searched through is >>> > small relative to the size of the array. Therefore, it would greatly >>> > improve performance to avoid testing ALL elements against the conditional >>> > once one is found that returns True. However, all built-in functions that >>> > I know of test the entire array. >>> > >>> > One can obviously jury-rig some ways, like for instance create a "for" >>> > loop over non-overlapping slices of length slice_length and call >>> > something like np.where(cond) on each--that outer "for" loop is much >>> > faster than a loop over individual elements, and the inner loop at most >>> > will go slice_length-1 elements past the first "hit". However, needing to >>> > use such a convoluted piece of code for such a simple task seems to go >>> > against the Numpy spirit of having one operation being one function of >>> > the form func(arr)". >>> > >>> > A proposed function for this, let's call it "np.first_true(arr, >>> > start_idx, [stop_idx])" would be best implemented at the C code level, >>> > possibly in the same code file that defines np.where. I'm wondering if I, >>> > or someone else, were to write such a function, if the Numpy developers >>> > would consider merging it as a standard part of the codebase. It's >>> > possible that the idea of such a function is bad because it would violate >>> > some existing broadcasting or fancy indexing rules. Clearly one could >>> > make it possible to pass an "axis" argument to np.first_true() that would >>> > select an axis to search over in the case of multi-dimensional arrays, >>> > and then the result would be an array of indices of one fewer dimension >>> > than the original array. So >>> > np.first_true(np.array([1,5],[2,7],[9,10],cond) would return [1,1,0] for >>> > cond(x): x>4. The case where no elements satisfy the condition would need >>> > to return a "signal value" like -1. But maybe there are some weird cases >>> > where there isn't a sensible return val >>> ue, hence why such a function has not been added. >>> > >>> > -Andrew Rosko >>> > _______________________________________________ >>> > NumPy-Discussion mailing list -- numpy-discussion@python.org >>> > <mailto:numpy-discussion@python.org> >>> > To unsubscribe send an email to numpy-discussion-le...@python.org >>> > <mailto:numpy-discussion-le...@python.org> >>> > https://mail.python.org/mailman3/lists/numpy-discussion.python.org/ >>> > <https://mail.python.org/mailman3/lists/numpy-discussion.python.org/> >>> > Member address: dom.grigo...@gmail.com <mailto:dom.grigo...@gmail.com> >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list -- numpy-discussion@python.org >>> <mailto:numpy-discussion@python.org> >>> To unsubscribe send an email to numpy-discussion-le...@python.org >>> <mailto:numpy-discussion-le...@python.org> >>> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/ >>> <https://mail.python.org/mailman3/lists/numpy-discussion.python.org/> >>> Member address: rosk...@gmail.com <mailto:rosk...@gmail.com> >>> _______________________________________________ >>> NumPy-Discussion mailing list -- numpy-discussion@python.org >>> <mailto:numpy-discussion@python.org> >>> To unsubscribe send an email to numpy-discussion-le...@python.org >>> <mailto:numpy-discussion-le...@python.org> >>> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/ >>> <https://mail.python.org/mailman3/lists/numpy-discussion.python.org/> >>> Member address: dom.grigo...@gmail.com <mailto:dom.grigo...@gmail.com> >> >> >> _______________________________________________ >> NumPy-Discussion mailing list -- numpy-discussion@python.org >> <mailto:numpy-discussion@python.org> >> To unsubscribe send an email to numpy-discussion-le...@python.org >> <mailto:numpy-discussion-le...@python.org> >> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/ >> <https://mail.python.org/mailman3/lists/numpy-discussion.python.org/> >> Member address: lev.maxi...@gmail.com <mailto:lev.maxi...@gmail.com> >> _______________________________________________ >> NumPy-Discussion mailing list -- numpy-discussion@python.org >> <mailto:numpy-discussion@python.org> >> To unsubscribe send an email to numpy-discussion-le...@python.org >> <mailto:numpy-discussion-le...@python.org> >> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/ >> <https://mail.python.org/mailman3/lists/numpy-discussion.python.org/> >> Member address: j...@fastmail.com <mailto:j...@fastmail.com> >> > > _______________________________________________ > NumPy-Discussion mailing list -- numpy-discussion@python.org > <mailto:numpy-discussion@python.org> > To unsubscribe send an email to numpy-discussion-le...@python.org > <mailto:numpy-discussion-le...@python.org> > https://mail.python.org/mailman3/lists/numpy-discussion.python.org/ > <https://mail.python.org/mailman3/lists/numpy-discussion.python.org/> > Member address: dom.grigo...@gmail.com <mailto:dom.grigo...@gmail.com>
_______________________________________________ NumPy-Discussion mailing list -- numpy-discussion@python.org To unsubscribe send an email to numpy-discussion-le...@python.org https://mail.python.org/mailman3/lists/numpy-discussion.python.org/ Member address: arch...@mail-archive.com