On Thu, 2018-04-26 at 19:26 +0200, Sebastian Berg wrote:
> On Thu, 2018-04-26 at 09:51 -0700, Hameer Abbasi wrote:
> > Hi Nathan,
> >
> > np.any and np.all call np.logical_or.reduce and
> > np.logical_and.reduce respectively, and unfortunately the
> > underlying function (ufunc.reduce) has no way of detecting that
> > the value isn’t going to change anymore. It’s also used for (for
> > example) np.sum (np.add.reduce), np.prod (np.multiply.reduce),
> > np.min (np.minimum.reduce), and np.max (np.maximum.reduce).
>
> I would like to point out that this is almost, but not quite, true.
> The boolean versions will short circuit on the innermost level, which
> is probably good enough for all practical purposes.
>
> One way to get around it would be to use a chunked iteration using
> np.nditer in pure python. I admit it is a bit tricky to get started
> with, but it is basically what numexpr uses also (at least in the
> simplest mode), and if your arrays are relatively large, there is
> likely no real performance hit compared to a non-pure python version.
>
I mean something like this:
import numpy as np

def check_any(arr, func=lambda x: x, buffersize=0):
    """
    Check if the function is true for any value in arr and stop once the
    first was found.

    Parameters
    ----------
    arr : ndarray
        Array to test.
    func : function
        Function taking a 1D array as argument and returning an array (on
        which ``np.any`` will be called).
    buffersize : int
        Size of the chunk/buffer in the iteration, zero will use the default
        numpy value.

    Notes
    -----
    The stopping does not occur immediately but in buffersize chunks.
    """
    # 'buffered' + 'external_loop' hands out 1D chunks instead of scalars.
    iterflags = ['buffered', 'external_loop', 'refs_ok', 'zerosize_ok']
    for chunk in np.nditer((arr,), flags=iterflags, buffersize=buffersize):
        if np.any(func(chunk)):
            return True
    return False

I am not sure how it actually performs, but you can give it a try,
especially if you know you have large arrays, or if "func" is pretty
expensive. If the input is already bool, it will be quite a bit slower
though, I am sure.
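
For what it is worth, here is a rough sketch of how the function above
might be called. The names is_large, calls and touched, the threshold of
10, and the buffersize of 4096 are all made up for illustration, and
check_any is repeated only so the snippet runs on its own. It records the
size of every chunk handed to func, so you can see how much of the array
was looked at before the early exit:

```python
import numpy as np

def check_any(arr, func=lambda x: x, buffersize=0):
    # Same chunked loop as above, repeated so this snippet is standalone.
    iterflags = ['buffered', 'external_loop', 'refs_ok', 'zerosize_ok']
    for chunk in np.nditer((arr,), flags=iterflags, buffersize=buffersize):
        if np.any(func(chunk)):
            return True
    return False

calls = []  # sizes of the chunks that func was asked to look at

def is_large(chunk):
    # Hypothetical test function: elementwise "greater than 10".
    calls.append(chunk.size)
    return chunk > 10

data = np.arange(1e6)
found = check_any(data, is_large, buffersize=4096)
touched = sum(calls)  # number of elements inspected before returning

# An all-zero array never satisfies the test, so every chunk is visited.
none_found = check_any(np.zeros(10000), is_large, buffersize=4096)

print(found, none_found, touched, data.size)
```

If the early exit works as intended, touched should come out much smaller
than data.size. This has not been timed here, so treat it as a way to
check the behaviour and not as a benchmark.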
- Sebastian
> - Sebastian
>
>
>
> >
> > You can find more information about this on the ufunc doc page. I
> > don’t think it’s worth it to break this machinery for any and all,
> > as it has numerous other advantages (such as being able to override
> > in duck arrays, etc.)
> >
> > Best regards,
> > Hameer Abbasi
> >
> > > On Apr 26, 2018 at 18:45, Nathan Goldbaum <[email protected]>
> > > wrote:
> > >
> > > Hi all,
> > >
> > > I was surprised recently to discover that neither np.any nor
> > > np.all has a way to exit early:
> > >
> > > In [1]: import numpy as np
> > >
> > > In [2]: data = np.arange(1e6)
> > >
> > > In [3]: print(data[:10])
> > > [0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]
> > >
> > > In [4]: %timeit np.any(data)
> > > 724 us +- 42.4 us per loop (mean +- std. dev. of 7 runs, 1000 loops each)
> > >
> > > In [5]: data = np.zeros(int(1e6))
> > >
> > > In [6]: %timeit np.any(data)
> > > 732 us +- 52.9 us per loop (mean +- std. dev. of 7 runs, 1000 loops each)
> > >
> > > I don't see any discussions about this on the NumPy issue tracker
> > > but perhaps I'm missing something.
> > >
> > > I'm curious if there's a way to get a fast early-terminating
> > > search in NumPy? Perhaps there's another package I can depend on
> > > that does this? I guess I could also write a bit of cython code
> > > that does this but so far this project is pure python and I don't
> > > want to deal with the packaging headache of getting wheels built
> > > and conda-forge packages set up on all platforms.
> > >
> > > Thanks for your help!
> > >
> > > -Nathan
> > >
> > > _______________________________________________
> > > NumPy-Discussion mailing list
> > > [email protected]
> > > https://mail.python.org/mailman/listinfo/numpy-discussion
