On Mon, 2022-01-10 at 15:25 -0800, Stephan Hoyer wrote:
> There are no C-level APIs for __array_function__ or __array_ufunc__,
> so
> yes, at a high-level Python methods will be invoked by NumPy.
> 
> That said, NumPy's logic for handling __array_function__ and
> __array_ufunc__ methods is written in highly optimized C. If you
> wrote your
> own __array_function__ and __array_ufunc__ methods on Quantity using
> C, and
> there should be very little overhead from Python. I would guess you
> might
> see a considerable speed-ups due to moving highly dynamic Python
> logic into
> a compiled language.
> 

There are some slowish Python-C interface boundaries that could be
optimized away [1], although we should first do the generic
optimizations that I mentioned before the break (at least for
`__array_function__`) before considering that.

Another approach is now becoming available (and experimenting on it
could hasten it a lot), though, if that interests you.

That is to use new user defined DTypes:  If you use a dtype that holds
the unit, there is no wrapping of the NumPy array at all!
If done right, you stay within C and the overhead should be effectively
indistinguishable from normal NumPy array operations.
There is a very minimal proof of concept here (not designed to be fast
or good):

    https://github.com/seberg/experimental_user_dtypes

This works with NumPy main branch (albeit there is a small tweak needed
right now [2]).

I was hoping to look into a somewhat more full featured prototype
soonish (the fact that it uses Cython turned out a bad idea).

Currently, there is mainly one larger feature missing:  A Unit DType
should be able to call the existing NumPy loops conveniently (e.g. use
the normal multiplication function but tag on unit handling).

Cheers,

Sebastian


[1] Basically, just hopping back from C to Python (even if just to call
C) is a bit sluggish when you look at small to mid-sized array
operations.  That could probably be avoided, but needs small API
extension I think.  I am not sure how worthwhile that is.

[2] Unfortunately, I a merge conflict or so happened. So if you want to
try that example code, you currently need to change one line in
`numpy/include/numpy/experimental_dtype_api.h` to read:

#define __EXPERIMENTAL_DTYPE_VERSION 3

(The version should be 3, not 2.)

> 
> On Mon, Jan 10, 2022 at 12:56 PM Juan Luis Cano Rodríguez <
> hello@juanlu.space> wrote:
> 
> > Hi all,
> > 
> > I am a long time user of astropy.units, which allows one to define
> > quantities with physical units as follows:
> > 
> > > > > from astropy import units as u
> > > > > 10 << u.cm
> > <Quantity 10. cm>
> > > > > np.sqrt(4 << u.m ** 2)
> > <Quantity 2. m>
> > > > > ([1, 1, 0] << u.m) @ ([0, 10, 20] << u.cm / u.s)
> > <Quantity 10. cm m / s>
> > > > > (([1, 1, 0] << u.m) * ([0, 10, 20] << u.cm / u.s)).to(u.m **
> > > > > 2 / u.s)
> > <Quantity [0. , 0.1, 0. ] m2 / s>
> > 
> > The mechanism works by subclassing numpy.ndarray and leveraging
> > __array_function__ support aka NEP 18. Internally it is something
> > like this:
> > 
> > > > > v = np.array(10, dtype=np.float64, copy=False, order=None,
> > > > > subok=True,
> > ndmin=0)
> > > > > vu = v.view(u.Quantity)
> > > > > vu._set_unit(u.cm)
> > > > > vu
> > <Quantity 10. cm>
> > 
> > However, over the years I have been constantly annoyed by the fact
> > that it
> > is tremendously slow. I'm not critizing Astropy devs, the problem
> > seems
> > objectively difficult: although some code paths could be optimized
> > at the
> > cost of losing some syntactic sugar or breaking backwards
> > compatibility,
> > `isinstance` calls and introspection in general are slow.
> > 
> > Setting aside the question of trying to make astropy.units faster
> > (which
> > may or may not be possible), I was thinking how feasible could it
> > be to
> > implement something similar, but using a compiled language instead
> > (C,
> > Cython, Rust, whatever) and leveraging "modern" dispatch
> > mechanisms. But
> > after reading about NEP 18, NEP 47, uarray, and various pull
> > requests and
> > issues here and there (
> > https://labs.quansight.org/blog/2021/11/pydata-extensibility-vision/
> >  and
> > https://github.com/scipy/scipy/issues/10204#issuecomment-787067947 
> > among
> > others) I don't fully grasp the differences between the approaches,
> > and I
> > don't know if what I am proposing is feasible at all. Since IIUC
> > the numpy
> > function or ufunc is passed to __array_function__ and
> > __array_ufunc__
> > respectively, I am not sure how would that interact with the code
> > being in
> > a foreign language (I assume the NumPy C API would have to be
> > used).
> > 
> > If folks have advice, ideas, or suggestions for a direction, I'll
> > be happy
> > to read them.
> > 
> > Best!
> > _______________________________________________
> > NumPy-Discussion mailing list -- numpy-discussion@python.org
> > To unsubscribe send an email to numpy-discussion-le...@python.org
> > https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> > Member address: sho...@gmail.com
> > 
> _______________________________________________
> NumPy-Discussion mailing list -- numpy-discussion@python.org
> To unsubscribe send an email to numpy-discussion-le...@python.org
> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> Member address: sebast...@sipsolutions.net

Attachment: signature.asc
Description: This is a digitally signed message part

_______________________________________________
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com

Reply via email to