On Mon, Jul 10, 2023 at 1:49 AM Matti Picus <matti.pi...@gmail.com> wrote:

> On 9/7/23 23:34, glaserj--- via NumPy-Discussion wrote:
>
> > Reviving this old thread - I note that numpy.dot supports in-place
> > computation for performance reasons, like this:
> > c = np.empty_like(a, order='C')
> > np.dot(a, b, out=c)
> >
> > However, the data type of the pre-allocated c array must match the
> > result datatype of a times b. Now, with some accelerator hardware (e.g.
> > tensor cores or matrix multiplication engines in GPUs), mixed-precision
> > arithmetic with relaxed floating-point precision (i.e., not necessarily
> > IEEE 754 conformant) but faster performance is possible, which could be
> > supported in downstream libraries such as cupy.
> >
> > Case in point, a mixed-precision calculation may take half-precision
> > inputs, but accumulate in and return full-precision outputs. Due to the
> > above-mentioned type consistency, the outputs would be unnecessarily
> > demoted (truncated) to half precision again. The current API of numpy
> > does not expose mixed-precision concepts. Therefore, it would be nice if
> > it were possible to build in support for hardware-accelerated linear
> > algebra, even if that may not be available on the standard (CPU)
> > platforms numpy is typically compiled for.
> >
> > I'd be happy to flesh out some API concepts, but would be curious to
> > first get an opinion from others. It may be necessary to weigh the
> > complexity of adding such support explicitly against providing minimal
> > hooks for add-on libraries in the style of JMP (for jax.numpy) or AMP
> > (for torch).
> >
> > Jens
>
> If your goal is "accumulate in and return full precision outputs", then
> you can allocate C as the full precision type, and NumPy should do the
> right thing. Note that it may convert the entire input array to the
> final dtype rather than doing it "on the fly", which could be expensive
> in terms of memory.
>

In this case, no: `np.dot()` will raise an exception if the `out` array
does not have the dtype that `np.dot(a, b)` would naturally produce. That
is different from the ufuncs, which do indeed behave like you describe.
`np.dot()` implements its own dtype coercion logic.
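
A minimal sketch of the difference, assuming small float16 inputs and a
float32 `out` array (the shapes and values are illustrative only, and
`np.add` stands in for "the ufuncs"):

```python
import numpy as np

a = np.ones((2, 2), dtype=np.float16)
b = np.ones((2, 2), dtype=np.float16)
c = np.empty((2, 2), dtype=np.float32)  # wider than the natural float16 result

# np.dot() checks `out` against its own result dtype and rejects a mismatch.
try:
    np.dot(a, b, out=c)
    dot_error = None
except ValueError as exc:
    dot_error = exc
print("np.dot:", type(dot_error).__name__)

# A ufunc such as np.add accepts the wider `out` and casts the result into it.
np.add(a, b, out=c)
print("np.add:", c.dtype, c[0, 0])
```

Note that the ufunc accepting a wider `out` says nothing about the precision
used inside the loop; it only shows that the output dtype is not required to
match.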

Jens, there's nothing that really prevents adding mixed precision
operations. ufuncs let you provide loops with mixed dtypes, mostly to
support special functions that take both integer and real arguments (or
real and complex). But one could conceivably put in mixed-precision loops.
For non-ufuncs like `np.dot()` that implement their own dtype-coercion
logic, it's just a matter of coding it up. The main reasons that we don't
are that it's a lot of work for possibly marginal gain to put in all of the
relevant permutations and that it complicates an already overly-complicated
and not-fully-coherent type promotion scheme.
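
To illustrate what "loops with mixed dtypes" looks like today, here is a
small sketch that lists the registered loops of `np.ldexp`, which takes a
floating-point mantissa and an integer exponent. The helper
`mixed_input_loops` is a hypothetical name made up for this example, not a
NumPy API:

```python
import numpy as np


def mixed_input_loops(ufunc):
    """Return the loop signatures whose input type codes are not all equal."""
    mixed = []
    for sig in ufunc.types:
        inputs, _, _outputs = sig.partition("->")
        if len(set(inputs)) > 1:
            mixed.append(sig)
    return mixed


# Each entry reads "<input type codes>-><output type codes>",
# e.g. 'di->d' is (float64, C int) -> float64.
ldexp_mixed = mixed_input_loops(np.ldexp)
print(ldexp_mixed)

# The float/int loop is what gets selected for a call like this one.
result = np.ldexp(np.float64(1.5), 3)  # 1.5 * 2**3
print(result)
```

A mixed-precision loop (say, float16 inputs with a float32 output) would be
one more entry of this kind; as far as I know none is registered at present.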

`np.dot()` is kind of an oddball already, and "half-precision inputs ->
full-precision outputs" might be a worthwhile use case given hardware
accelerators. Given that this largely affects non-numpy implementations of
the Array API, you probably want to raise it with that group. numpy can
implement that logic if the Array API requires it.
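
In the meantime, a workaround sketch in plain numpy, assuming you can afford
temporary full-precision copies of the inputs (the memory cost Matti
mentioned). The sizes and values are chosen only so that the true result
does not fit in half precision:

```python
import numpy as np

n = 4096
a = np.full((1, n), 16.0, dtype=np.float16)
b = np.full((n, 1), 16.0, dtype=np.float16)

# The true result is 16 * 16 * 4096 = 1048576, which is beyond the largest
# finite float16 value (65504), so a float16 result cannot represent it.
out = np.empty((1, 1), dtype=np.float32)

# Upcast the inputs so that the natural result dtype of np.dot() matches
# `out`. This materializes float32 copies of both inputs.
np.dot(a.astype(np.float32), b.astype(np.float32), out=out)
print(out.dtype, out[0, 0])
```

This gets "full-precision outputs" but not the hardware benefit of keeping
the inputs in half precision, which is the part that would need new API.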

-- 
Robert Kern