The following write-up was triggered by issue #7265
(<https://github.com/numpy/numpy/issues/7265>); you may want to review that
discussion for context:

In the docs, ufunc reduction operations are explained as "cumulatively
applying the ufunc along an axis." This effectively limits reduction to
ufuncs with two inputs and one output. Another take on the same reduction
idea is that, if the inputs are a and b and the output is c, then c is the
result of updating a with the value in b. With this idea in mind, in e.g.

    add(a, b) -> c = a + b,

c would be thought of as the updated value of a, after applying b to it.
One would expect that, for any registered loop suitable for this
interpretation, the types of a and c would be identical, but b could be of
some other type. As an example consider

    count(a, b) -> c = a + 1,

where a and c would typically be intp, but b could have just about any type.
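
The idea can be sketched in pure Python with functools.reduce; the count
ufunc here is hypothetical, used only to illustrate the "update a with b"
interpretation:

```python
from functools import reduce

# Hypothetical count(a, b) -> c = a + 1: the accumulator a (and result c)
# stay integers, while b can be of just about any type.
def count(a, b):
    return a + 1

data = ["spam", 3.5, None, 42]   # heterogeneous "b" values
n = reduce(count, data, 0)       # 0 plays the role of count's identity
print(n)                         # 4
```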

The power of this description of ufuncs suited for reduction is that it is
very easy to generalize beyond the current "two inputs, one output"
restriction. E.g. a "sum of squared differences" ufunc defined as:

    ssqd(sd2, x, y) -> sd2 += (x - y)**2,

could be used to compute squared Euclidean distances between vectors by doing:

    ssqd.reduce((x, y)),

without the memory overhead of currently available approaches.
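
As a rough illustration of the intended semantics (ssqd.reduce is not an
actual NumPy API), the reduction could be emulated element by element,
avoiding the temporary array that np.sum((x - y) ** 2) allocates:

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0])
y = np.array([0.0, 2.0, 1.0])

# current approach: allocates a temporary array for (x - y)**2
d2 = np.sum((x - y) ** 2)

# proposed ssqd.reduce((x, y)) semantics: one scalar accumulator, no temporary
sd2 = 0.0
for xi, yi in zip(x, y):
    sd2 += (xi - yi) ** 2

print(sd2, d2)   # 10.0 10.0
```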

In general, a reduction of a ufunc with m inputs and n outputs, m > n,
would produce n results out of m - n inputs per step. Such a ufunc would
have to define a suitable identity to initialize each of those n outputs at
the beginning of any reduction, rather than the single identity ufuncs now hold.

It can be argued that this generalization of reduction is just a
redefinition of a subset of what gufuncs can do, e.g. a "sum of squared
differences" gufunc with signature (n),(n)->() would do the same as the
above reduction, probably faster. And it is not clear that getting
accumulate and reduceat for free, or reduction over multiple axes,
justifies the extra complexity, especially since it is likely that
something similarly powerful could be built on top of gufuncs.

Where an approach such as this would shine is if we had a reduction method
that did not require a single strided memory segment to act on. Setting
aside the generalization of reduction described above, a groupby or reduceby
method would take an array of values and an array of groups, and compute a
reduction value for each of the unique groups. With the above
generalizations, one could compute, e.g. the variance over each group by
doing something like:

    # how exactly keepdims works here is not clear at all!
    counts = count.reduceby(values, groups, keepdims=True)
    mean = np.add.reduceby(values, groups, keepdims=True) / counts
    var = np.ssqd.reduceby((values, mean), groups) / counts
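
For the one-dimensional case, today's closest approximation is probably
np.bincount (reduceby, like ssqd, is hypothetical); the per-group variance
computation would read:

```python
import numpy as np

values = np.array([1.0, 2.0, 3.0, 10.0, 20.0])
groups = np.array([0, 0, 0, 1, 1])

counts = np.bincount(groups)                          # count.reduceby
mean = np.bincount(groups, weights=values) / counts   # add.reduceby / counts
# mean[groups] plays the role of keepdims: broadcast back to values' shape
var = np.bincount(groups, weights=(values - mean[groups]) ** 2) / counts

print(mean)   # group means: 2.0 and 15.0
print(var)    # group variances: 2/3 and 25.0
```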

I am not fully sure whether this is just useful for this particular little
example of computing variance using a ufunc method that doesn't exist, or
whether it is valid in other settings too. Implementing this would add
complexity to an already complex system, so it had better be good for something!

One of the weakest points I see is the need to rely on a set of predefined
identities to start the reduction. Think of trying to generalize reduction
to gufuncs, another worthy effort, and take matrix multiplication with
signature (i,j),(j,k)->(i,k). In the current paradigm you would impose that
both inputs and the output parameters be interchangeable, which leads
rather naturally to the condition i = j = k for this reduction to be
doable. But with the new paradigm you only need that the first input and
output be interchangeable, which only provides you with j = k as a
condition. And while this is a more general behavior, the questions of what
value to set i to in the output, and what to initialize the output to, when
trying to do a reduction over a stack of square arrays, make it fairly
unusable. So a backup to the "two inputs, one output, initialize to some
item in the reduction array" approach would have to be kept in place.
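
That fallback behavior, seeding with the first item of the reduction array,
is essentially what a chained matmul does today:

```python
import functools
import numpy as np

rng = np.random.default_rng(0)
stack = rng.standard_normal((5, 3, 3))   # a stack of square matrices

# seed with stack[0], then fold in the rest: (i,j) @ (j,k) at every step;
# squareness is what makes the chained shapes line up
prod = functools.reduce(np.matmul, stack)
print(prod.shape)   # (3, 3)
```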

It is also important to note that expanding reduction to gufuncs would
probably require that we impose some iteration order guarantees on
ourselves, as the poster child for this (matrix multiplication) is not in
general a commutative operation. Would we want to do this to ourselves?
Would the features today justify the burden for ever after?

Any thoughts on where, if anywhere, to go with this, are very welcome!

Jaime

-- 
(\__/)
( O.o)
( > <) This is Bunny. Copy Bunny into your signature and help him with his
plans for world domination.
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion
