[Numpy-discussion] Re: Add to NumPy a function to compute cumulative sums from 0.

Dom Grigonis Sat, 12 Aug 2023 05:08:15 -0700

From my point of view, such a function is a bit of a corner case to add to 
numpy. And its name is no longer justified: it is not one operation anymore. 
It is a cumsum plus prepending 0. And it is very difficult to argue that 
prepending 0 to cumsum is a part of cumsum.


What I would rather vouch for is adding an argument to `np.diff` so that it 
leaves the first item along the axis unmodified.

import numpy as np

def diff0(a, axis=-1):
    """Differencing which prepends the first item along the axis"""
    # Keep the first item (as a length-1 slice) so the output length matches the input.
    a0 = np.take(a, [0], axis=axis)
    return np.concatenate([a0, np.diff(a, n=1, axis=axis)], axis=axis)

This would be more sensible from a conceptual point of view. Where no 
difference can be taken, the result is the difference from the absolute 
origin (zero), with the understanding that the first non-origin value in a 
sequence is the one after the origin. And if the first row is itself the 
origin in a specific case, then that origin is correctly defined in relation 
to the absolute origin.

Then, if an origin row is needed, it can be prepended at the beginning of a 
procedure, and np.diff (with this argument) and np.cumsum are inverses 
throughout the sequential code.
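
To make the inverse relationship concrete, here is a minimal sketch. It reuses `diff0` as a standalone helper (it is not part of numpy), and the array `a` is just an illustrative input:

```python
import numpy as np

def diff0(a, axis=-1):
    """Difference along `axis`, keeping the first item unmodified."""
    a = np.asarray(a)
    a0 = np.take(a, [0], axis=axis)
    return np.concatenate([a0, np.diff(a, n=1, axis=axis)], axis=axis)

a = np.array([[1, 2, 3], [4, 5, 6]])

# Same-length round trips in both directions along axis 1.
roundtrip_1 = np.cumsum(diff0(a, axis=1), axis=1)
roundtrip_2 = diff0(np.cumsum(a, axis=1), axis=1)

# If an origin column is needed, prepend it once at the start of the procedure.
with_origin = np.concatenate([np.zeros((2, 1), dtype=a.dtype), a], axis=1)
```

Both round trips should give back `a` unchanged, and nothing in between changes the length along the axis.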

diff0 was one of the first functions I added to my numpy utils, and I have 
been using it instead of np.diff quite a lot.

I think a general flag to prevent fencepost errors could be added to all 
functions where required, so that the flow is seamless and retains the 
initial dimension length. Taking some time to ensure consistency across 
numpy in this respect could be of long-term value.

E.g. rolling functions in numbagg and bottleneck leave nans, because there is 
no other sensible value to go there instead, whereas in this case a sensible 
value exists, just not in the `cumsum` function.
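
As a rough illustration of the contrast, here is a pure-NumPy sketch of a length-preserving rolling sum. It is not how numbagg or bottleneck are implemented, and the helper name `rolling_sum_nan` is made up for this example. The first `window - 1` positions have no full window, so they are filled with NaN:

```python
import numpy as np

def rolling_sum_nan(a, window):
    """Rolling sum along the last axis that keeps the input length.

    Positions without a full window are filled with NaN.
    """
    a = np.asarray(a, dtype=float)
    out = np.full(a.shape, np.nan)
    if window <= a.shape[-1]:
        # Shape (..., n - window + 1, window): one row per full window.
        windows = np.lib.stride_tricks.sliding_window_view(a, window, axis=-1)
        out[..., window - 1:] = windows.sum(axis=-1)
    return out

x = np.array([1.0, 2.0, 3.0, 4.0])
r = rolling_sum_nan(x, 2)
```

Here the first element of `r` is NaN and the rest are the window sums, whereas for differencing the first slot can hold a meaningful value: the first item itself.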

> On 11 Aug 2023, at 15:53, Juan Nunez-Iglesias <j...@fastmail.com> wrote:
> 
> I'm very sensitive to the issues of adding to the already bloated numpy API, 
> but I would definitely find use in this function. I literally made this error 
> (thinking that the first element of cumsum should be 0) just a couple of days 
> ago! What are the plans for the "extended" NumPy API after 2.0? Is there a 
> good place for these variants?
> 
> On Fri, 11 Aug 2023, at 2:07 AM, john.daw...@camlingroup.com wrote:
>> `cumsum` computes the sum of the first k summands for every k from 1. 
>> Judging by my experience, it is more often useful to compute the sum of 
>> the first k summands for every k from 0, as `cumsum`'s behaviour leads 
>> to fencepost-like problems.
>> https://en.wikipedia.org/wiki/Off-by-one_error#Fencepost_error
>> For example, `cumsum` is not the inverse of `diff`. I propose adding a 
>> function to NumPy to compute cumulative sums beginning with 0, that is, 
>> an inverse of `diff`. It might be called `cumsum0`. The following code 
>> is probably not the best way to implement it, but it illustrates the 
>> desired behaviour.
>> 
>> ```
>> def cumsum0(a, axis=None, dtype=None, out=None):
>>    """
>>    Return the cumulative sum of the elements along a given axis,
>>    beginning with 0.
>> 
>>    cumsum0 does the same as cumsum except that cumsum computes the sum
>>    of the first k summands for every k from 1 and cumsum0, from 0.
>> 
>>    Parameters
>>    ----------
>>    a : array_like
>>        Input array.
>>    axis : int, optional
>>        Axis along which the cumulative sum is computed. The default
>>        (None) is to compute the cumulative sum over the flattened
>>        array.
>>    dtype : dtype, optional
>>        Type of the returned array and of the accumulator in which the
>>        elements are summed. If `dtype` is not specified, it defaults to
>>        the dtype of `a`, unless `a` has an integer dtype with a
>>        precision less than that of the default platform integer. In
>>        that case, the default platform integer is used.
>>    out : ndarray, optional
>>        Alternative output array in which to place the result. It must
>>        have the same shape and buffer length as the expected output but
>>        the type will be cast if necessary. See
>>        :ref:`ufuncs-output-type` for more details.
>> 
>>    Returns
>>    -------
>>    cumsum0_along_axis : ndarray
>>        A new array holding the result is returned unless `out` is
>>        specified, in which case a reference to `out` is returned. If
>>        `axis` is not None the result has the same shape as `a` except
>>        along `axis`, where the dimension is larger by 1.
>> 
>>    See Also
>>    --------
>>    cumsum : Cumulatively sum array elements, beginning with the first.
>>    sum : Sum array elements.
>>    trapz : Integration of array values using the composite trapezoidal rule.
>>    diff : Calculate the n-th discrete difference along given axis.
>> 
>>    Notes
>>    -----
>>    Arithmetic is modular when using integer types, and no error is
>>    raised on overflow.
>> 
>>    ``cumsum0(a)[-1]`` may not be equal to ``sum(a)`` for floating-point
>>    values since ``sum`` may use a pairwise summation routine, reducing
>>    the roundoff-error. See `sum` for more information.
>> 
>>    Examples
>>    --------
>>    >>> a = np.array([[1, 2, 3], [4, 5, 6]])
>>    >>> a
>>    array([[1, 2, 3],
>>           [4, 5, 6]])
>>    >>> np.cumsum0(a)
>>    array([ 0,  1,  3,  6, 10, 15, 21])
>>    >>> np.cumsum0(a, dtype=float)  # specifies type of output value(s)
>>    array([ 0.,  1.,  3.,  6., 10., 15., 21.])
>> 
>>    >>> np.cumsum0(a, axis=0)  # sum over rows for each of the 3 columns
>>    array([[0, 0, 0],
>>           [1, 2, 3],
>>           [5, 7, 9]])
>>    >>> np.cumsum0(a, axis=1)  # sum over columns for each of the 2 rows
>>    array([[ 0,  1,  3,  6],
>>           [ 0,  4,  9, 15]])
>> 
>>    ``cumsum0(b)[-1]`` may not be equal to ``sum(b)``
>> 
>>    >>> b = np.array([1, 2e-9, 3e-9] * 1000000)
>>    >>> np.cumsum0(b)[-1]
>>    1000000.0050045159
>>    >>> b.sum()
>>    1000000.0050000029
>> 
>>    """
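>>    a = np.asarray(a)  # accept any array_like input
>>    # Summing an empty slice yields the zero element with the right shape.
>>    empty = a.take([], axis=axis)
>>    zero = empty.sum(axis, dtype=dtype, keepdims=True)
>>    later_cumsum = a.cumsum(axis, dtype=dtype)
>>    return np.concatenate([zero, later_cumsum], axis=axis, dtype=dtype, out=out)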
>> ```
>> _______________________________________________
>> NumPy-Discussion mailing list -- numpy-discussion@python.org
>> To unsubscribe send an email to numpy-discussion-le...@python.org
>> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
>> Member address: j...@fastmail.com
