[Numpy-discussion] Re: Add to NumPy a function to compute cumulative sums from 0.

Dom Grigonis Tue, 22 Aug 2023 18:38:52 -0700

I don’t have an issue with cumsum0 if it is approached as a request for a 
useful utility function.


But arguing that this is what a cumulative sum function should be doing is a 
very big stretch. A cumulative sum has a foundational meaning and purpose, 
clearly reflected in its name: not to solve a fencepost error, but to 
accumulate the running sum of a sequence. Prepending 0 as part of it feels very 
unnatural. It is simply an extra operation.

diff0, in my opinion, makes a bit more intuitive sense, but obviously there 
is no need to add it if no one else needs or uses it.
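
For concreteness, here is a minimal sketch of the two helpers under discussion. Neither exists in NumPy; the definitions below are only my assumption of the intended behaviour (cumsum0 prepends a 0 to the running sum, diff0 prepends a 0 before differencing), built from np.cumsum, np.concatenate and the prepend argument of np.diff.

```python
import numpy as np

def cumsum0(a, axis=0):
    """Hypothetical helper: cumulative sum with a leading 0 (length n + 1)."""
    a = np.asarray(a)
    zero_shape = list(a.shape)
    zero_shape[axis] = 1  # one slice of zeros along the summed axis
    zeros = np.zeros(zero_shape, dtype=a.dtype)
    return np.concatenate([zeros, np.cumsum(a, axis=axis)], axis=axis)

def diff0(a, axis=0):
    """Hypothetical helper: differences with an implicit leading 0 (length n)."""
    return np.diff(a, axis=axis, prepend=0)

x = np.array([3.0, 1.0, 4.0, 1.0, 5.0])

# np.diff undoes cumsum0, and np.cumsum undoes diff0.
assert np.allclose(np.diff(cumsum0(x)), x)
assert np.allclose(np.cumsum(diff0(x)), x)

# cumsum0 grows the array by one element; diff0 keeps the length unchanged.
print(cumsum0(x).shape, diff0(x).shape)
```

Under these assumed definitions the two pairings (diff with cumsum0, diff0 with cumsum) are both exact inverses; they differ only in where the extra 0 is placed.
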


> On 22 Aug 2023, at 17:36, john.daw...@camlingroup.com wrote:
> 
> Dom Grigonis wrote:
>> 1. Dimension length stays constant, while cumsum0 extends length to n+1, 
>> then np.diff truncates it back. This adds extra complexity, while things 
>> are very convenient to work with when dimension length stays constant 
>> throughout the code.
> 
> For n values there are n-1 differences. Equivalently, for k differences there 
> are k+1 values. Therefore, `diff` ought to reduce length by 1 and `cumsum` 
> ought to increase it by 1. Returning arrays of the same length is a fencepost 
> error. This is a problem in the current behaviour of `cumsum` and the 
> proposed behaviour of `diff0`.

diff0 doesn’t solve the fencepost error in a strict sense. However, the first 
value of the diff0 result becomes the starting point from which the remaining 
differences are counted, so with the right approach it does solve the issue: if 
the starting values are subtracted first, it does the same thing, just in a 
different order. See below:

> 
> ------------------------------------------------------------
> EXAMPLE
> 
> Consider a path given by a list of points, say (101, 203), (102, 205), (107, 
> 204) and (109, 202). What are the positions at fractions, say 1/3 and 2/3, 
> along the path (linearly interpolating)?
> 
> The problem is naturally solved with `diff` and `cumsum0`:
> 
> ```
> import numpy as np
> from scipy import interpolate
> 
> positions = np.array([[101, 203], [102, 205], [107, 204], [109, 202]], 
> dtype=float)
> steps_2d = np.diff(positions, axis=0)
> steps_1d = np.linalg.norm(steps_2d, axis=1)
> distances = np.cumsum0(steps_1d)
> fractions = distances / distances[-1]
> interpolate_at = interpolate.make_interp_spline(fractions, positions, 1)
> interpolate_at(1/3)
> interpolate_at(2/3)
> ```
> 
> Please show how to solve the problem with `diff0` and `cumsum`.
> ------------------------------------------------------------

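# Assumes the imports from the quoted example (numpy as np, scipy's interpolate)
# and the proposed diff0, taken here to prepend a 0 before differencing.
positions = np.array([[101, 203], [102, 205], [107, 204], [109, 202]],
                     dtype=float)
# Shift the path to start at the origin, so the first diff0 step is zero.
positions_rel = positions - positions[0, None]
steps_2d = diff0(positions_rel, axis=0)
steps_1d = np.linalg.norm(steps_2d, axis=1)
# The cumulative distances now start at 0 without any prepending.
distances = np.cumsum(steps_1d)
fractions = distances / distances[-1]
interpolate_at = interpolate.make_interp_spline(fractions, positions, 1)
print(interpolate_at(1/3))
print(interpolate_at(2/3))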
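
A self-contained sketch that checks the two routes give the same cumulative distances. diff0 is again my assumed definition (np.diff with prepend=0), and I have swapped scipy's make_interp_spline for per-coordinate np.interp so that only NumPy is needed; both are piecewise linear here, but this is a stand-in, not the code from the quoted example.

```python
import numpy as np

def diff0(a, axis=0):
    # Assumed definition of the proposed diff0: differences with a leading 0.
    return np.diff(a, axis=axis, prepend=0)

positions = np.array([[101, 203], [102, 205], [107, 204], [109, 202]],
                     dtype=float)

# Route A: diff0 + cumsum, after shifting the path to start at the origin.
positions_rel = positions - positions[0]
steps_1d = np.linalg.norm(diff0(positions_rel, axis=0), axis=1)
distances_a = np.cumsum(steps_1d)

# Route B: diff + cumsum with an explicit leading 0 (what cumsum0 would return).
steps = np.linalg.norm(np.diff(positions, axis=0), axis=1)
distances_b = np.concatenate([[0.0], np.cumsum(steps)])

assert np.allclose(distances_a, distances_b)

fractions = distances_a / distances_a[-1]

def interpolate_at(f):
    # Linear interpolation of each coordinate against the path fraction.
    return np.array([np.interp(f, fractions, positions[:, k])
                     for k in range(positions.shape[1])])

print(interpolate_at(1/3))
print(interpolate_at(2/3))
```
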
> ------------------------------------------------------------
> EXAMPLE
> 
> Money is invested on 2023-01-01. The annualized rate is 4% until 2023-02-04 
> and 5% thence until 2023-04-02. By how much does the money multiply in this 
> time?
> 
> The problem is naturally solved with `diff`:
> 
> ```
> import numpy as np
> 
> percents = np.array([4, 5], dtype=float)
> times = np.array(["2023-01-01", "2023-02-04", "2023-04-02"], 
> dtype=np.datetime64)
> durations = np.diff(times)
> YEAR = np.timedelta64(365, "D")
> multipliers = (1 + percents / 100) ** (durations / YEAR)
> multipliers.prod()
> ```
> 
> Please show how to solve the problem with `diff0`. It makes sense to divide 
> `np.diff(times)` by `YEAR`, but it would not make sense to divide the output 
> of `np.diff0(times)` by `YEAR` because of its incongruous initial value.
> ------------------------------------------------------------
In my experience it is more sensible to use a time series approach, where the 
whole path of the investment is calculated. The same code can then be used for 
modelling, analysis and presentation to clients. I would do it like:
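import numpy as np
import matplotlib.pyplot as plt
# diff0 is the proposed function, taken here to prepend a 0 before differencing.
r = np.log(1 + np.array([0, 0.04, 0.05]))  # continuously compounded rates
start_date = np.array("2023-01-01", dtype=np.datetime64)
times = np.array(["2023-01-01", "2023-02-04", "2023-04-02"],
                 dtype=np.datetime64)
t = (times - start_date).astype(float) / 365  # elapsed time in years
dt = diff0(t)
normalised = np.exp(np.cumsum(r * dt))  # growth factor at each date
# PLOT
s0 = 1000
plt.plot(s0 * normalised)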
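
As a sanity check, the sketch below (plotting left out) compares the last point of the path with the product of multipliers from the quoted example. diff0 is my assumed definition via np.diff(..., prepend=0); np.log1p(rates) is equivalent to np.log(1 + rates).

```python
import numpy as np

def diff0(a):
    # Assumed definition of the proposed diff0: differences with a leading 0.
    return np.diff(a, prepend=0)

YEAR = np.timedelta64(365, "D")
rates = np.array([0.0, 0.04, 0.05])
times = np.array(["2023-01-01", "2023-02-04", "2023-04-02"],
                 dtype="datetime64[D]")

# Time series route: growth factor at every date, starting from 1.
t = (times - times[0]) / YEAR          # elapsed time in years
dt = diff0(t)                          # same length as times, first entry 0
path = np.exp(np.cumsum(np.log1p(rates) * dt))

# Route from the quoted example: one multiplier per interval.
durations = np.diff(times)
multipliers = (1 + rates[1:]) ** (durations / YEAR)

assert np.isclose(path[-1], multipliers.prod())
print(path)
```

The leading rate of 0 is paired with the leading dt of 0, so the first element of the path is exactly 1 and the array stays aligned with the dates.
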

Apart from the responses above, diff0 is useful in data analysis. Indices and 
observations usually have the same length. It is always convenient to keep it 
that way, and it makes for clean, simple code.
t = dates
s = observations
# Plot changes:
ds = diff0(s)
plt.plot(dates, ds)
# 2nd order changes
plt.plot(dates, diff0(ds))
# Moving average of changes
plt.plot(dates, bottleneck.move_mean(ds, 3))
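
A NumPy-only sketch of the same idea with the plotting left out. The observations are made-up numbers for illustration, diff0 is my assumed definition (np.diff with prepend=0), and move_mean is a hypothetical stand-in for bottleneck.move_mean written with np.convolve.

```python
import numpy as np

def diff0(a):
    # Assumed definition of the proposed diff0: differences with a leading 0.
    return np.diff(a, prepend=0)

def move_mean(a, window):
    # Stand-in for bottleneck.move_mean: trailing mean, NaN until the window fills.
    out = np.full(len(a), np.nan)
    out[window - 1:] = np.convolve(a, np.ones(window) / window, mode="valid")
    return out

dates = np.arange("2023-01-01", "2023-01-08", dtype="datetime64[D]")
observations = np.array([10.0, 11.0, 13.0, 12.0, 15.0, 15.0, 18.0])  # made up

ds = diff0(observations)   # note: ds[0] equals observations[0] under this definition
dds = diff0(ds)            # second-order changes
smooth = move_mean(ds, 3)  # moving average of changes

# Every derived series stays aligned with the date index.
assert len(dates) == len(ds) == len(dds) == len(smooth)
```
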

> _______________________________________________
> NumPy-Discussion mailing list -- numpy-discussion@python.org
> To unsubscribe send an email to numpy-discussion-le...@python.org
> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> Member address: dom.grigo...@gmail.com

