Re: [PR] Replacing signal_processing_algorithms with internal implementation [otava]

via GitHub Thu, 13 Nov 2025 16:39:40 -0800


henrikingo commented on code in PR #96:
URL: https://github.com/apache/otava/pull/96#discussion_r2524996795



##########
otava/change_point_divisive/base.py:
##########
@@ -0,0 +1,109 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from dataclasses import dataclass, fields
+from typing import Generic, List, Optional, TypeVar
+
+from numpy.typing import NDArray
+
+
+@dataclass
+class CandidateChangePoint:
+    '''Candidate for a change point. The point that maximizes Q-hat function 
on [start:end+1] slice'''
+    index: int

Review Comment:
   Maybe it would be clearer if somewhere you also use either:
   
      start < index <= end
   
   or the equivalent
   
      )start, end)
   
   and then say, "...which correspond to the slice [start:end+1] in python, as 
well as range(start,end+1)
   
   Could add a mention that indexes are always the original index from the full 
series [0,T] and not zero-based for each interval. (`start < index <= end`,  
never `0 < index <= end-start`
   
   A change point at index 0 is impossible, because a single point just cannot 
change But this IMO follows logically, it is not a pre-condition. (There's a 
fun philosophical debate here, where exactly the change points are? In the case 
of a series of git commits, it is clear that the change point is the commit 
that causes the regression/improvement. Others could argue change is what 
happens in the gaps between the points of measurement.) In Otava we index 
change points from 1 to T, because this way they match with their corresponding 
test result in the input series. So you could make the conclusion Otava is in 
the camp that change happens, or is observed at least, at the point after the 
change.
   
   Anyway, should we add an invariant here that index > 0?



##########
otava/analysis.py:
##########
@@ -170,10 +126,42 @@ def compare(self, left: np.ndarray, right: np.ndarray) -> 
ComparativeStats:
             )
         else:
             p = 1.0
-        return ComparativeStats(mean_l, mean_r, std_l, std_r, p)
+        return TTestStats(mean_1=mean_l, mean_2=mean_r, std_1=std_l, 
std_2=std_r, pvalue=p)
+
+    def change_point(
+        self, candidate: CandidateChangePoint, series: 
Sequence[SupportsFloat], intervals: List[slice]
+    ) -> ChangePoint[TTestStats]:
+        """
+        Computes properties of the change point if the Candidate Change Point 
based on the provided intervals.
+
+        The method works for two cases:
+        1. Divisive algorithm.
+           if the candidate is a new potential change point, i.e., its index 
is inside any interval, then
+           we split the interval by the candidate's index to get left and 
right subseries.
+        2. Merge step in t-test algorithm.

Review Comment:
   Terminology: I would say any variant of the algorithm can use a T-Test, 
Permutation test, or something else. The Merge step is part of what I'd call 
split-merge strategy, or perhaps weak change points version. I guess the Hunter 
paper just called this the Fixed size windows, but I could think of many kinds 
of windows (such as sliding) that don't need to be merged. And this is at least 
closely related to the weak change points, but I'm unsure whether weak change 
points will be needed now? Otoh maybe we'll soon get rid of the split-merge 
too, in which case we can ignore this discussion.



##########
otava/analysis.py:
##########
@@ -259,13 +247,13 @@ def split(series: np.array, window_len: int = 30, 
max_pvalue: float = 0.001,
     assert "Window length must be at least 2", window_len >= 2
     start = 0
     step = int(window_len / 2)
-    indexes = []
+    change_points = []
     # N new_points are appended to the end of series. Typically N=1.
     # old_cp are the weak change points from before new points were added.
     # We now just compute e-e_divisive for the tail of the series, beginning at
     # max(old_cp[-1], a step that is over 2 window_len from the end)
     if new_points is not None and old_cp is not None:
-        indexes = [c.index for c in old_cp]
+        change_points = old_cp[:]

Review Comment:
   I like this change (throughout)



##########
otava/analysis.py:
##########
@@ -222,22 +210,22 @@ def merge(
         # We can't continue yet, because by removing a change_point
         # the adjacent change points changed their properties.
         # Recompute the adjacent change point stats:
-        window_endpoints = [0] + [cp.index for cp in change_points] + 
[len(series)]
+        intervals = tester.get_intervals(change_points)
 
         def recompute(index: int):
             if index < 0 or index >= len(change_points):
                 return
             cp = change_points[index]
-            change_points[index] = tester.change_point(cp.index, series, 
window_endpoints)
+            change_points[index] = tester.change_point(cp.to_candidate(), 
series, intervals)
 
         recompute(weakest_cp_index)
         recompute(weakest_cp_index + 1)

Review Comment:
   Do we test for the case where weakest_cp_index == max(index)



##########
otava/change_point_divisive/calculator.py:
##########
@@ -0,0 +1,189 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+
+import numpy as np
+from numpy.typing import NDArray
+
+from otava.change_point_divisive.base import Calculator, CandidateChangePoint
+
+
+class PairDistanceCalculator(Calculator):
+    def __init__(self, series: NDArray, power: int = 1):
+        super().__init__(series)
+        assert 0 < power < 2, f"power={power} isn't in (0, 2)"
+        self.power = power
+        self.V = None
+        self.H = None
+
+    def _calculate_distances(self):
+        '''Precomputes `H` and `V` functions that are used for computation of 
`Q` function.
+            See more details about `Q` function in 
`get_candidate_change_point` method.
+            See more details about the use of `H` and `V` functions in 
`_get_A_B_C` method.
+
+        Let matix `distances` be a matrix of all possible L1 (Manhattan) 
distances between

Review Comment:
   matrix



##########
otava/change_point_divisive/calculator.py:
##########
@@ -0,0 +1,189 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+
+import numpy as np
+from numpy.typing import NDArray
+
+from otava.change_point_divisive.base import Calculator, CandidateChangePoint
+
+
+class PairDistanceCalculator(Calculator):
+    def __init__(self, series: NDArray, power: int = 1):
+        super().__init__(series)
+        assert 0 < power < 2, f"power={power} isn't in (0, 2)"
+        self.power = power
+        self.V = None
+        self.H = None
+
+    def _calculate_distances(self):
+        '''Precomputes `H` and `V` functions that are used for computation of 
`Q` function.
+            See more details about `Q` function in 
`get_candidate_change_point` method.
+            See more details about the use of `H` and `V` functions in 
`_get_A_B_C` method.
+
+        Let matix `distances` be a matrix of all possible L1 (Manhattan) 
distances between
+        elements of 1d array `series`. If `D(i, j) = |series[i] - series[j]|` 
and `N = len(series)`,
+        then
+
+                      | D(0, 0)        D(0, 1)      …       D(0, N - 1) |
+                      | D(1, 0)        D(1, 1)      …       D(1, N - 1) |
+        distances =   |    ⋮              ⋮          ⋱         ⋮        | .
+                      | D(N - 1, 0)    D(N - 1, 1)  …   D(N - 1, N - 1) |
+
+        Note that the main diagonal of the matrix is filled with 0.
+
+
+        We define functions `V` and `H` as the following (V)ertical and 
(H)orizontal
+        cummulative sums:
+                        V(start, τ, κ) = Σ distances[start:τ, τ],
+                        H(start, τ, κ) = Σ distances[τ, τ:κ],
+        for `start < τ < κ <= N`.
+        Note that `V(start, τ, κ) = V(start, τ)` and `H(start, τ, κ) = H(τ, 
κ)`.
+
+
+        Vector V contains the following values of function `V(start, τ, κ)`:
+                            V = [V(0, 1, κ)     V(0, 2, κ),    …  V(0, N - 1, 
κ)].
+            `V(0, 0, κ) = 0` so they are ommited. As noted, function `V(start, 
τ, κ)`
+            does not depend on `κ`. Therefore, vector V contains all values of 
function
+            `V(0, τ, κ)`. We can easily get values of function `V` for an 
arbitrary
+            value of `start` by
+                            V(start, τ, κ) = V(0, τ, κ) - Σ distances[0 : 
start, τ].
+        Note: The reason not all values of `V(start, τ, κ)` are precomputed is 
that
+              we do not need them all. The values of `start` will depend on 
critical
+              points we find. Precomuting them for all possible values is a 
waste.
+
+
+        Matrix H contains the following values of function `H(start, τ, κ)`:
+
+                  | H(start, 0, 2)    H(start, 0, 3)   …  H(start, 0, N)     |
+                  |          0        H(start, 1, 3)   …  H(start, 1, N)     |
+            H =   |          ⋮                 ⋮        ⋱          ⋮         | 
.
+                  |          0                 0       …  H(start, N - 2, N) |
+
+            `H(start, x, x) = H(start, x - 1, x) = 0` for any `x` so they are 
ommited.
+            As noted, function `H(start, τ, κ)` does not depend on `start`. 
Therefore,
+            matrix H contains all possible values of function `H`.
+        Note: We precomputed all values of `H(start, τ, κ)` because all of 
them are needed
+              for the very first iteration (`start=0` and `end=N-1`).'''
+        self.distances = np.power(np.abs(self.series[:, None] - 
self.series[None, :]), self.power)
+        triu = np.triu(self.distances, k=1)[:-1, 1:]
+        self.V = triu.sum(axis=0)
+        self.H = triu.cumsum(axis=1)
+
+    def _get_Q_vals(self, start: int, end: int) -> NDArray:
+        '''Computes matrices A, B, C where all possible values of function `Q` 
are
+        given by matrix Q = A - B - C.
+        See more details about `Q` function in `get_candidate_change_point` 
method.
+
+        For any given `start` and `end` let
+                        Q(τ, κ) = A(start, τ, κ) - B(start, τ, κ) - C(start, 
τ, κ),
+        where `start < τ < κ <= end + 1`. (For definitions of `A`, `B`, `C` see
+        formula of `Q` in `get_candidate_change_point` method.) All possible 
values of function
+        `A` are given by matrix
+
+              | A(start, start + 1, start + 2)    A(start, start + 1, start + 
3)   …  A(start, start + 1, end + 1) |
+              |                0                  A(start, start + 2, start + 
3)   …  A(start, start + 2, end + 1) |
+        A =   |                ⋮                                  ⋮            
     ⋱                 ⋮            | .
+              |                0                                  0            
    …  A(start, end, end + 1)       |
+
+        Matrices B and C follow the same index-to-value pattern.
+
+        Matrices A, B, and C are used to compute values of function `Q` 
recursively, without
+        recomputing the same sums over and over again. The recursive formulas 
were further
+        transformed to closed forms, so they can be computed using numpy 
cummulative sum
+        function to take advantage of numpy vectorized operations. (Note that 
each column
+        in matrices A, B, C can be repersented through np.cumsum). The 
formulas use
+        commulative sum of functions `H` and `V`, which definitions can be 
found in
+        `_calculate_distances` method.
+
+            A(start, τ, κ) = 2 / (κ - s) * (Σt=start,τ-1  H(start, t, κ) - 
Σt=start+1,τ-1  V(start, t, κ)),
+
+            B(start, τ, κ) = 2 * (κ - τ) / (κ - start) / (τ - start - 1) * 
Σt=start+1,τ-1  V(start, t, κ),
+
+            C(start, τ, κ) = 2 * (τ - start) / (κ - start) / (κ - τ - 1) * 
Σt=τ,κ-2  H(start, t, κ).'''
+        if self.V is None or self.H is None:
+            self._calculate_distances()
+
+        V = self.V[start : end] - self.distances[0 : start, start + 1 : end + 
1].sum(axis=0)
+        H = self.H[start : end, start : end]
+
+        taus = np.arange(start + 1, end + 1)[:, None]
+        kappas = np.arange(start + 2, end + 2)[None, :]
+
+        A = np.zeros((end - start, end - start))
+        A_coefs = 2 / (kappas - start)
+        A[1:, :] = np.cumsum(V)[:-1, None]
+        A = A_coefs * np.triu(np.cumsum(H, axis=0) - A, k=0)
+
+        B = np.zeros((end - start, end - start))
+        B_num = 2 * (kappas - taus)
+        B_den = (kappas - start) * (taus - start - 1)
+        B_mask = np.triu(np.ones_like(B_den, dtype=bool), k=0)
+        B_out = np.zeros_like(B_den, dtype=float)
+        B_coefs = np.divide(B_num, B_den, out=B_out, where=B_mask & (B_den != 
0))
+        B[1:, 1:] = B_coefs[1:, 1:] * np.cumsum(V)[:-1, None]
+
+        C = np.zeros((end - start, end - start))
+        C_num = 2 * (taus - start)
+        C_den = (kappas - start) * (kappas - taus - 1)
+        C_mask = np.triu(np.ones_like(C_den, dtype=bool), k=1)
+        C_out = np.zeros_like(C_den, dtype=float)
+        C_coefs = np.divide(C_num, C_den, out=C_out, where=C_mask & (C_den != 
0))
+        C[:-1, 1:] = C_coefs[:-1, 1:] * np.flipud(np.cumsum(np.flipud(H[1:, 
1:]), axis=0))
+
+        # Element of matrix `Q_{i, j}` is equal to `Q(τ, κ) = Q(i + 1, j + 2) 
= QQ(sequence[start : i + 1], sequence[i + 1 : j + 2])`.
+        # So, critical point is `τ = i + 1`.
+        return A - B - C
+
+    def get_candidate_change_point(self, interval: slice) -> 
CandidateChangePoint:
+        '''For a given `slice(start, stop)` finds potential critical point in 
subsequence series[slice],
+        i.e., from index `start` to `stop - 1` inclusive. For simplicity, 
we'll use `end = stop - 1`.
+
+
+        The potential critical point is defined as the following.
+            Suppose for given `start` and `end` the pair `(τ, κ)` maximizes 
function
+                            Q(τ, κ) = QQ(series[start : τ], series[τ : κ]),
+            where `τ` and `κ` are such that both `series[start : τ]` and 
`series[τ : κ]` are non-empty.
+            Then `τ` is a potential critical point (i.e., the first element of 
`series[τ : κ]`).
+        Note: for `series[start:τ]` and `series[τ:κ]` to be non-empty, we have
+                            start < τ < κ <= end + 1 = len(series).
+        Note: start < τ => sequence[0] cannot be a critical point.
+
+
+        Function `QQ` is defined as the following:
+            Let `Z` be a series `[Z_{start}, Z_{start + 1}, ..., Z_{end}]`, 
and `X` and `Y`
+            be a split of `Z`, i.e.,
+                            X = [Z_{start}, Z_{start + 1}, ..., Z_a] = [X_1, 
X_2, ..., X_n],
+                            Y = [Z_{a + 1}, Z_{a + 2}, ..., Z_b] = [Y_1, Y_2, 
..., Y_m],
+            then
+                            QQ(X, Y) = 2 / (m + n) * Σi=1,n Σj=1,m |X_i - Y_j|
+                                      - 2 * m / (m + n) / (n - 1) * Σ1<=i<k<=n 
|X_i - X_k|
+                                      - 2 * n / (m + n) / (m - 1) * Σ1<=j<k<=m 
|Y_j - Y_k|.
+
+            Note: QQ(X, Y) = QQ(Z[start : a + 1], Z[a + 1 : b + 1]) = Q(a + 1, 
b + 1).
+
+
+        This method computes all values of `Q` for given `start` and `end`, 
and returns index
+        of the potential critical point and the maximized value of `Q`.'''
+
+        start = 0 if interval.start is None else interval.start
+        end = len(self.series) - 1 if interval.stop is None else interval.stop 
- 1
+        assert start < end, f"start={start} must be less than end={end}"
+
+        Q = self._get_Q_vals(start, end)
+        i, j = np.unravel_index(np.argmax(Q), Q.shape)
+        return CandidateChangePoint(index=i + 1 + start, qhat=Q[i][j])

Review Comment:
   Also this feels reassuringly familiar <3



##########
otava/change_point_divisive/calculator.py:
##########
@@ -0,0 +1,189 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+
+import numpy as np
+from numpy.typing import NDArray
+
+from otava.change_point_divisive.base import Calculator, CandidateChangePoint
+
+
+class PairDistanceCalculator(Calculator):
+    def __init__(self, series: NDArray, power: int = 1):
+        super().__init__(series)
+        assert 0 < power < 2, f"power={power} isn't in (0, 2)"
+        self.power = power
+        self.V = None
+        self.H = None
+
+    def _calculate_distances(self):
+        '''Precomputes `H` and `V` functions that are used for computation of 
`Q` function.
+            See more details about `Q` function in 
`get_candidate_change_point` method.
+            See more details about the use of `H` and `V` functions in 
`_get_A_B_C` method.
+
+        Let matix `distances` be a matrix of all possible L1 (Manhattan) 
distances between
+        elements of 1d array `series`. If `D(i, j) = |series[i] - series[j]|` 
and `N = len(series)`,
+        then
+
+                      | D(0, 0)        D(0, 1)      …       D(0, N - 1) |
+                      | D(1, 0)        D(1, 1)      …       D(1, N - 1) |
+        distances =   |    ⋮              ⋮          ⋱         ⋮        | .
+                      | D(N - 1, 0)    D(N - 1, 1)  …   D(N - 1, N - 1) |
+
+        Note that the main diagonal of the matrix is filled with 0.
+
+
+        We define functions `V` and `H` as the following (V)ertical and 
(H)orizontal
+        cummulative sums:
+                        V(start, τ, κ) = Σ distances[start:τ, τ],
+                        H(start, τ, κ) = Σ distances[τ, τ:κ],
+        for `start < τ < κ <= N`.
+        Note that `V(start, τ, κ) = V(start, τ)` and `H(start, τ, κ) = H(τ, 
κ)`.
+
+
+        Vector V contains the following values of function `V(start, τ, κ)`:
+                            V = [V(0, 1, κ)     V(0, 2, κ),    …  V(0, N - 1, 
κ)].
+            `V(0, 0, κ) = 0` so they are ommited. As noted, function `V(start, 
τ, κ)`
+            does not depend on `κ`. Therefore, vector V contains all values of 
function
+            `V(0, τ, κ)`. We can easily get values of function `V` for an 
arbitrary
+            value of `start` by
+                            V(start, τ, κ) = V(0, τ, κ) - Σ distances[0 : 
start, τ].
+        Note: The reason not all values of `V(start, τ, κ)` are precomputed is 
that
+              we do not need them all. The values of `start` will depend on 
critical
+              points we find. Precomuting them for all possible values is a 
waste.
+
+
+        Matrix H contains the following values of function `H(start, τ, κ)`:
+
+                  | H(start, 0, 2)    H(start, 0, 3)   …  H(start, 0, N)     |
+                  |          0        H(start, 1, 3)   …  H(start, 1, N)     |
+            H =   |          ⋮                 ⋮        ⋱          ⋮         | 
.
+                  |          0                 0       …  H(start, N - 2, N) |
+
+            `H(start, x, x) = H(start, x - 1, x) = 0` for any `x` so they are 
ommited.
+            As noted, function `H(start, τ, κ)` does not depend on `start`. 
Therefore,
+            matrix H contains all possible values of function `H`.
+        Note: We precomputed all values of `H(start, τ, κ)` because all of 
them are needed
+              for the very first iteration (`start=0` and `end=N-1`).'''
+        self.distances = np.power(np.abs(self.series[:, None] - 
self.series[None, :]), self.power)
+        triu = np.triu(self.distances, k=1)[:-1, 1:]
+        self.V = triu.sum(axis=0)
+        self.H = triu.cumsum(axis=1)
+
+    def _get_Q_vals(self, start: int, end: int) -> NDArray:
+        '''Computes matrices A, B, C where all possible values of function `Q` 
are
+        given by matrix Q = A - B - C.
+        See more details about `Q` function in `get_candidate_change_point` 
method.
+
+        For any given `start` and `end` let
+                        Q(τ, κ) = A(start, τ, κ) - B(start, τ, κ) - C(start, 
τ, κ),
+        where `start < τ < κ <= end + 1`. (For definitions of `A`, `B`, `C` see
+        formula of `Q` in `get_candidate_change_point` method.) All possible 
values of function
+        `A` are given by matrix
+
+              | A(start, start + 1, start + 2)    A(start, start + 1, start + 
3)   …  A(start, start + 1, end + 1) |
+              |                0                  A(start, start + 2, start + 
3)   …  A(start, start + 2, end + 1) |
+        A =   |                ⋮                                  ⋮            
     ⋱                 ⋮            | .
+              |                0                                  0            
    …  A(start, end, end + 1)       |
+
+        Matrices B and C follow the same index-to-value pattern.
+
+        Matrices A, B, and C are used to compute values of function `Q` 
recursively, without
+        recomputing the same sums over and over again. The recursive formulas 
were further
+        transformed to closed forms, so they can be computed using numpy 
cummulative sum
+        function to take advantage of numpy vectorized operations. (Note that 
each column
+        in matrices A, B, C can be repersented through np.cumsum). The 
formulas use
+        commulative sum of functions `H` and `V`, which definitions can be 
found in
+        `_calculate_distances` method.
+
+            A(start, τ, κ) = 2 / (κ - s) * (Σt=start,τ-1  H(start, t, κ) - 
Σt=start+1,τ-1  V(start, t, κ)),
+
+            B(start, τ, κ) = 2 * (κ - τ) / (κ - start) / (τ - start - 1) * 
Σt=start+1,τ-1  V(start, t, κ),
+
+            C(start, τ, κ) = 2 * (τ - start) / (κ - start) / (κ - τ - 1) * 
Σt=τ,κ-2  H(start, t, κ).'''
+        if self.V is None or self.H is None:
+            self._calculate_distances()
+
+        V = self.V[start : end] - self.distances[0 : start, start + 1 : end + 
1].sum(axis=0)
+        H = self.H[start : end, start : end]
+
+        taus = np.arange(start + 1, end + 1)[:, None]
+        kappas = np.arange(start + 2, end + 2)[None, :]
+
+        A = np.zeros((end - start, end - start))
+        A_coefs = 2 / (kappas - start)
+        A[1:, :] = np.cumsum(V)[:-1, None]
+        A = A_coefs * np.triu(np.cumsum(H, axis=0) - A, k=0)
+
+        B = np.zeros((end - start, end - start))
+        B_num = 2 * (kappas - taus)
+        B_den = (kappas - start) * (taus - start - 1)
+        B_mask = np.triu(np.ones_like(B_den, dtype=bool), k=0)
+        B_out = np.zeros_like(B_den, dtype=float)
+        B_coefs = np.divide(B_num, B_den, out=B_out, where=B_mask & (B_den != 
0))
+        B[1:, 1:] = B_coefs[1:, 1:] * np.cumsum(V)[:-1, None]
+
+        C = np.zeros((end - start, end - start))
+        C_num = 2 * (taus - start)
+        C_den = (kappas - start) * (kappas - taus - 1)
+        C_mask = np.triu(np.ones_like(C_den, dtype=bool), k=1)
+        C_out = np.zeros_like(C_den, dtype=float)
+        C_coefs = np.divide(C_num, C_den, out=C_out, where=C_mask & (C_den != 
0))
+        C[:-1, 1:] = C_coefs[:-1, 1:] * np.flipud(np.cumsum(np.flipud(H[1:, 
1:]), axis=0))
+
+        # Element of matrix `Q_{i, j}` is equal to `Q(τ, κ) = Q(i + 1, j + 2) 
= QQ(sequence[start : i + 1], sequence[i + 1 : j + 2])`.
+        # So, critical point is `τ = i + 1`.
+        return A - B - C
+
+    def get_candidate_change_point(self, interval: slice) -> 
CandidateChangePoint:
+        '''For a given `slice(start, stop)` finds potential critical point in 
subsequence series[slice],
+        i.e., from index `start` to `stop - 1` inclusive. For simplicity, 
we'll use `end = stop - 1`.
+
+
+        The potential critical point is defined as the following.
+            Suppose for given `start` and `end` the pair `(τ, κ)` maximizes 
function
+                            Q(τ, κ) = QQ(series[start : τ], series[τ : κ]),
+            where `τ` and `κ` are such that both `series[start : τ]` and 
`series[τ : κ]` are non-empty.
+            Then `τ` is a potential critical point (i.e., the first element of 
`series[τ : κ]`).
+        Note: for `series[start:τ]` and `series[τ:κ]` to be non-empty, we have
+                            start < τ < κ <= end + 1 = len(series).
+        Note: start < τ => sequence[0] cannot be a critical point.
+

Review Comment:
   That's more like it! I'll leave all my other commentary for philosophical 
value. But please review, I think earlier when you end up having +2 here and 
there, you've done a +1 once too many somewhere. (Result can still be correct, 
readabilit suffers.



##########
otava/analysis.py:
##########
@@ -278,35 +266,31 @@ def split(series: np.array, window_len: int = 30, 
max_pvalue: float = 0.001,
     tester = TTestSignificanceTester(max_pvalue)
     while start < len(series):
         end = min(start + window_len, len(series))
-        calculator = cext_calculator
 
-        algo = EDivisive(seed=None, calculator=calculator, 
significance_tester=tester)
-        pts = algo.get_change_points(series[start:end])
-        new_indexes = [p.index + start for p in pts]
-        new_indexes.sort()
-        last_new_change_point_index = next(iter(new_indexes[-1:]), 0)
+        algo = ChangePointDetector(significance_tester=tester, 
calculator=PairDistanceCalculator)
+        new_change_points = algo.get_change_points(series, start, end)
+        last_new_change_point_index = new_change_points[-1].index if 
new_change_points else 0
         start = max(last_new_change_point_index, start + step)
         # incremental Otava can duplicate an old cp
-        for i in new_indexes:
-            if i not in indexes:
-                indexes += [i]
+        for cp in new_change_points:
+            if cp not in change_points:
+                change_points += [cp]
 
-    window_endpoints = [0] + indexes + [len(series)]
-    return [tester.change_point(i, series, window_endpoints) for i in indexes]
+    intervals = tester.get_intervals(change_points)
+    return [tester.change_point(cp.to_candidate(), series, intervals) for cp 
in change_points]
 
 
-def compute_change_points_orig(series: np.array, max_pvalue: float = 0.001) -> 
List[ChangePoint]:
-    calculator = cext_calculator
-    tester = QHatPermutationsSignificanceTester(calculator, pvalue=max_pvalue, 
permutations=100)
-    algo = EDivisive(seed=None, calculator=calculator, 
significance_tester=tester)
-    pts = algo.get_change_points(series)
-    return pts, None
+def compute_change_points_orig(series: Sequence[SupportsFloat], max_pvalue: 
float = 0.001, seed: Optional[int] = None) -> Tuple[PermCPList, 
Optional[PermCPList]]:
+    tester = PermutationsSignificanceTester(alpha=max_pvalue, 
permurations=100, calculator=PairDistanceCalculator, seed=seed)

Review Comment:
   permutations



##########
otava/analysis.py:
##########
@@ -259,13 +247,13 @@ def split(series: np.array, window_len: int = 30, 
max_pvalue: float = 0.001,
     assert "Window length must be at least 2", window_len >= 2
     start = 0
     step = int(window_len / 2)
-    indexes = []
+    change_points = []

Review Comment:
   Perhaps update the above comment, and check other cmments too. (grep mongodb 
and signal)



##########
otava/analysis.py:
##########
@@ -170,10 +126,42 @@ def compare(self, left: np.ndarray, right: np.ndarray) -> 
ComparativeStats:
             )
         else:
             p = 1.0
-        return ComparativeStats(mean_l, mean_r, std_l, std_r, p)
+        return TTestStats(mean_1=mean_l, mean_2=mean_r, std_1=std_l, 
std_2=std_r, pvalue=p)
+
+    def change_point(
+        self, candidate: CandidateChangePoint, series: 
Sequence[SupportsFloat], intervals: List[slice]
+    ) -> ChangePoint[TTestStats]:
+        """
+        Computes properties of the change point if the Candidate Change Point 
based on the provided intervals.
+
+        The method works for two cases:
+        1. Divisive algorithm.
+           if the candidate is a new potential change point, i.e., its index 
is inside any interval, then
+           we split the interval by the candidate's index to get left and 
right subseries.
+        2. Merge step in t-test algorithm.
+           if the candidate is an existing change point, i.e., it matches the 
end of two intervals, then
+           it's a potential weak change point, and we don't need to split the 
intervals anymore (just take
+           both intervals as left and right subseries).
+
+        """
+        for i, interval in enumerate(intervals):
+            if interval.stop == candidate.index:
+                left_interval = interval
+                right_interval = intervals[i + 1]
+                break
+            elif (interval.start is None or interval.start < candidate.index) 
and (interval.stop is None or candidate.index < interval.stop):

Review Comment:
   How can interval start ever be None? If the end points aren't known I'd say 
it's not an interval at all?



##########
otava/analysis.py:
##########
@@ -278,35 +266,31 @@ def split(series: np.array, window_len: int = 30, 
max_pvalue: float = 0.001,
     tester = TTestSignificanceTester(max_pvalue)
     while start < len(series):
         end = min(start + window_len, len(series))
-        calculator = cext_calculator
 
-        algo = EDivisive(seed=None, calculator=calculator, 
significance_tester=tester)
-        pts = algo.get_change_points(series[start:end])
-        new_indexes = [p.index + start for p in pts]
-        new_indexes.sort()
-        last_new_change_point_index = next(iter(new_indexes[-1:]), 0)
+        algo = ChangePointDetector(significance_tester=tester, 
calculator=PairDistanceCalculator)

Review Comment:
   ChangePointDetector is the name of the entire algorithm?
   Ie is the more generic alternative to e-divisive?



##########
otava/change_point_divisive/calculator.py:
##########
@@ -0,0 +1,189 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+
+import numpy as np
+from numpy.typing import NDArray
+
+from otava.change_point_divisive.base import Calculator, CandidateChangePoint
+
+
+class PairDistanceCalculator(Calculator):
+    def __init__(self, series: NDArray, power: int = 1):
+        super().__init__(series)
+        assert 0 < power < 2, f"power={power} isn't in (0, 2)"

Review Comment:
   That's a strong assertion. Why use this parameter at all?
   
   I think we only ever used 1. But 2 could be a way to de-emphasize small 
changes?



##########
otava/change_point_divisive/base.py:
##########
@@ -0,0 +1,109 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from dataclasses import dataclass, fields
+from typing import Generic, List, Optional, TypeVar
+
+from numpy.typing import NDArray
+
+
+@dataclass
+class CandidateChangePoint:
+    '''Candidate for a change point. The point that maximizes Q-hat function 
on [start:end+1] slice'''
+    index: int
+    qhat: float
+
+
+@dataclass
+class BaseStats:
+    '''Abstract statistics class for change point. Implementation depends on 
the statistical test.'''
+    pvalue: float
+
+
+GenericStats = TypeVar("GenericStats", bound=BaseStats)
+
+
+@dataclass
+class ChangePoint(CandidateChangePoint, Generic[GenericStats]):
+    '''Change point class, defined by index and signigicance test statistic.'''
+    stats: GenericStats
+
+    def __eq__(self, other):
+        '''Helpful to identify new Change Points during divisive algorithm'''
+        return isinstance(other, self.__class__) and self.index == other.index
+
+    @classmethod
+    def from_candidate(cls, candidate: CandidateChangePoint, stats: 
GenericStats) -> 'ChangePoint[GenericStats]':
+        return cls(
+            index=candidate.index,
+            qhat=candidate.qhat,
+            stats=stats,
+        )
+
+    def to_candidate(self) -> CandidateChangePoint:
+        '''Downgrades Change Point to a Candidate Change Point. Used to 
recompute stats for Weak Change Points.'''
+        data = {f.name: getattr(self, f.name) for f in 
fields(CandidateChangePoint)}
+        return CandidateChangePoint(**data)
+
+
+class SignificanceTester(Generic[GenericStats]):
+    '''Abstract class for significance tester'''
+
+    def __init__(self, alpha: float):
+        self.alpha = alpha

Review Comment:
   I know what p-value is but what is alpha?



##########
otava/change_point_divisive/base.py:
##########
@@ -0,0 +1,109 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from dataclasses import dataclass, fields
+from typing import Generic, List, Optional, TypeVar
+
+from numpy.typing import NDArray
+
+
+@dataclass
+class CandidateChangePoint:
+    '''Candidate for a change point. The point that maximizes Q-hat function 
on [start:end+1] slice'''
+    index: int
+    qhat: float
+
+
+@dataclass
+class BaseStats:
+    '''Abstract statistics class for change point. Implementation depends on 
the statistical test.'''
+    pvalue: float

Review Comment:
   I would argue we always want mean on the left and right side and stddev too. 
Even if not used by the significance test, these are typically reported back to 
users, so this is a good opportunity to standardize that they are always here. 
Essentially the change in a change point is mean_right - mean_left, except for 
the interesting case where that is zero yet it's still a change point :-)



##########
otava/change_point_divisive/calculator.py:
##########
@@ -0,0 +1,189 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+
+import numpy as np
+from numpy.typing import NDArray
+
+from otava.change_point_divisive.base import Calculator, CandidateChangePoint
+
+
+class PairDistanceCalculator(Calculator):
+    def __init__(self, series: NDArray, power: int = 1):
+        super().__init__(series)
+        assert 0 < power < 2, f"power={power} isn't in (0, 2)"
+        self.power = power
+        self.V = None
+        self.H = None
+
+    def _calculate_distances(self):

Review Comment:
   I don't mind this method name, but when explaining to people, I find that 
"pairwise differences" is layman-understandable.



##########
otava/analysis.py:
##########
@@ -259,13 +247,13 @@ def split(series: np.array, window_len: int = 30, 
max_pvalue: float = 0.001,
     assert "Window length must be at least 2", window_len >= 2

Review Comment:
   This assert is the wrong way
   
   >>> assert True, False
   >>> assert False, True
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
   AssertionError: True
   >>> 
   



##########
otava/change_point_divisive/calculator.py:
##########
@@ -0,0 +1,189 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+
+import numpy as np
+from numpy.typing import NDArray
+
+from otava.change_point_divisive.base import Calculator, CandidateChangePoint
+
+
+class PairDistanceCalculator(Calculator):
+    def __init__(self, series: NDArray, power: int = 1):
+        super().__init__(series)
+        assert 0 < power < 2, f"power={power} isn't in (0, 2)"
+        self.power = power
+        self.V = None
+        self.H = None
+
+    def _calculate_distances(self):
+        '''Precomputes `H` and `V` functions that are used for computation of 
`Q` function.
+            See more details about `Q` function in 
`get_candidate_change_point` method.
+            See more details about the use of `H` and `V` functions in 
`_get_A_B_C` method.
+
+        Let matix `distances` be a matrix of all possible L1 (Manhattan) 
distances between
+        elements of 1d array `series`. If `D(i, j) = |series[i] - series[j]|` 
and `N = len(series)`,
+        then
+
+                      | D(0, 0)        D(0, 1)      …       D(0, N - 1) |
+                      | D(1, 0)        D(1, 1)      …       D(1, N - 1) |
+        distances =   |    ⋮              ⋮          ⋱         ⋮        | .
+                      | D(N - 1, 0)    D(N - 1, 1)  …   D(N - 1, N - 1) |
+
+        Note that the main diagonal of the matrix is filled with 0.
+
+
+        We define functions `V` and `H` as the following (V)ertical and 
(H)orizontal
+        cummulative sums:
+                        V(start, τ, κ) = Σ distances[start:τ, τ],
+                        H(start, τ, κ) = Σ distances[τ, τ:κ],
+        for `start < τ < κ <= N`.
+        Note that `V(start, τ, κ) = V(start, τ)` and `H(start, τ, κ) = H(τ, 
κ)`.
+
+
+        Vector V contains the following values of function `V(start, τ, κ)`:
+                            V = [V(0, 1, κ)     V(0, 2, κ),    …  V(0, N - 1, 
κ)].
+            `V(0, 0, κ) = 0` so they are ommited. As noted, function `V(start, 
τ, κ)`
+            does not depend on `κ`. Therefore, vector V contains all values of 
function
+            `V(0, τ, κ)`. We can easily get values of function `V` for an 
arbitrary
+            value of `start` by
+                            V(start, τ, κ) = V(0, τ, κ) - Σ distances[0 : 
start, τ].
+        Note: The reason not all values of `V(start, τ, κ)` are precomputed is 
that
+              we do not need them all. The values of `start` will depend on 
critical
+              points we find. Precomuting them for all possible values is a 
waste.
+
+
+        Matrix H contains the following values of function `H(start, τ, κ)`:
+
+                  | H(start, 0, 2)    H(start, 0, 3)   …  H(start, 0, N)     |
+                  |          0        H(start, 1, 3)   …  H(start, 1, N)     |
+            H =   |          ⋮                 ⋮        ⋱          ⋮         | 
.
+                  |          0                 0       …  H(start, N - 2, N) |
+
+            `H(start, x, x) = H(start, x - 1, x) = 0` for any `x` so they are 
ommited.
+            As noted, function `H(start, τ, κ)` does not depend on `start`. 
Therefore,
+            matrix H contains all possible values of function `H`.
+        Note: We precomputed all values of `H(start, τ, κ)` because all of 
them are needed
+              for the very first iteration (`start=0` and `end=N-1`).'''
+        self.distances = np.power(np.abs(self.series[:, None] - 
self.series[None, :]), self.power)
+        triu = np.triu(self.distances, k=1)[:-1, 1:]
+        self.V = triu.sum(axis=0)
+        self.H = triu.cumsum(axis=1)
+
+    def _get_Q_vals(self, start: int, end: int) -> NDArray:
+        '''Computes matrices A, B, C where all possible values of function `Q` 
are
+        given by matrix Q = A - B - C.
+        See more details about `Q` function in `get_candidate_change_point` 
method.
+
+        For any given `start` and `end` let
+                        Q(τ, κ) = A(start, τ, κ) - B(start, τ, κ) - C(start, 
τ, κ),
+        where `start < τ < κ <= end + 1`. (For definitions of `A`, `B`, `C` see
+        formula of `Q` in `get_candidate_change_point` method.) All possible 
values of function

Review Comment:
   If I follow correctly, the +1 in above end+1 is really just because that's 
the custom in python arrays, ranges and for loops. (Ie array indexes are from 0 
to length-1).
   
   My preferense would be that these comments stick to the coordinate system:
    - series has Length measurements, indexed from 0 to Length-1
    - Change points share the same indexes 0 to Length-1, except that 0 is not 
possible.
    - If it's possible, I would even stick to that numbering also in all 
methods we define.
    - Then only inside methods do you have to do things like 
numpy.whatever(start, end+1)
   
   Does something go terribly wrong if we try the above? I believe then the 
inequality above has to be
   
       start < τ <= κ <= end
   
   Does this violently contradict notation in Matteson paper?



##########
otava/change_point_divisive/calculator.py:
##########
@@ -0,0 +1,189 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+
+import numpy as np
+from numpy.typing import NDArray
+
+from otava.change_point_divisive.base import Calculator, CandidateChangePoint
+
+
+class PairDistanceCalculator(Calculator):
+    def __init__(self, series: NDArray, power: int = 1):
+        super().__init__(series)
+        assert 0 < power < 2, f"power={power} isn't in (0, 2)"
+        self.power = power
+        self.V = None
+        self.H = None
+
+    def _calculate_distances(self):
+        '''Precomputes `H` and `V` functions that are used for computation of 
`Q` function.
+            See more details about `Q` function in 
`get_candidate_change_point` method.
+            See more details about the use of `H` and `V` functions in 
`_get_A_B_C` method.
+
+        Let matix `distances` be a matrix of all possible L1 (Manhattan) 
distances between
+        elements of 1d array `series`. If `D(i, j) = |series[i] - series[j]|` 
and `N = len(series)`,
+        then
+
+                      | D(0, 0)        D(0, 1)      …       D(0, N - 1) |
+                      | D(1, 0)        D(1, 1)      …       D(1, N - 1) |
+        distances =   |    ⋮              ⋮          ⋱         ⋮        | .
+                      | D(N - 1, 0)    D(N - 1, 1)  …   D(N - 1, N - 1) |
+
+        Note that the main diagonal of the matrix is filled with 0.
+
+
+        We define functions `V` and `H` as the following (V)ertical and 
(H)orizontal
+        cummulative sums:
+                        V(start, τ, κ) = Σ distances[start:τ, τ],
+                        H(start, τ, κ) = Σ distances[τ, τ:κ],
+        for `start < τ < κ <= N`.
+        Note that `V(start, τ, κ) = V(start, τ)` and `H(start, τ, κ) = H(τ, 
κ)`.
+
+
+        Vector V contains the following values of function `V(start, τ, κ)`:
+                            V = [V(0, 1, κ)     V(0, 2, κ),    …  V(0, N - 1, 
κ)].
+            `V(0, 0, κ) = 0` so they are ommited. As noted, function `V(start, 
τ, κ)`
+            does not depend on `κ`. Therefore, vector V contains all values of 
function
+            `V(0, τ, κ)`. We can easily get values of function `V` for an 
arbitrary
+            value of `start` by
+                            V(start, τ, κ) = V(0, τ, κ) - Σ distances[0 : 
start, τ].
+        Note: The reason not all values of `V(start, τ, κ)` are precomputed is 
that
+              we do not need them all. The values of `start` will depend on 
critical
+              points we find. Precomuting them for all possible values is a 
waste.
+
+
+        Matrix H contains the following values of function `H(start, τ, κ)`:
+
+                  | H(start, 0, 2)    H(start, 0, 3)   …  H(start, 0, N)     |
+                  |          0        H(start, 1, 3)   …  H(start, 1, N)     |
+            H =   |          ⋮                 ⋮        ⋱          ⋮         | 
.
+                  |          0                 0       …  H(start, N - 2, N) |
+
+            `H(start, x, x) = H(start, x - 1, x) = 0` for any `x` so they are 
ommited.
+            As noted, function `H(start, τ, κ)` does not depend on `start`. 
Therefore,
+            matrix H contains all possible values of function `H`.
+        Note: We precomputed all values of `H(start, τ, κ)` because all of 
them are needed
+              for the very first iteration (`start=0` and `end=N-1`).'''
+        self.distances = np.power(np.abs(self.series[:, None] - 
self.series[None, :]), self.power)
+        triu = np.triu(self.distances, k=1)[:-1, 1:]
+        self.V = triu.sum(axis=0)
+        self.H = triu.cumsum(axis=1)
+
+    def _get_Q_vals(self, start: int, end: int) -> NDArray:
+        '''Computes matrices A, B, C where all possible values of function `Q` 
are
+        given by matrix Q = A - B - C.
+        See more details about `Q` function in `get_candidate_change_point` 
method.
+
+        For any given `start` and `end` let
+                        Q(τ, κ) = A(start, τ, κ) - B(start, τ, κ) - C(start, 
τ, κ),
+        where `start < τ < κ <= end + 1`. (For definitions of `A`, `B`, `C` see
+        formula of `Q` in `get_candidate_change_point` method.) All possible 
values of function
+        `A` are given by matrix
+
+              | A(start, start + 1, start + 2)    A(start, start + 1, start + 
3)   …  A(start, start + 1, end + 1) |
+              |                0                  A(start, start + 2, start + 
3)   …  A(start, start + 2, end + 1) |
+        A =   |                ⋮                                  ⋮            
     ⋱                 ⋮            | .
+              |                0                                  0            
    …  A(start, end, end + 1)       |
+
+        Matrices B and C follow the same index-to-value pattern.
+
+        Matrices A, B, and C are used to compute values of function `Q` 
recursively, without
+        recomputing the same sums over and over again. The recursive formulas 
were further
+        transformed to closed forms, so they can be computed using numpy 
cummulative sum
+        function to take advantage of numpy vectorized operations. (Note that 
each column
+        in matrices A, B, C can be repersented through np.cumsum). The 
formulas use
+        commulative sum of functions `H` and `V`, which definitions can be 
found in
+        `_calculate_distances` method.
+
+            A(start, τ, κ) = 2 / (κ - s) * (Σt=start,τ-1  H(start, t, κ) - 
Σt=start+1,τ-1  V(start, t, κ)),
+
+            B(start, τ, κ) = 2 * (κ - τ) / (κ - start) / (τ - start - 1) * 
Σt=start+1,τ-1  V(start, t, κ),
+
+            C(start, τ, κ) = 2 * (τ - start) / (κ - start) / (κ - τ - 1) * 
Σt=τ,κ-2  H(start, t, κ).'''
+        if self.V is None or self.H is None:
+            self._calculate_distances()
+
+        V = self.V[start : end] - self.distances[0 : start, start + 1 : end + 
1].sum(axis=0)
+        H = self.H[start : end, start : end]
+
+        taus = np.arange(start + 1, end + 1)[:, None]
+        kappas = np.arange(start + 2, end + 2)[None, :]
+
+        A = np.zeros((end - start, end - start))
+        A_coefs = 2 / (kappas - start)
+        A[1:, :] = np.cumsum(V)[:-1, None]
+        A = A_coefs * np.triu(np.cumsum(H, axis=0) - A, k=0)
+
+        B = np.zeros((end - start, end - start))
+        B_num = 2 * (kappas - taus)
+        B_den = (kappas - start) * (taus - start - 1)
+        B_mask = np.triu(np.ones_like(B_den, dtype=bool), k=0)
+        B_out = np.zeros_like(B_den, dtype=float)
+        B_coefs = np.divide(B_num, B_den, out=B_out, where=B_mask & (B_den != 
0))
+        B[1:, 1:] = B_coefs[1:, 1:] * np.cumsum(V)[:-1, None]
+
+        C = np.zeros((end - start, end - start))
+        C_num = 2 * (taus - start)
+        C_den = (kappas - start) * (kappas - taus - 1)
+        C_mask = np.triu(np.ones_like(C_den, dtype=bool), k=1)
+        C_out = np.zeros_like(C_den, dtype=float)
+        C_coefs = np.divide(C_num, C_den, out=C_out, where=C_mask & (C_den != 
0))
+        C[:-1, 1:] = C_coefs[:-1, 1:] * np.flipud(np.cumsum(np.flipud(H[1:, 
1:]), axis=0))
+
+        # Element of matrix `Q_{i, j}` is equal to `Q(τ, κ) = Q(i + 1, j + 2) 
= QQ(sequence[start : i + 1], sequence[i + 1 : j + 2])`.
+        # So, critical point is `τ = i + 1`.
+        return A - B - C
+
+    def get_candidate_change_point(self, interval: slice) -> 
CandidateChangePoint:
+        '''For a given `slice(start, stop)` finds potential critical point in 
subsequence series[slice],
+        i.e., from index `start` to `stop - 1` inclusive. For simplicity, 
we'll use `end = stop - 1`.

Review Comment:
   To be super clear, maybe use "interval" and ,) if you are talking math, and 
slice with [] and range with () in python context



##########
otava/analysis.py:
##########
@@ -170,10 +126,42 @@ def compare(self, left: np.ndarray, right: np.ndarray) -> 
ComparativeStats:
             )
         else:
             p = 1.0
-        return ComparativeStats(mean_l, mean_r, std_l, std_r, p)
+        return TTestStats(mean_1=mean_l, mean_2=mean_r, std_1=std_l, 
std_2=std_r, pvalue=p)
+
+    def change_point(
+        self, candidate: CandidateChangePoint, series: 
Sequence[SupportsFloat], intervals: List[slice]
+    ) -> ChangePoint[TTestStats]:
+        """
+        Computes properties of the change point if the Candidate Change Point 
based on the provided intervals.
+
+        The method works for two cases:
+        1. Divisive algorithm.
+           if the candidate is a new potential change point, i.e., its index 
is inside any interval, then
+           we split the interval by the candidate's index to get left and 
right subseries.
+        2. Merge step in t-test algorithm.

Review Comment:
   Oh, maybe we haven't been introduced properly. I'm the one who is serious 
about naming things. David is the one who knows all about cache invalidation.



##########
otava/change_point_divisive/calculator.py:
##########
@@ -0,0 +1,189 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+
+import numpy as np
+from numpy.typing import NDArray
+
+from otava.change_point_divisive.base import Calculator, CandidateChangePoint
+
+
+class PairDistanceCalculator(Calculator):
+    def __init__(self, series: NDArray, power: int = 1):
+        super().__init__(series)
+        assert 0 < power < 2, f"power={power} isn't in (0, 2)"
+        self.power = power
+        self.V = None
+        self.H = None
+
+    def _calculate_distances(self):
+        '''Precomputes `H` and `V` functions that are used for computation of 
`Q` function.
+            See more details about `Q` function in 
`get_candidate_change_point` method.
+            See more details about the use of `H` and `V` functions in 
`_get_A_B_C` method.
+
+        Let matix `distances` be a matrix of all possible L1 (Manhattan) 
distances between
+        elements of 1d array `series`. If `D(i, j) = |series[i] - series[j]|` 
and `N = len(series)`,
+        then
+
+                      | D(0, 0)        D(0, 1)      …       D(0, N - 1) |
+                      | D(1, 0)        D(1, 1)      …       D(1, N - 1) |
+        distances =   |    ⋮              ⋮          ⋱         ⋮        | .
+                      | D(N - 1, 0)    D(N - 1, 1)  …   D(N - 1, N - 1) |
+
+        Note that the main diagonal of the matrix is filled with 0.
+
+
+        We define functions `V` and `H` as the following (V)ertical and 
(H)orizontal
+        cummulative sums:
+                        V(start, τ, κ) = Σ distances[start:τ, τ],
+                        H(start, τ, κ) = Σ distances[τ, τ:κ],
+        for `start < τ < κ <= N`.
+        Note that `V(start, τ, κ) = V(start, τ)` and `H(start, τ, κ) = H(τ, 
κ)`.
+
+
+        Vector V contains the following values of function `V(start, τ, κ)`:
+                            V = [V(0, 1, κ)     V(0, 2, κ),    …  V(0, N - 1, 
κ)].
+            `V(0, 0, κ) = 0` so they are ommited. As noted, function `V(start, 
τ, κ)`
+            does not depend on `κ`. Therefore, vector V contains all values of 
function
+            `V(0, τ, κ)`. We can easily get values of function `V` for an 
arbitrary
+            value of `start` by
+                            V(start, τ, κ) = V(0, τ, κ) - Σ distances[0 : 
start, τ].
+        Note: The reason not all values of `V(start, τ, κ)` are precomputed is 
that
+              we do not need them all. The values of `start` will depend on 
critical
+              points we find. Precomuting them for all possible values is a 
waste.
+
+
+        Matrix H contains the following values of function `H(start, τ, κ)`:
+
+                  | H(start, 0, 2)    H(start, 0, 3)   …  H(start, 0, N)     |
+                  |          0        H(start, 1, 3)   …  H(start, 1, N)     |
+            H =   |          ⋮                 ⋮        ⋱          ⋮         | 
.
+                  |          0                 0       …  H(start, N - 2, N) |
+
+            `H(start, x, x) = H(start, x - 1, x) = 0` for any `x` so they are 
ommited.
+            As noted, function `H(start, τ, κ)` does not depend on `start`. 
Therefore,
+            matrix H contains all possible values of function `H`.
+        Note: We precomputed all values of `H(start, τ, κ)` because all of 
them are needed
+              for the very first iteration (`start=0` and `end=N-1`).'''
+        self.distances = np.power(np.abs(self.series[:, None] - 
self.series[None, :]), self.power)
+        triu = np.triu(self.distances, k=1)[:-1, 1:]
+        self.V = triu.sum(axis=0)
+        self.H = triu.cumsum(axis=1)
+
+    def _get_Q_vals(self, start: int, end: int) -> NDArray:
+        '''Computes matrices A, B, C where all possible values of function `Q` 
are
+        given by matrix Q = A - B - C.
+        See more details about `Q` function in `get_candidate_change_point` 
method.
+
+        For any given `start` and `end` let
+                        Q(τ, κ) = A(start, τ, κ) - B(start, τ, κ) - C(start, 
τ, κ),
+        where `start < τ < κ <= end + 1`. (For definitions of `A`, `B`, `C` see
+        formula of `Q` in `get_candidate_change_point` method.) All possible 
values of function
+        `A` are given by matrix
+
+              | A(start, start + 1, start + 2)    A(start, start + 1, start + 
3)   …  A(start, start + 1, end + 1) |
+              |                0                  A(start, start + 2, start + 
3)   …  A(start, start + 2, end + 1) |
+        A =   |                ⋮                                  ⋮            
     ⋱                 ⋮            | .
+              |                0                                  0            
    …  A(start, end, end + 1)       |
+
+        Matrices B and C follow the same index-to-value pattern.
+
+        Matrices A, B, and C are used to compute values of function `Q` 
recursively, without
+        recomputing the same sums over and over again. The recursive formulas 
were further
+        transformed to closed forms, so they can be computed using numpy 
cummulative sum
+        function to take advantage of numpy vectorized operations. (Note that 
each column
+        in matrices A, B, C can be repersented through np.cumsum). The 
formulas use
+        commulative sum of functions `H` and `V`, which definitions can be 
found in
+        `_calculate_distances` method.
+
+            A(start, τ, κ) = 2 / (κ - s) * (Σt=start,τ-1  H(start, t, κ) - 
Σt=start+1,τ-1  V(start, t, κ)),
+
+            B(start, τ, κ) = 2 * (κ - τ) / (κ - start) / (τ - start - 1) * 
Σt=start+1,τ-1  V(start, t, κ),
+
+            C(start, τ, κ) = 2 * (τ - start) / (κ - start) / (κ - τ - 1) * 
Σt=τ,κ-2  H(start, t, κ).'''
+        if self.V is None or self.H is None:
+            self._calculate_distances()
+
+        V = self.V[start : end] - self.distances[0 : start, start + 1 : end + 
1].sum(axis=0)
+        H = self.H[start : end, start : end]
+
+        taus = np.arange(start + 1, end + 1)[:, None]
+        kappas = np.arange(start + 2, end + 2)[None, :]
+
+        A = np.zeros((end - start, end - start))
+        A_coefs = 2 / (kappas - start)
+        A[1:, :] = np.cumsum(V)[:-1, None]
+        A = A_coefs * np.triu(np.cumsum(H, axis=0) - A, k=0)
+
+        B = np.zeros((end - start, end - start))
+        B_num = 2 * (kappas - taus)
+        B_den = (kappas - start) * (taus - start - 1)
+        B_mask = np.triu(np.ones_like(B_den, dtype=bool), k=0)
+        B_out = np.zeros_like(B_den, dtype=float)
+        B_coefs = np.divide(B_num, B_den, out=B_out, where=B_mask & (B_den != 
0))
+        B[1:, 1:] = B_coefs[1:, 1:] * np.cumsum(V)[:-1, None]
+
+        C = np.zeros((end - start, end - start))
+        C_num = 2 * (taus - start)
+        C_den = (kappas - start) * (kappas - taus - 1)
+        C_mask = np.triu(np.ones_like(C_den, dtype=bool), k=1)
+        C_out = np.zeros_like(C_den, dtype=float)
+        C_coefs = np.divide(C_num, C_den, out=C_out, where=C_mask & (C_den != 
0))
+        C[:-1, 1:] = C_coefs[:-1, 1:] * np.flipud(np.cumsum(np.flipud(H[1:, 
1:]), axis=0))
+
+        # Element of matrix `Q_{i, j}` is equal to `Q(τ, κ) = Q(i + 1, j + 2) 
= QQ(sequence[start : i + 1], sequence[i + 1 : j + 2])`.
+        # So, critical point is `τ = i + 1`.

Review Comment:
   The fact that you end up with parameters i+1, j+2 here, rather than the more 
common i, j+1, suggests to me you need to shift your indexing to the left one 
step. (0,T) or )0,T) not (1,T+1)



##########
otava/change_point_divisive/calculator.py:
##########
@@ -0,0 +1,189 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+
+import numpy as np
+from numpy.typing import NDArray
+
+from otava.change_point_divisive.base import Calculator, CandidateChangePoint
+
+
+class PairDistanceCalculator(Calculator):
+    def __init__(self, series: NDArray, power: int = 1):
+        super().__init__(series)
+        assert 0 < power < 2, f"power={power} isn't in (0, 2)"
+        self.power = power
+        self.V = None
+        self.H = None
+
+    def _calculate_distances(self):
+        '''Precomputes `H` and `V` functions that are used for computation of 
`Q` function.
+            See more details about `Q` function in 
`get_candidate_change_point` method.
+            See more details about the use of `H` and `V` functions in 
`_get_A_B_C` method.
+
+        Let matix `distances` be a matrix of all possible L1 (Manhattan) 
distances between
+        elements of 1d array `series`. If `D(i, j) = |series[i] - series[j]|` 
and `N = len(series)`,
+        then
+
+                      | D(0, 0)        D(0, 1)      …       D(0, N - 1) |
+                      | D(1, 0)        D(1, 1)      …       D(1, N - 1) |
+        distances =   |    ⋮              ⋮          ⋱         ⋮        | .
+                      | D(N - 1, 0)    D(N - 1, 1)  …   D(N - 1, N - 1) |
+
+        Note that the main diagonal of the matrix is filled with 0.
+
+
+        We define functions `V` and `H` as the following (V)ertical and 
(H)orizontal
+        cummulative sums:
+                        V(start, τ, κ) = Σ distances[start:τ, τ],
+                        H(start, τ, κ) = Σ distances[τ, τ:κ],
+        for `start < τ < κ <= N`.
+        Note that `V(start, τ, κ) = V(start, τ)` and `H(start, τ, κ) = H(τ, 
κ)`.
+
+
+        Vector V contains the following values of function `V(start, τ, κ)`:
+                            V = [V(0, 1, κ)     V(0, 2, κ),    …  V(0, N - 1, 
κ)].
+            `V(0, 0, κ) = 0` so they are ommited. As noted, function `V(start, 
τ, κ)`
+            does not depend on `κ`. Therefore, vector V contains all values of 
function
+            `V(0, τ, κ)`. We can easily get values of function `V` for an 
arbitrary
+            value of `start` by
+                            V(start, τ, κ) = V(0, τ, κ) - Σ distances[0 : 
start, τ].
+        Note: The reason not all values of `V(start, τ, κ)` are precomputed is 
that
+              we do not need them all. The values of `start` will depend on 
critical
+              points we find. Precomuting them for all possible values is a 
waste.
+
+
+        Matrix H contains the following values of function `H(start, τ, κ)`:
+
+                  | H(start, 0, 2)    H(start, 0, 3)   …  H(start, 0, N)     |
+                  |          0        H(start, 1, 3)   …  H(start, 1, N)     |
+            H =   |          ⋮                 ⋮        ⋱          ⋮         | 
.
+                  |          0                 0       …  H(start, N - 2, N) |
+
+            `H(start, x, x) = H(start, x - 1, x) = 0` for any `x` so they are 
ommited.
+            As noted, function `H(start, τ, κ)` does not depend on `start`. 
Therefore,
+            matrix H contains all possible values of function `H`.
+        Note: We precomputed all values of `H(start, τ, κ)` because all of 
them are needed
+              for the very first iteration (`start=0` and `end=N-1`).'''
+        self.distances = np.power(np.abs(self.series[:, None] - 
self.series[None, :]), self.power)
+        triu = np.triu(self.distances, k=1)[:-1, 1:]
+        self.V = triu.sum(axis=0)
+        self.H = triu.cumsum(axis=1)
+
+    def _get_Q_vals(self, start: int, end: int) -> NDArray:
+        '''Computes matrices A, B, C where all possible values of function `Q` 
are
+        given by matrix Q = A - B - C.
+        See more details about `Q` function in `get_candidate_change_point` 
method.
+
+        For any given `start` and `end` let
+                        Q(τ, κ) = A(start, τ, κ) - B(start, τ, κ) - C(start, 
τ, κ),
+        where `start < τ < κ <= end + 1`. (For definitions of `A`, `B`, `C` see
+        formula of `Q` in `get_candidate_change_point` method.) All possible 
values of function
+        `A` are given by matrix
+
+              | A(start, start + 1, start + 2)    A(start, start + 1, start + 
3)   …  A(start, start + 1, end + 1) |
+              |                0                  A(start, start + 2, start + 
3)   …  A(start, start + 2, end + 1) |
+        A =   |                ⋮                                  ⋮            
     ⋱                 ⋮            | .
+              |                0                                  0            
    …  A(start, end, end + 1)       |
+
+        Matrices B and C follow the same index-to-value pattern.
+
+        Matrices A, B, and C are used to compute values of function `Q` 
recursively, without
+        recomputing the same sums over and over again. The recursive formulas 
were further
+        transformed to closed forms, so they can be computed using numpy 
cummulative sum
+        function to take advantage of numpy vectorized operations. (Note that 
each column
+        in matrices A, B, C can be repersented through np.cumsum). The 
formulas use
+        commulative sum of functions `H` and `V`, which definitions can be 
found in
+        `_calculate_distances` method.
+
+            A(start, τ, κ) = 2 / (κ - s) * (Σt=start,τ-1  H(start, t, κ) - 
Σt=start+1,τ-1  V(start, t, κ)),
+
+            B(start, τ, κ) = 2 * (κ - τ) / (κ - start) / (τ - start - 1) * 
Σt=start+1,τ-1  V(start, t, κ),
+
+            C(start, τ, κ) = 2 * (τ - start) / (κ - start) / (κ - τ - 1) * 
Σt=τ,κ-2  H(start, t, κ).'''
+        if self.V is None or self.H is None:
+            self._calculate_distances()
+
+        V = self.V[start : end] - self.distances[0 : start, start + 1 : end + 
1].sum(axis=0)
+        H = self.H[start : end, start : end]
+
+        taus = np.arange(start + 1, end + 1)[:, None]
+        kappas = np.arange(start + 2, end + 2)[None, :]
+
+        A = np.zeros((end - start, end - start))
+        A_coefs = 2 / (kappas - start)

Review Comment:
   Just for discussion... but it is my opinion that these coefficient are added 
to the formula only to pass some robustness "almost certainly at the limit" 
proof. They are never really explained or justified the way every other term 
is. They have the effect of muting the q values at the ends of a  interval, and 
significantly inflating those in the midldle. As a result, given many good 
candidates q, the algorithm tends to pick change points first in the middle of 
a series.
   
   Anyway, it's for the future, but to me those are free game once we want to 
make any mods to this implementation.



##########
otava/change_point_divisive/calculator.py:
##########
@@ -0,0 +1,189 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+
+import numpy as np
+from numpy.typing import NDArray
+
+from otava.change_point_divisive.base import Calculator, CandidateChangePoint
+
+
+class PairDistanceCalculator(Calculator):
+    def __init__(self, series: NDArray, power: int = 1):
+        super().__init__(series)
+        assert 0 < power < 2, f"power={power} isn't in (0, 2)"
+        self.power = power
+        self.V = None
+        self.H = None
+
+    def _calculate_distances(self):
+        '''Precomputes `H` and `V` functions that are used for computation of 
`Q` function.
+            See more details about `Q` function in 
`get_candidate_change_point` method.
+            See more details about the use of `H` and `V` functions in 
`_get_A_B_C` method.
+
+        Let matix `distances` be a matrix of all possible L1 (Manhattan) 
distances between
+        elements of 1d array `series`. If `D(i, j) = |series[i] - series[j]|` 
and `N = len(series)`,
+        then
+
+                      | D(0, 0)        D(0, 1)      …       D(0, N - 1) |
+                      | D(1, 0)        D(1, 1)      …       D(1, N - 1) |
+        distances =   |    ⋮              ⋮          ⋱         ⋮        | .
+                      | D(N - 1, 0)    D(N - 1, 1)  …   D(N - 1, N - 1) |
+
+        Note that the main diagonal of the matrix is filled with 0.
+
+
+        We define functions `V` and `H` as the following (V)ertical and 
(H)orizontal
+        cummulative sums:
+                        V(start, τ, κ) = Σ distances[start:τ, τ],
+                        H(start, τ, κ) = Σ distances[τ, τ:κ],
+        for `start < τ < κ <= N`.
+        Note that `V(start, τ, κ) = V(start, τ)` and `H(start, τ, κ) = H(τ, 
κ)`.
+
+
+        Vector V contains the following values of function `V(start, τ, κ)`:
+                            V = [V(0, 1, κ)     V(0, 2, κ),    …  V(0, N - 1, 
κ)].
+            `V(0, 0, κ) = 0` so they are ommited. As noted, function `V(start, 
τ, κ)`
+            does not depend on `κ`. Therefore, vector V contains all values of 
function
+            `V(0, τ, κ)`. We can easily get values of function `V` for an 
arbitrary
+            value of `start` by
+                            V(start, τ, κ) = V(0, τ, κ) - Σ distances[0 : 
start, τ].
+        Note: The reason not all values of `V(start, τ, κ)` are precomputed is 
that
+              we do not need them all. The values of `start` will depend on 
critical
+              points we find. Precomuting them for all possible values is a 
waste.
+
+
+        Matrix H contains the following values of function `H(start, τ, κ)`:
+
+                  | H(start, 0, 2)    H(start, 0, 3)   …  H(start, 0, N)     |
+                  |          0        H(start, 1, 3)   …  H(start, 1, N)     |
+            H =   |          ⋮                 ⋮        ⋱          ⋮         | 
.
+                  |          0                 0       …  H(start, N - 2, N) |
+
+            `H(start, x, x) = H(start, x - 1, x) = 0` for any `x` so they are 
ommited.
+            As noted, function `H(start, τ, κ)` does not depend on `start`. 
Therefore,
+            matrix H contains all possible values of function `H`.
+        Note: We precomputed all values of `H(start, τ, κ)` because all of 
them are needed
+              for the very first iteration (`start=0` and `end=N-1`).'''
+        self.distances = np.power(np.abs(self.series[:, None] - 
self.series[None, :]), self.power)
+        triu = np.triu(self.distances, k=1)[:-1, 1:]
+        self.V = triu.sum(axis=0)
+        self.H = triu.cumsum(axis=1)
+
+    def _get_Q_vals(self, start: int, end: int) -> NDArray:
+        '''Computes matrices A, B, C where all possible values of function `Q` 
are
+        given by matrix Q = A - B - C.
+        See more details about `Q` function in `get_candidate_change_point` 
method.
+
+        For any given `start` and `end` let
+                        Q(τ, κ) = A(start, τ, κ) - B(start, τ, κ) - C(start, 
τ, κ),
+        where `start < τ < κ <= end + 1`. (For definitions of `A`, `B`, `C` see
+        formula of `Q` in `get_candidate_change_point` method.) All possible 
values of function
+        `A` are given by matrix
+
+              | A(start, start + 1, start + 2)    A(start, start + 1, start + 
3)   …  A(start, start + 1, end + 1) |
+              |                0                  A(start, start + 2, start + 
3)   …  A(start, start + 2, end + 1) |
+        A =   |                ⋮                                  ⋮            
     ⋱                 ⋮            | .
+              |                0                                  0            
    …  A(start, end, end + 1)       |
+
+        Matrices B and C follow the same index-to-value pattern.
+
+        Matrices A, B, and C are used to compute values of function `Q` 
recursively, without
+        recomputing the same sums over and over again. The recursive formulas 
were further
+        transformed to closed forms, so they can be computed using numpy 
cummulative sum
+        function to take advantage of numpy vectorized operations. (Note that 
each column
+        in matrices A, B, C can be repersented through np.cumsum). The 
formulas use
+        commulative sum of functions `H` and `V`, which definitions can be 
found in
+        `_calculate_distances` method.
+
+            A(start, τ, κ) = 2 / (κ - s) * (Σt=start,τ-1  H(start, t, κ) - 
Σt=start+1,τ-1  V(start, t, κ)),
+
+            B(start, τ, κ) = 2 * (κ - τ) / (κ - start) / (τ - start - 1) * 
Σt=start+1,τ-1  V(start, t, κ),
+
+            C(start, τ, κ) = 2 * (τ - start) / (κ - start) / (κ - τ - 1) * 
Σt=τ,κ-2  H(start, t, κ).'''
+        if self.V is None or self.H is None:
+            self._calculate_distances()
+
+        V = self.V[start : end] - self.distances[0 : start, start + 1 : end + 
1].sum(axis=0)
+        H = self.H[start : end, start : end]
+
+        taus = np.arange(start + 1, end + 1)[:, None]
+        kappas = np.arange(start + 2, end + 2)[None, :]
+
+        A = np.zeros((end - start, end - start))
+        A_coefs = 2 / (kappas - start)
+        A[1:, :] = np.cumsum(V)[:-1, None]
+        A = A_coefs * np.triu(np.cumsum(H, axis=0) - A, k=0)
+
+        B = np.zeros((end - start, end - start))
+        B_num = 2 * (kappas - taus)
+        B_den = (kappas - start) * (taus - start - 1)
+        B_mask = np.triu(np.ones_like(B_den, dtype=bool), k=0)
+        B_out = np.zeros_like(B_den, dtype=float)
+        B_coefs = np.divide(B_num, B_den, out=B_out, where=B_mask & (B_den != 
0))
+        B[1:, 1:] = B_coefs[1:, 1:] * np.cumsum(V)[:-1, None]
+
+        C = np.zeros((end - start, end - start))
+        C_num = 2 * (taus - start)
+        C_den = (kappas - start) * (kappas - taus - 1)
+        C_mask = np.triu(np.ones_like(C_den, dtype=bool), k=1)
+        C_out = np.zeros_like(C_den, dtype=float)
+        C_coefs = np.divide(C_num, C_den, out=C_out, where=C_mask & (C_den != 
0))
+        C[:-1, 1:] = C_coefs[:-1, 1:] * np.flipud(np.cumsum(np.flipud(H[1:, 
1:]), axis=0))
+
+        # Element of matrix `Q_{i, j}` is equal to `Q(τ, κ) = Q(i + 1, j + 2) 
= QQ(sequence[start : i + 1], sequence[i + 1 : j + 2])`.
+        # So, critical point is `τ = i + 1`.

Review Comment:
   If you don't have time to do that in the near future, the shifting of 
indexes can be a separat followup task too.



##########
otava/change_point_divisive/base.py:
##########
@@ -0,0 +1,109 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from dataclasses import dataclass, fields
+from typing import Generic, List, Optional, TypeVar
+
+from numpy.typing import NDArray
+
+
+@dataclass
+class CandidateChangePoint:
+    '''Candidate for a change point. The point that maximizes Q-hat function 
on [start:end+1] slice'''
+    index: int
+    qhat: float
+
+
+@dataclass
+class BaseStats:
+    '''Abstract statistics class for change point. Implementation depends on 
the statistical test.'''
+    pvalue: float
+
+
+GenericStats = TypeVar("GenericStats", bound=BaseStats)

Review Comment:
   For those  of us not so good at python, can you add a short comment what 
this does



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] Replacing signal_processing_algorithms with internal implementation [otava]

Reply via email to