Sowiks commented on code in PR #126:
URL: https://github.com/apache/otava/pull/126#discussion_r2839101019


##########
docs/MATH.md:
##########
@@ -0,0 +1,79 @@
+# Change Point Detection
+## Overview
+Otava implements a nonparametric change point detection algorithm designed to 
identify statistically significant distribution changes in time-ordered data. 
The method is primarily based on the **E-Divisive family of algorithms** for 
multivariate change point detection, with some practical adaptations.
+
+At a high level, the algorithm:
+- Measures statistical divergence between segments of a time series
+- Searches for change points using hierarchical segmentation
+- Evaluates significance of candidate splits using statistical hypothesis 
testing
+
+The current implementation prioritizes:
+- Robustness to noisy real-world signals
+- Deterministic behavior
+- Practical runtime for production workloads
+
+A representative example of algorithm application:
+
+![Example](./imgs/example.png "Example")
+
+Here the algorithm detected 4 change points with statistical test showing that 
behavior of the time series changes at them. In other words, data have 
different distribution to the left and to the right of each change point.
+
+## Technical Details
+### Main Idea
+The main idea is to use a divergence measure between distributions to identify 
potential points in time series at which the characteristics of the time series 
changed. Namely, having a time series $$Z_1, \cdots, Z_T$$ (which may be 
multidimensional, i.e. from $$\mathbb{R}^d$$ with $$d\geq1$$) we are testing 
subsequences $$X_\tau = \{ Z_1, Z_2, \cdots, Z_\tau \}$$ and 
$$Y_\tau(\kappa)=\{ Z_{\tau+1}, Z_{\tau+2}, \cdots, Z_\kappa \}$$ for all 
possible $$1 \leq \tau < \kappa \leq T$$ to find such $$\hat{\tau}, 
\hat{\kappa}$$ (called candidates) that maximize the probability that 
$$X_\tau$$ and $$Y_\tau(\kappa)$$ come from different distributions. If the 
probability for the best found $$\hat{\tau}, \hat{\kappa}$$ is above a certain 
threshold, then candidate $$\hat{\tau}$$ is a change point. The process is 
repeated recursively to the left and to right of $$\hat{\tau}$$ until no 
candidate corresponds to a high enough probability. This process yields a 
series of change points $$0 < \hat{\ta
 u}_1 < \hat{\tau}_2 < \cdots < \hat{\tau}_k < T$$.
+

Review Comment:
   Added more explanation and figures



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to