Re: [PR] Replacing signal_processing_algorithms with internal implementation [otava]

via GitHub Mon, 17 Nov 2025 13:18:09 -0800


henrikingo commented on PR #96:
URL: https://github.com/apache/otava/pull/96#issuecomment-3543878090


   > > I think to generate a data set like the one the Hunter paper was concerned with, you need the drop to be short, maybe even only 1-2 points:
   > > ```python
   > > import numpy as np
   > > drop = 400 + np.random.randn(2) * 5
   > > ```
   > 
   > Correct me if I'm wrong, but my understanding was that there are two separate problems:
   > 
   > 1. Disappearance of previously found critical points.
   > 2. Not detecting the critical points in the first place (because the number of abnormal points is small).
   
   No, these are the same problem. The change points disappear when the interval/window they are in grows larger. I always assumed this was a feature: in a short time series, say 50-100 points, MongoDB's e-divisive with typical parameters would ignore spikes that last only a single point, and might alert on a plateau of 2-3 points that then returns to the original level. (But even then it would only produce one change point, because the original MongoDB implementation needed a hard-coded 3 points before it would alert on anything at all, so it is not possible to find two neighboring change points. This comes from the Matteson paper; their R reference implementation, I believe, defaulted to a leading 30 points or so, which would be a long time to wait for a Jira ticket with nightly builds!)
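
For reference, the statistic behind this behaviour can be sketched in pure Python. This is a minimal illustration, not the MongoDB, R or Otava implementation. It computes the Matteson & James energy divergence Q for one split (exponent `alpha` = 1 by default). It then searches split points while refusing segments shorter than a hypothetical `min_size`, which stands in for the hard-coded 3-point minimum mentioned above.

```python
from itertools import combinations


def mean_pairwise(values, alpha=1.0):
    """Mean of |a - b| ** alpha over all unordered pairs inside one segment."""
    pairs = list(combinations(values, 2))
    return sum(abs(a - b) ** alpha for a, b in pairs) / len(pairs)


def energy_q(x, y, alpha=1.0):
    """Scaled energy divergence between segments x and y (Matteson & James form)."""
    m, n = len(x), len(y)
    between = sum(abs(a - b) ** alpha for a in x for b in y) / (m * n)
    e_hat = 2 * between - mean_pairwise(x, alpha) - mean_pairwise(y, alpha)
    return m * n / (m + n) * e_hat


def best_split(series, min_size=3, alpha=1.0):
    """Return (tau, q) for the split series[:tau] | series[tau:] with the largest Q.

    Both segments must hold at least min_size points (hypothetical parameter);
    None means the series is too short to split at all.
    """
    if min_size < 2:
        raise ValueError("min_size must be >= 2 so every segment has a pair")
    taus = range(min_size, len(series) - min_size + 1)
    if len(taus) == 0:
        return None
    return max(((tau, energy_q(series[:tau], series[tau:], alpha)) for tau in taus),
               key=lambda item: item[1])


# Hypothetical series: flat at 500 with a 2-point plateau at 400.
print(best_split([500.0] * 8 + [400.0] * 2 + [500.0] * 8))
```

In a recursive (divisive) scheme built on this, every segment must keep at least `min_size` points. Two change points fewer than `min_size` points apart therefore cannot both be reported, which is the restriction described above.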
   
   ...where was I... So then, if the series keeps growing, my interpretation is that the short-lived change becomes less significant relative to the entire series, so eventually the algorithm ignores it, just as if it were a single point. Conversely, a single point could also trigger an alert if it were large enough (at least assuming the series on both sides of it isn't perfectly constant).
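
One way to probe this dilution is the permutation test that e-divisive uses to decide significance. Shuffle the series, recompute the best split, and count how often the shuffled statistic matches or beats the observed one. The sketch below makes several assumptions: exponent 1, a 3-point minimum segment, a fixed seed, and made-up data (a baseline near 500 around the 2-point drop from the quoted snippet). `max_q` and `permutation_p_value` are illustrative names, not Otava functions. Running the test on growing prefixes shows how the p-value for the same short drop evolves as more history is added.

```python
import random
from itertools import combinations


def energy_q(x, y):
    """Scaled energy divergence (exponent 1) between segments x and y."""
    m, n = len(x), len(y)
    between = sum(abs(a - b) for a in x for b in y) / (m * n)
    within_x = sum(abs(a - b) for a, b in combinations(x, 2)) / (m * (m - 1) / 2)
    within_y = sum(abs(a - b) for a, b in combinations(y, 2)) / (n * (n - 1) / 2)
    return m * n / (m + n) * (2 * between - within_x - within_y)


def max_q(series, min_size=3):
    """Largest Q over all splits that leave at least min_size points per side."""
    return max(energy_q(series[:t], series[t:])
               for t in range(min_size, len(series) - min_size + 1))


def permutation_p_value(series, permutations=99, min_size=3, seed=0):
    """Share of shuffles (counting the observed order once) reaching the observed best Q."""
    rng = random.Random(seed)
    observed = max_q(series, min_size)
    shuffled = list(series)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if max_q(shuffled, min_size) >= observed:
            hits += 1
    return (hits + 1) / (permutations + 1)


# Hypothetical data: baseline around 500 (an assumption) with a 2-point drop to ~400.
gen = random.Random(42)
series = ([500 + gen.gauss(0, 5) for _ in range(20)]
          + [400 + gen.gauss(0, 5) for _ in range(2)]
          + [500 + gen.gauss(0, 5) for _ in range(78)])
for length in (25, 50, 100):
    print(length, permutation_p_value(series[:length], permutations=19))
```

With only 19 shuffles the smallest reportable p-value is 0.05. Raising `permutations` gives finer resolution at a proportional cost in run time.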
   
   The fix of adding a window is based on the above understanding: it creates a situation where the local computation never takes more than a small number of nearby points into account.
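
The windowing idea can be sketched as follows. Rather than splitting the entire history, only the most recent `window` points are considered, so a short change is always weighed against the same amount of context. `windowed_splits`, the sliding-by-one loop and the default `window=20` are illustrative assumptions, not Otava's actual API or defaults.

```python
from itertools import combinations


def energy_q(x, y):
    """Scaled energy divergence (exponent 1) between segments x and y."""
    m, n = len(x), len(y)
    between = sum(abs(a - b) for a in x for b in y) / (m * n)
    within_x = sum(abs(a - b) for a, b in combinations(x, 2)) / (m * (m - 1) / 2)
    within_y = sum(abs(a - b) for a, b in combinations(y, 2)) / (n * (n - 1) / 2)
    return m * n / (m + n) * (2 * between - within_x - within_y)


def windowed_splits(series, window=20, min_size=3):
    """Best split inside each sliding window of `window` points.

    Returns (absolute_index, q) per window position, so a change is judged
    against at most `window` neighbouring points, however long the series is.
    """
    if window < 2 * min_size:
        raise ValueError("window must fit two segments of min_size points")
    results = []
    for start in range(len(series) - window + 1):
        segment = series[start:start + window]
        tau, q = max(((t, energy_q(segment[:t], segment[t:]))
                      for t in range(min_size, window - min_size + 1)),
                     key=lambda item: item[1])
        results.append((start + tau, q))
    return results


# The same step scores identically whether 40 or 400 flat points precede it.
short_hist = max(windowed_splits([10.0] * 40 + [20.0] * 40), key=lambda r: r[1])
long_hist = max(windowed_splits([10.0] * 400 + [20.0] * 40), key=lambda r: r[1])
print(short_hist, long_hist)  # prints (40, 100.0) (400, 100.0)
```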
   
   And this is why I asked earlier whether Kappa is now equivalent to observing a series grow from 1 point and computing the algorithm for every added point.
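
Suppose κ plays the role it has in Matteson & James: the end of the second segment, so that X = Z[:τ] and Y = Z[τ:κ]. Maximising Q over (τ, κ) then gives the same answer as taking the best split of every prefix Z[:κ] of a series that grows one point at a time. The sketch below computes both to make the comparison concrete. It assumes exponent 1 and a 3-point minimum segment. `argmax_tau_kappa` and `split_history` are hypothetical helpers, not Otava's code, so whether Otava's κ matches this definition remains the open question.

```python
from itertools import combinations


def energy_q(x, y):
    """Scaled energy divergence (exponent 1) between segments x and y."""
    m, n = len(x), len(y)
    between = sum(abs(a - b) for a in x for b in y) / (m * n)
    within_x = sum(abs(a - b) for a, b in combinations(x, 2)) / (m * (m - 1) / 2)
    within_y = sum(abs(a - b) for a, b in combinations(y, 2)) / (n * (n - 1) / 2)
    return m * n / (m + n) * (2 * between - within_x - within_y)


def argmax_tau_kappa(z, min_size=3):
    """Joint search: X = z[:tau], Y = z[tau:kappa], both at least min_size long."""
    best = None
    for kappa in range(2 * min_size, len(z) + 1):
        for tau in range(min_size, kappa - min_size + 1):
            q = energy_q(z[:tau], z[tau:kappa])
            if best is None or q > best[2]:
                best = (tau, kappa, q)
    return best


def split_history(z, min_size=3):
    """Grow the series one point at a time; record (end, tau, q) for each prefix."""
    history = []
    for end in range(2 * min_size, len(z) + 1):
        prefix = z[:end]
        tau, q = max(((t, energy_q(prefix[:t], prefix[t:]))
                      for t in range(min_size, end - min_size + 1)),
                     key=lambda item: item[1])
        history.append((end, tau, q))
    return history


z = [10.0] * 10 + [20.0] * 10
print(argmax_tau_kappa(z))                        # prints (10, 20, 100.0)
print(max(split_history(z), key=lambda r: r[2]))  # prints (20, 10, 100.0)
```

`split_history` also keeps the per-prefix winners, which is where one could watch an earlier change point lose out as the series grows.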

