akshayrai opened a new pull request #5152: [TE][subscription] update subscription watermarks to use anomaly create time instead of end time
URL: https://github.com/apache/incubator-pinot/pull/5152

Problem Statement:
The current subscription watermarks are designed to notify on an anomaly only once (even if it is merged). We enforce this by tracking the end time of the last notified anomaly (the watermark). However, this assumes that newer anomalies are always detected on newer data, that is, a newer anomaly can never have a start time earlier than the watermark. This assumption is restrictive when dealing with backfilled data, and also with missing data, where the actual deviation anomalies might be detected later. This PR aims to remove the restriction by using the anomaly create time in the watermark.

Proposed changes:
* Replace the anomaly end time with the anomaly create time in the vector clock.
* Remove the highWaterMark field from the subscription config. As of today, we maintain two watermarks to ensure that each anomaly is notified only once: the last notified anomaly ID (highWaterMark) and the anomaly end time watermark (vector clocks). The main purpose of highWaterMark is to filter out merged anomalies from the time window. This is no longer required once we rely on the anomaly create time.
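To illustrate the difference between the two watermark choices, here is a minimal sketch in Python. It is not the ThirdEye implementation: the `Anomaly` record, its field names, and the helper functions are hypothetical, and the vector clock is assumed to be a plain mapping from detection ID to the last notified timestamp.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Anomaly:
    # Hypothetical minimal record; field names are illustrative only.
    id: int
    detection_id: int
    start_time: int   # start of the anomalous data window
    end_time: int     # end of the anomalous data window
    create_time: int  # when the anomaly was detected and stored


# Assumed shape of the vector clock: detection ID -> last notified timestamp.
Clock = Dict[int, int]


def select_by_end_time(anomalies: List[Anomaly], clock: Clock) -> List[Anomaly]:
    """Old behaviour (sketch): notify only if the data window ends after the watermark."""
    return [a for a in anomalies if a.end_time > clock.get(a.detection_id, 0)]


def select_by_create_time(anomalies: List[Anomaly], clock: Clock) -> List[Anomaly]:
    """Proposed behaviour (sketch): notify only if the anomaly was created after the watermark."""
    return [a for a in anomalies if a.create_time > clock.get(a.detection_id, 0)]


def advance_clock(clock: Clock, notified: List[Anomaly],
                  key: Callable[[Anomaly], int]) -> Clock:
    """Return a new clock moved forward to the largest key among the notified anomalies."""
    updated = dict(clock)
    for a in notified:
        updated[a.detection_id] = max(updated.get(a.detection_id, 0), key(a))
    return updated


if __name__ == "__main__":
    # An anomaly on recent data, already notified.
    first = Anomaly(id=1, detection_id=1, start_time=100, end_time=200, create_time=210)
    # An anomaly on older (backfilled) data, detected later.
    late = Anomaly(id=2, detection_id=1, start_time=50, end_time=90, create_time=300)

    end_clock = advance_clock({}, [first], key=lambda a: a.end_time)
    create_clock = advance_clock({}, [first], key=lambda a: a.create_time)

    print(select_by_end_time([first, late], end_clock))        # backfilled anomaly is skipped
    print(select_by_create_time([first, late], create_clock))  # backfilled anomaly is selected
```

In this sketch the end time watermark skips the anomaly found on backfilled data because its window ends before the watermark, while the create time watermark selects it because it was created after the last notification.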
