DeepFriedYeti opened a new issue, #40520: URL: https://github.com/apache/superset/issues/40520
*Please make sure you are familiar with the SIP process documented* [here](https://github.com/apache/superset/issues/5602). The SIP will be numbered by a committer upon acceptance.

## [SIP] Proposal for Anomaly Detection for Timeseries Charts

### Motivation

Superset has forecasting but no way to visually flag anomalies. Users must export data to detect outliers externally.

On management dashboards, anomalies need to be **obvious at a glance**: a revenue drop or traffic spike should stand out immediately without manual inspection. This feature adds that: red scatter points overlaid on timeseries charts, highlighting statistical outliers directly where users are already looking.

It is a purely visual feature (not for alerts/reports) that reduces time-to-insight on any dashboard with timeseries data.

### Proposed Change

Add anomaly detection as a post-processing operation for all 6 ECharts Timeseries chart types (Line, Bar, Area, Scatter, Smooth Line, Step).

The implementation follows the **exact same architectural pattern** as the existing Prophet forecasting feature:

1. **Control panel section**: "Anomaly Detection" section with an enable checkbox and method-specific parameters, placed after the existing Forecast section
2. **Frontend operator**: Generates an `anomaly_detection` entry in the `post_processing` array
3. **Backend post-processing function**: Receives the DataFrame after SQL execution (and after Prophet if enabled), computes anomalies, and appends `{column}__anomaly` columns
4. **Frontend rendering**: Recognizes `__anomaly` suffix columns (same pattern as `__yhat`, `__yhat_lower`, `__yhat_upper`) and renders them as red scatter points

#### Detection Methods

| Method      | Algorithm                                                        | Best for                                  |
| ----------- | ---------------------------------------------------------------- | ----------------------------------------- |
| **Z-Score** | Rolling mean/std deviation, flags points where \|z\| > threshold | General purpose, fast                     |
| **MAD**     | Rolling Median Absolute Deviation                                | Data with existing outliers (more robust) |
| **Prophet** | Fits Prophet model, flags points outside confidence interval     | Seasonal data with trends                 |

#### Forecast Integration

When both forecast and anomaly detection are enabled, anomaly detection **automatically runs on the forecast prediction line** (`__yhat` columns). This happens without user configuration because:

- Prophet's post-processing extends the DataFrame with future periods, introducing `NaN` in the original columns for those dates
- Anomaly detection skips columns with `NaN` values and skips confidence bounds (`__yhat_lower`, `__yhat_upper`)
- The `__yhat` column has complete data for all dates and is processed automatically

#### Visual Rendering

- **Red scatter points** (`#FF0000`) overlaid on the original series
- Minimum symbol size of 10px for clear visibility
- **Excluded from legend** to avoid clutter
- **Tooltip indicator**: `⚠ anomaly` shown when hovering over anomaly points
- Value labels suppressed on anomaly points

### New or Changed Public Interfaces

#### Backend

- **New post-processing operation:** `anomaly_detection` added to the `pandas_postprocessing` module
- **Column naming:** Appends `{column}__anomaly` columns to the DataFrame (follows the existing `__yhat` convention)

#### Frontend

- **New form data fields** on `EchartsTimeseriesFormData`:
  - `anomalyDetectionEnabled` (boolean)
  - `anomalyDetectionMethod` (string: `'zscore'` | `'mad'` | `'prophet'`)
  - `anomalyDetectionRollingWindow` (number, for zscore/mad)
  - `anomalyDetectionSensitivity` (number, for zscore/mad)
  - `anomalyDetectionConfidenceInterval` (number, for prophet)
  - `anomalyDetectionSeasonalityYearly/Weekly/Daily` (boolean | number | null, for prophet)
- **New enum value:** `Anomaly = '__anomaly'` added to `ForecastSeriesEnum`
- **New type field:** `anomaly?: number` added to `ForecastValue`
- **New TypeScript type:** `PostProcessingAnomalyDetection` added to the `PostProcessingRule` union

#### No changes to:

- REST API endpoints
- Database models or configuration
- CLI tools
- Existing saved dashboards/charts (feature is opt-in)

### New dependencies

**None.** Prophet is already an optional dependency used by the forecast feature. The Z-Score and MAD methods use only pandas and numpy, which are core dependencies.

### Migration Plan and Compatibility

- **No database migration required**: no new models or schema changes
- **Fully backward compatible**: anomaly detection is disabled by default; existing charts and dashboards are unaffected
- **Saved charts**: existing saved charts have no anomaly detection fields in their form data, so the feature defaults to disabled (`ANOMALY_DEFAULT_DATA` provides all defaults)
- **Feature coexistence**: works independently of or alongside the existing forecast feature

### Rejected Alternatives

#### 1. Frontend-only detection (JavaScript)

Rejected because:

- Limited to data visible in the browser (post-pagination/sampling)
- Cannot leverage Prophet for seasonality-aware detection
- Would create inconsistency with the forecast feature, which uses backend post-processing

#### 2. Separate "Anomaly Chart" visualization type

Rejected because:

- Anomalies are most useful as an **overlay** on existing timeseries charts
- A separate chart type would force users to duplicate their chart configuration
- The overlay approach matches how the Prophet forecast is already rendered

#### 3. Alert/Report integration

Not pursued in this proposal because:

- Anomaly detection's primary value is **visual**: instant recognition on dashboards
- Alert integration would require additional infrastructure (thresholds, notification channels, scheduling)
- Can be added as a follow-up enhancement without changing this implementation

### Samples

<img width="1609" height="908" alt="Image" src="https://github.com/user-attachments/assets/b688dc62-e440-4ed3-8b9b-962743892658" />

<img width="1605" height="946" alt="Image" src="https://github.com/user-attachments/assets/782cb6d7-beec-400b-b1b8-3ead4bd4023b" />
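
To make the Z-Score and MAD rows of the Detection Methods table and the column-skipping rules under Forecast Integration more concrete, here is a minimal pandas sketch. It is not the proposed implementation: the function name `detect_anomalies_sketch`, its parameter names, the choice to compute statistics over the *preceding* window (so a spike does not inflate its own baseline), and the 1.4826 MAD scaling constant are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Columns the proposal says are skipped (confidence bounds), plus already
# generated anomaly columns. The tuple itself is a hypothetical helper.
SKIPPED_SUFFIXES = ("__yhat_lower", "__yhat_upper", "__anomaly")


def _mad(window: np.ndarray) -> float:
    """Median absolute deviation of one rolling window."""
    return float(np.median(np.abs(window - np.median(window))))


def detect_anomalies_sketch(df, method="zscore", rolling_window=5, sensitivity=3.0):
    """Return a copy of df with a `{column}__anomaly` column per eligible column.

    Anomalous rows keep their original value; all other rows are NaN, so the
    new column could be drawn as isolated scatter points.
    """
    out = df.copy()
    for col in df.select_dtypes(include="number").columns:
        series = df[col]
        # Skip confidence bounds and columns with gaps, e.g. actuals that a
        # forecast step extended with NaN for future periods.
        if str(col).endswith(SKIPPED_SUFFIXES) or series.isna().any():
            continue
        # Assumption: the baseline is the window of points *before* each row.
        baseline = series.shift(1).rolling(rolling_window)
        if method == "zscore":
            center, spread = baseline.mean(), baseline.std()
        elif method == "mad":
            center = baseline.median()
            # 1.4826 makes MAD comparable to a standard deviation for
            # normally distributed data.
            spread = 1.4826 * baseline.apply(_mad, raw=True)
        else:
            raise ValueError(f"unsupported method: {method}")
        # A zero spread would divide by zero; treat it as "no score".
        score = (series - center).abs() / spread.replace(0, np.nan)
        out[f"{col}__anomaly"] = series.where(score > sensitivity)
    return out


if __name__ == "__main__":
    demo = pd.DataFrame({"sales": [10, 11, 10, 12, 11, 10, 50, 11, 10, 12]})
    print(detect_anomalies_sketch(demo, method="mad"))
```

In this sketch the first `rolling_window` rows can never be flagged because they have no complete preceding window. A real implementation would need to decide how to treat that warm-up period, and whether the current point should be part of its own window.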
