YuriyKrasilnikov opened a new issue, #37507: URL: https://github.com/apache/superset/issues/37507
### Is your feature request related to a problem?

When a dashboard chart loads slowly, there is no way to determine where the time is spent within Superset. The database query log may show a fast query (e.g. 200ms), yet the chart takes 10+ seconds to load. The overhead could be in Jinja template rendering, security/RLS checks, query compilation, result serialization, or connection pool wait time. Without a phase-level breakdown, debugging slow dashboards is guesswork.

In containerized deployments, a Celery worker can become unresponsive during a long-running phase, causing liveness probe failures and container restarts, while the root cause remains invisible.

Other BI tools solve this: [Looker provides a 3-phase breakdown](https://cloud.google.com/looker/docs/query-performance-metrics) (Initialization / Running Query / Processing Results) with a [Performance Panel](https://cloud.google.com/looker/docs/query-tracker) in the UI. Metabase and Redash provide only total execution time without phase-level detail.

Related discussions: #18431, #13044

### Describe the solution you'd like

Instrument the `/api/v1/chart/data` query lifecycle to collect per-phase timing and include it in the API response.
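To make the proposal concrete, here is a minimal sketch of the kind of per-phase collection envisioned. The `PhaseTimer` class and its method names are hypothetical illustrations, not existing Superset code; in practice the existing `stats_timing` context manager would be used at the instrumentation points listed below.

```python
import time
from contextlib import contextmanager


class PhaseTimer:
    """Collects per-phase wall-clock durations, mirroring the proposed
    `timing` object in the /api/v1/chart/data response. Hypothetical sketch."""

    def __init__(self):
        self.timings_ms = {}

    @contextmanager
    def phase(self, name):
        # Time one lifecycle phase; the key matches the proposed "<phase>_ms" shape.
        start = time.monotonic()
        try:
            yield
        finally:
            self.timings_ms[f"{name}_ms"] = round((time.monotonic() - start) * 1000)

    def as_payload(self):
        # Dict suitable for inclusion in the API response, with a summed total.
        payload = dict(self.timings_ms)
        payload["total_ms"] = sum(self.timings_ms.values())
        return payload


timer = PhaseTimer()
with timer.phase("cache_lookup"):
    pass  # cache.get(cache_key) would go here
with timer.phase("db_execution"):
    time.sleep(0.01)  # stand-in for the actual database call
print(timer.as_payload())
```

Each wrapper adds only two timestamp calls per phase, which is why the overhead of this approach is negligible.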
**Instrumentation points** (~6-8 `stats_timing` wrappers in existing code):

```
query_context_processor.py:get_df_payload()
├─ cache lookup          → stats_timing("chart_data.cache_lookup")
├─ jinja template render → stats_timing("chart_data.jinja_rendering")
├─ security/RLS checks   → stats_timing("chart_data.security_checks")
├─ database execution    → stats_timing("chart_data.db_execution")
└─ result serialization  → stats_timing("chart_data.result_processing")
```

**API response**: add a `timing` object:

```json
{
  "result": [...],
  "timing": {
    "cache_lookup_ms": 5,
    "jinja_rendering_ms": 120,
    "security_checks_ms": 45,
    "db_execution_ms": 850,
    "result_processing_ms": 200,
    "total_ms": 1220
  }
}
```

**Logging**: configurable slow-query logging (WARNING level) when the total exceeds a threshold:

```
Slow chart query: chart_id=15, dashboard_id=7, total=8500ms
jinja=2100ms, security=45ms, db=850ms, serialization=5500ms
```

**Metrics**: emit per-phase timing via the existing `STATS_LOGGER` (StatsD/Prometheus).

All existing infrastructure (`STATS_LOGGER`, the `stats_timing` context manager, the `@statsd_metrics` decorator) is reused. Overhead is minimal: timestamp calls only.

### Describe alternatives you've considered

- **External APM (Datadog, New Relic)**: requires additional infrastructure, is not accessible to analysts, and does not provide Superset-specific phase names
- **OpenTelemetry instrumentation**: more generic and better for DevOps, but not analyst-friendly; could complement this feature
- **Browser DevTools Network tab**: shows total API time but no server-side breakdown

### Additional context

Scope is limited to backend instrumentation and the API response. Frontend visualization (a performance panel in the UI) is out of scope for this issue.

### Checklist

- [x] I have searched Superset docs and Slack and didn't find a solution to my problem.
- [x] I have searched the GitHub issue tracker and didn't find a similar bug report.
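As a supplement to the logging proposal above, a minimal sketch of the threshold check, using only the standard `logging` module. The config name `SLOW_CHART_QUERY_THRESHOLD_MS` and the `log_if_slow` helper are hypothetical, not existing Superset settings.

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical config knob; the name is an assumption, not an existing Superset setting.
SLOW_CHART_QUERY_THRESHOLD_MS = 5000


def log_if_slow(chart_id, dashboard_id, timings_ms):
    """Emit a WARNING in the format proposed above when the summed
    per-phase timings exceed the configured threshold."""
    total = sum(timings_ms.values())
    if total < SLOW_CHART_QUERY_THRESHOLD_MS:
        return None
    phases = ", ".join(
        f"{key.removesuffix('_ms')}={value}ms" for key, value in timings_ms.items()
    )
    msg = (
        f"Slow chart query: chart_id={chart_id}, dashboard_id={dashboard_id}, "
        f"total={total}ms ({phases})"
    )
    logger.warning(msg)
    return msg


print(
    log_if_slow(
        15,
        7,
        {"jinja_rendering_ms": 2100, "db_execution_ms": 850, "result_processing_ms": 5500},
    )
)
```

Keeping the check behind a single threshold comparison means fast queries pay no logging cost at all.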
