YuriyKrasilnikov opened a new issue, #37507:
URL: https://github.com/apache/superset/issues/37507

   ### Is your feature request related to a problem?
   
   When a dashboard chart loads slowly, there is no way to determine where the 
time is spent within Superset. The database query log may show a fast query 
(e.g. 200ms), yet the chart takes 10+ seconds to load. The overhead could be in 
Jinja template rendering, security/RLS checks, query compilation, result 
serialization, or connection pool wait time.
   
   Without a phase-level breakdown, debugging slow dashboards is guesswork. In
containerized deployments, a Celery worker can become unresponsive during a 
long-running phase, causing liveness probe failures and container restarts, 
while the root cause remains invisible.
   
   Other BI tools solve this: [Looker provides a 3-phase 
breakdown](https://cloud.google.com/looker/docs/query-performance-metrics) 
(Initialization / Running Query / Processing Results) with a [Performance 
Panel](https://cloud.google.com/looker/docs/query-tracker) in the UI. Metabase 
and Redash provide only total execution time without phase-level detail.
   
   Related discussions: #18431, #13044
   
   ### Describe the solution you'd like
   
   Instrument the `/api/v1/chart/data` query lifecycle to collect per-phase 
timing and include it in the API response.
   
   **Instrumentation points** (~6-8 `stats_timing` wrappers in existing code):
   
   ```
   query_context_processor.py:get_df_payload()
     ├─ cache lookup            → stats_timing("chart_data.cache_lookup")
     ├─ jinja template render   → stats_timing("chart_data.jinja_rendering")
     ├─ security/RLS checks     → stats_timing("chart_data.security_checks")
     ├─ database execution      → stats_timing("chart_data.db_execution")
     └─ result serialization    → stats_timing("chart_data.result_processing")
   ```
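
   A minimal sketch of what the wrapper shape could look like, using a hypothetical `phase_timer` helper that records each phase's elapsed milliseconds into a dict for the API response (in Superset itself this would sit alongside the existing `stats_timing` context manager so each phase is also emitted to `STATS_LOGGER`):

   ```python
   import time
   from contextlib import contextmanager
   from typing import Dict, Iterator


   @contextmanager
   def phase_timer(timings: Dict[str, int], phase: str) -> Iterator[None]:
       """Hypothetical helper: time one phase and record it in milliseconds."""
       start = time.perf_counter()
       try:
           yield
       finally:
           timings[f"{phase}_ms"] = int((time.perf_counter() - start) * 1000)


   def get_df_payload_sketch() -> Dict[str, int]:
       """Simplified stand-in for QueryContextProcessor.get_df_payload()."""
       timings: Dict[str, int] = {}
       with phase_timer(timings, "cache_lookup"):
           pass  # existing cache lookup
       with phase_timer(timings, "jinja_rendering"):
           pass  # existing Jinja template rendering
       with phase_timer(timings, "security_checks"):
           pass  # existing security/RLS checks
       with phase_timer(timings, "db_execution"):
           pass  # existing database execution
       with phase_timer(timings, "result_processing"):
           pass  # existing result serialization
       timings["total_ms"] = sum(timings.values())
       return timings
   ```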
   
   **API response** — add a `timing` object:
   
   ```json
   {
     "result": [...],
     "timing": {
       "cache_lookup_ms": 5,
       "jinja_rendering_ms": 120,
       "security_checks_ms": 45,
       "db_execution_ms": 850,
       "result_processing_ms": 200,
       "total_ms": 1220
     }
   }
   ```
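
   Superset's REST API payloads are declared with marshmallow schemas, so the new object could be described along these lines (schema name and exact placement are hypothetical):

   ```python
   from marshmallow import Schema, fields


   class ChartDataTimingSchema(Schema):
       """Hypothetical schema for the proposed per-phase timing object."""

       cache_lookup_ms = fields.Integer()
       jinja_rendering_ms = fields.Integer()
       security_checks_ms = fields.Integer()
       db_execution_ms = fields.Integer()
       result_processing_ms = fields.Integer()
       total_ms = fields.Integer()
   ```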
   
   **Logging** — configurable slow-query logging (WARNING level) when the total exceeds a threshold:
   
   ```
   Slow chart query: chart_id=15, dashboard_id=7, total=8500ms
     jinja=2100ms, security=45ms, db=850ms, serialization=5500ms
   ```
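
   A rough sketch of the threshold check, reusing the `timings` dict from the instrumentation sketch above (`SLOW_CHART_QUERY_THRESHOLD_MS` is a hypothetical config key):

   ```python
   import logging
   from typing import Dict

   logger = logging.getLogger(__name__)

   # Hypothetical config key; the real name would live in superset_config.py
   SLOW_CHART_QUERY_THRESHOLD_MS = 5000


   def maybe_log_slow_query(
       chart_id: int, dashboard_id: int, timings: Dict[str, int]
   ) -> None:
       total = timings.get("total_ms", 0)
       if total < SLOW_CHART_QUERY_THRESHOLD_MS:
           return
       phases = ", ".join(
           f"{key[:-3]}={value}ms"  # strip the "_ms" suffix for readability
           for key, value in timings.items()
           if key != "total_ms"
       )
       logger.warning(
           "Slow chart query: chart_id=%s, dashboard_id=%s, total=%sms (%s)",
           chart_id,
           dashboard_id,
           total,
           phases,
       )
   ```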
   
   **Metrics** — emit per-phase timing via existing `STATS_LOGGER` 
(StatsD/Prometheus).
   
   All existing infrastructure (`STATS_LOGGER`, `stats_timing` context manager, 
`@statsd_metrics` decorator) is reused. Minimal overhead — timestamp calls only.
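
   No new operator-side wiring would be needed: the stats logger is already pluggable in `superset_config.py`, e.g. with the bundled StatsD backend:

   ```python
   # superset_config.py
   from superset.stats_logger import StatsdStatsLogger

   # Per-phase chart_data.* timings emitted via stats_timing would flow to
   # StatsD here; Prometheus can ingest them through statsd_exporter.
   STATS_LOGGER = StatsdStatsLogger(host="localhost", port=8125, prefix="superset")
   ```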
   
   ### Describe alternatives you've considered
   
   - **External APM (Datadog, New Relic)** — requires additional 
infrastructure, not accessible to analysts, does not provide Superset-specific 
phase names
   - **OpenTelemetry instrumentation** — more generic, better for DevOps but 
not analyst-friendly; could complement this feature
   - **Browser DevTools Network tab** — shows total API time but no server-side 
breakdown
   
   ### Additional context
   
   Scope is limited to backend instrumentation and API response. Frontend 
visualization (performance panel in UI) is out of scope for this issue.
   
   ### Checklist
   
   - [x] I have searched Superset docs and Slack and didn't find a solution to 
my problem.
   - [x] I have searched the GitHub issue tracker and didn't find a similar bug 
report.

