Re: [PR] feat(dashboard): export dashboard chart data to Excel (async, S3 + email) [superset]

via GitHub Thu, 18 Jun 2026 08:15:27 -0700


codeant-ai-for-open-source[bot] commented on code in PR #41133:
URL: https://github.com/apache/superset/pull/41133#discussion_r3436918327



##########
superset/tasks/export_dashboard_excel.py:
##########
@@ -0,0 +1,247 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""
+Celery task that exports every chart on a dashboard to a single multi-sheet
+``.xlsx`` file, uploads it to S3, and emails the requesting user a pre-signed
+download link.
+
+The task re-runs each chart's saved query context under the requesting user,
+applies the live dashboard filter state, and streams the results row-by-row 
into
+a constant-memory workbook so large dashboards never load all data at once.
+"""
+
+from __future__ import annotations
+
+import logging
+import os
+import tempfile
+from datetime import datetime, timedelta
+from typing import Any
+
+from celery.exceptions import SoftTimeLimitExceeded
+from flask import current_app, g
+
+from superset import db, security_manager
+from superset.charts.data.dashboard_filter_context import (
+    apply_extra_form_data_to_query_context_json,
+    get_dashboard_filter_context,
+)
+from superset.charts.schemas import ChartDataQueryContextSchema
+from superset.commands.chart.data.get_data_command import ChartDataCommand
+from superset.common.chart_data import ChartDataResultFormat, 
ChartDataResultType
+from superset.dashboards.excel_export import email
+from superset.dashboards.excel_export.layout import get_charts_in_layout_order
+from superset.extensions import celery_app
+from superset.utils import json, s3
+from superset.utils.core import override_user
+from superset.utils.excel_streaming import StreamingXlsxWriter
+
+logger = logging.getLogger(__name__)
+
+
+def _chart_label(chart: Any) -> str:
+    """Human-readable label for a chart in the skipped-charts list."""
+    return f"{chart.id} - {chart.slice_name or ''}".strip()
+
+
+def _record_to_row(record: dict[str, Any], colnames: list[str]) -> list[Any]:
+    return [record.get(col) for col in colnames]
+
+
+def _write_chart_sheets(
+    writer: StreamingXlsxWriter,
+    chart: Any,
+    dashboard_id: int,
+    active_data_mask: dict[str, Any],
+) -> None:
+    """
+    Run a single chart's query and stream its result(s) into the workbook.
+
+    Charts may yield more than one query (e.g. mixed-series charts); each 
becomes
+    its own sheet. Raises if the chart cannot be exported, so the caller can 
skip
+    it and note it in the email.
+    """
+    json_body = json.loads(chart.query_context)
+    # Override any stale saved values: we always want full JSON results.
+    json_body["result_format"] = ChartDataResultFormat.JSON
+    json_body["result_type"] = ChartDataResultType.FULL
+    json_body.pop("force", None)
+
+    filter_context = get_dashboard_filter_context(
+        dashboard_id=dashboard_id,
+        chart_id=chart.id,
+        active_data_mask=active_data_mask,
+    )
+    apply_extra_form_data_to_query_context_json(
+        json_body, filter_context.extra_form_data
+    )
+
+    # Jinja macros resolve form data from g.form_data; expose the saved 
context.
+    g.form_data = json_body
+
+    query_context = ChartDataQueryContextSchema().load(json_body)
+    command = ChartDataCommand(query_context)
+    command.validate()
+    result = command.run()
+
+    for index, query in enumerate(result["queries"]):
+        colnames = query.get("colnames") or []
+        data = query.get("data") or []
+        if index == 0:
+            name = f"{chart.id} - {chart.slice_name or ''}"
+        else:
+            name = f"{chart.id}.{index} - {chart.slice_name or ''}"
+        writer.add_sheet(
+            name,
+            colnames,
+            (_record_to_row(record, colnames) for record in data),
+        )
+
+
+def _build_workbook(
+    path: str,
+    dashboard: Any,
+    active_data_mask: dict[str, Any],
+    job_id: str,
+) -> list[str]:
+    """Build the workbook on disk; return the list of skipped chart labels."""
+    skipped: list[str] = []
+    writer = StreamingXlsxWriter(path)
+    try:
+        for chart in get_charts_in_layout_order(dashboard):
+            if not chart.query_context:
+                skipped.append(_chart_label(chart))
+                continue
+            try:
+                _write_chart_sheets(writer, chart, dashboard.id, 
active_data_mask)
+            except Exception:  # pylint: disable=broad-except
+                logger.exception(
+                    "Skipping chart %s in dashboard export %s", chart.id, 
job_id
+                )
+                skipped.append(_chart_label(chart))

Review Comment:
   **Suggestion:** The broad per-chart `except Exception` also catches 
`SoftTimeLimitExceeded`, so Celery soft timeouts are swallowed and the task 
keeps running until hard-killed. Re-raise `SoftTimeLimitExceeded` inside this 
block (or catch it separately before the broad handler) so timeout handling and 
failure-email logic in the outer task can execute. [logic error]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ❌ Celery soft timeouts swallowed inside per-chart export loop.
   - ⚠️ Timeout failure emails never sent for long-running exports.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Trigger the dashboard Excel export endpoint `export_xlsx` in
   `superset/dashboards/api.py` (lines 6-55 of the shown snippet) by sending 
`POST
   /api/v1/dashboard/<pk>/export_xlsx/` for a dashboard with many or slow 
charts so that
   processing can exceed the Celery soft time limit.
   
   2. The view calls `export_dashboard_excel.apply_async(...)` (in
   `superset/dashboards/api.py`, lines 45-54), which enqueues the Celery task
   `export_dashboard_excel` defined in 
`superset/tasks/export_dashboard_excel.py` at lines
   162-169 with `soft_time_limit=600` and `time_limit=660`.
   
   3. In the worker, `export_dashboard_excel` loads the dashboard and calls
   `_build_workbook(tmp_path, dashboard, active_data_mask, job_id)` (lines 
192-207 in
   `superset/tasks/export_dashboard_excel.py`), which loops through charts and 
calls
   `_write_chart_sheets(writer, chart, dashboard.id, active_data_mask)` (line 
130) for each
   chart.
   
   4. When the soft time limit is exceeded while executing 
`_write_chart_sheets` or its
   internals (e.g. `ChartDataCommand.run()` at lines 96-99), Celery raises
   `SoftTimeLimitExceeded`, but this exception is caught by the broad `except 
Exception` in
   `_build_workbook` (lines 131-135) and treated as a skipped chart instead of 
being
   re-raised, so the outer `except SoftTimeLimitExceeded` in 
`export_dashboard_excel` (lines
   237-240) never executes and the timeout-specific failure email path 
`_send_failure_email`
   (lines 147-159) is not triggered.
   ```
   </details>
   
   [![Fix in 
Cursor](https://new-codeant-butcket.s3.us-west-1.amazonaws.com/badges/fix-in-cursor-flat.svg)](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=8ba1dcdc111b4a2d9b1e22668a99649f&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
 [![Fix in VSCode 
Claude](https://new-codeant-butcket.s3.us-west-1.amazonaws.com/badges/fix-in-vscode-claude-flat.svg)](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=8ba1dcdc111b4a2d9b1e22668a99649f&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
   
   *(Use Cmd/Ctrl + Click for best experience)*
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/tasks/export_dashboard_excel.py
   **Line:** 129:135
   **Comment:**
        *Logic Error: The broad per-chart `except Exception` also catches 
`SoftTimeLimitExceeded`, so Celery soft timeouts are swallowed and the task 
keeps running until hard-killed. Re-raise `SoftTimeLimitExceeded` inside this 
block (or catch it separately before the broad handler) so timeout handling and 
failure-email logic in the outer task can execute.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   Once fix is implemented, also check other comments on the same PR, and ask 
user if the user wants to fix the rest of the comments as well. if said yes, 
then fetch all the comments validate the correctness and implement a minimal fix
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41133&comment_hash=7d367e7a26d8effef70e9e3fb7aaa800a7a38b439bed59b18ef913f39ab08193&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41133&comment_hash=7d367e7a26d8effef70e9e3fb7aaa800a7a38b439bed59b18ef913f39ab08193&reaction=dislike'>👎</a>



##########
superset/tasks/export_dashboard_excel.py:
##########
@@ -0,0 +1,247 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""
+Celery task that exports every chart on a dashboard to a single multi-sheet
+``.xlsx`` file, uploads it to S3, and emails the requesting user a pre-signed
+download link.
+
+The task re-runs each chart's saved query context under the requesting user,
+applies the live dashboard filter state, and streams the results row-by-row 
into
+a constant-memory workbook so large dashboards never load all data at once.
+"""
+
+from __future__ import annotations
+
+import logging
+import os
+import tempfile
+from datetime import datetime, timedelta
+from typing import Any
+
+from celery.exceptions import SoftTimeLimitExceeded
+from flask import current_app, g
+
+from superset import db, security_manager
+from superset.charts.data.dashboard_filter_context import (
+    apply_extra_form_data_to_query_context_json,
+    get_dashboard_filter_context,
+)
+from superset.charts.schemas import ChartDataQueryContextSchema
+from superset.commands.chart.data.get_data_command import ChartDataCommand
+from superset.common.chart_data import ChartDataResultFormat, 
ChartDataResultType
+from superset.dashboards.excel_export import email
+from superset.dashboards.excel_export.layout import get_charts_in_layout_order
+from superset.extensions import celery_app
+from superset.utils import json, s3
+from superset.utils.core import override_user
+from superset.utils.excel_streaming import StreamingXlsxWriter
+
+logger = logging.getLogger(__name__)
+
+
+def _chart_label(chart: Any) -> str:
+    """Human-readable label for a chart in the skipped-charts list."""
+    return f"{chart.id} - {chart.slice_name or ''}".strip()
+
+
+def _record_to_row(record: dict[str, Any], colnames: list[str]) -> list[Any]:
+    return [record.get(col) for col in colnames]
+
+
+def _write_chart_sheets(
+    writer: StreamingXlsxWriter,
+    chart: Any,
+    dashboard_id: int,
+    active_data_mask: dict[str, Any],
+) -> None:
+    """
+    Run a single chart's query and stream its result(s) into the workbook.
+
+    Charts may yield more than one query (e.g. mixed-series charts); each 
becomes
+    its own sheet. Raises if the chart cannot be exported, so the caller can 
skip
+    it and note it in the email.
+    """
+    json_body = json.loads(chart.query_context)
+    # Override any stale saved values: we always want full JSON results.
+    json_body["result_format"] = ChartDataResultFormat.JSON
+    json_body["result_type"] = ChartDataResultType.FULL
+    json_body.pop("force", None)
+
+    filter_context = get_dashboard_filter_context(
+        dashboard_id=dashboard_id,
+        chart_id=chart.id,
+        active_data_mask=active_data_mask,
+    )

Review Comment:
   **Suggestion:** `get_dashboard_filter_context` is called once per chart and 
internally reloads the dashboard and re-runs access validation each time, 
creating an avoidable N+1 pattern on large dashboards. Compute shared dashboard 
filter metadata once per export (or pass preloaded dashboard context) to avoid 
repeated DB and security checks. [performance]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ⚠️ Each chart recomputes dashboard filters and access checks.
   - ⚠️ Large dashboard exports execute redundant dashboard database queries.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Invoke the dashboard Excel export endpoint `export_xlsx` in
   `superset/dashboards/api.py` (lines 6-55 of the snippet) via `POST
   /api/v1/dashboard/<pk>/export_xlsx/`, which schedules the 
`export_dashboard_excel` Celery
   task (lines 45-54).
   
   2. In `superset/tasks/export_dashboard_excel.py`, `export_dashboard_excel` 
loads the
   `Dashboard` once from the database with
   `db.session.query(Dashboard).filter_by(id=dashboard_id).one_or_none()` 
(lines 193-196)
   before calling `_build_workbook(tmp_path, dashboard, active_data_mask, 
job_id)` (line
   206).
   
   3. `_build_workbook` (lines 115-144) iterates through all charts in layout 
order using
   `get_charts_in_layout_order(dashboard)` (line 125) and for each chart with a 
saved
   `query_context` calls `_write_chart_sheets(writer, chart, dashboard.id, 
active_data_mask)`
   (line 130).
   
   4. Inside `_write_chart_sheets` (lines 65-88), every chart export calls
   `get_dashboard_filter_context(dashboard_id=dashboard_id, chart_id=chart.id,
   active_data_mask=active_data_mask)` (lines 84-88); the implementation in
   `superset/charts/data/dashboard_filter_context.py` (lines 47-55) re-queries 
the
   `Dashboard` from `db.session` and re-runs
   `_check_dashboard_access(dashboard)`/`security_manager.raise_for_access` for 
each chart,
   so exporting a dashboard with N charts performs N extra dashboard queries 
and access
   checks even though the same dashboard was already loaded once in 
`export_dashboard_excel`.
   ```
   </details>
   
   [![Fix in 
Cursor](https://new-codeant-butcket.s3.us-west-1.amazonaws.com/badges/fix-in-cursor-flat.svg)](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=c571de4bcffc40dfbd42314aac6f81b4&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
 [![Fix in VSCode 
Claude](https://new-codeant-butcket.s3.us-west-1.amazonaws.com/badges/fix-in-vscode-claude-flat.svg)](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=c571de4bcffc40dfbd42314aac6f81b4&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
   
   *(Use Cmd/Ctrl + Click for best experience)*
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/tasks/export_dashboard_excel.py
   **Line:** 84:88
   **Comment:**
        *Performance: `get_dashboard_filter_context` is called once per chart 
and internally reloads the dashboard and re-runs access validation each time, 
creating an avoidable N+1 pattern on large dashboards. Compute shared dashboard 
filter metadata once per export (or pass preloaded dashboard context) to avoid 
repeated DB and security checks.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   Once fix is implemented, also check other comments on the same PR, and ask 
user if the user wants to fix the rest of the comments as well. if said yes, 
then fetch all the comments validate the correctness and implement a minimal fix
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41133&comment_hash=b91322ae8610d63b799dac4080380825156b852e5b6215a243d17d057db6d697&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41133&comment_hash=b91322ae8610d63b799dac4080380825156b852e5b6215a243d17d057db6d697&reaction=dislike'>👎</a>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] feat(dashboard): export dashboard chart data to Excel (async, S3 + email) [superset]

Reply via email to