Re: [PR] feat(dashboard): export dashboard chart data to Excel (async, S3 + email) [superset]

via GitHub Wed, 24 Jun 2026 18:17:50 -0700


codeant-ai-for-open-source[bot] commented on code in PR #41133:
URL: https://github.com/apache/superset/pull/41133#discussion_r3471266156



##########
superset/tasks/export_dashboard_excel.py:
##########
@@ -0,0 +1,246 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""
+Celery task that exports every chart on a dashboard to a single multi-sheet
+``.xlsx`` file, uploads it to S3, and emails the requesting user a pre-signed
+download link.
+
+The task re-runs each chart's saved query context under the requesting user,
+applies the live dashboard filter state, and streams the results row-by-row 
into
+a constant-memory workbook so large dashboards never load all data at once.
+"""
+
+from __future__ import annotations
+
+import logging
+import os
+import tempfile
+from datetime import datetime, timedelta
+from typing import Any
+
+from celery.exceptions import SoftTimeLimitExceeded
+from flask import current_app, g
+
+from superset import db, security_manager
+from superset.charts.data.dashboard_filter_context import (
+    apply_dashboard_filter_context,
+    get_dashboard_filter_context,
+)
+from superset.charts.schemas import ChartDataQueryContextSchema
+from superset.commands.chart.data.get_data_command import ChartDataCommand
+from superset.common.chart_data import ChartDataResultFormat, 
ChartDataResultType
+from superset.dashboards.excel_export import email
+from superset.dashboards.excel_export.layout import get_charts_in_layout_order
+from superset.extensions import celery_app
+from superset.utils import json, s3
+from superset.utils.core import override_user
+from superset.utils.excel_streaming import StreamingXlsxWriter
+
+logger = logging.getLogger(__name__)
+
+
+def _chart_label(chart: Any) -> str:
+    """Human-readable label for a chart in the skipped-charts list."""
+    return f"{chart.id} - {chart.slice_name or ''}".strip()
+
+
+def _record_to_row(record: dict[str, Any], colnames: list[str]) -> list[Any]:
+    return [record.get(col) for col in colnames]
+
+
+def _write_chart_sheets(
+    writer: StreamingXlsxWriter,
+    chart: Any,
+    dashboard_id: int,
+    active_data_mask: dict[str, Any],
+) -> None:
+    """
+    Run a single chart's query and stream its result(s) into the workbook.
+
+    Charts may yield more than one query (e.g. mixed-series charts); each 
becomes
+    its own sheet. Raises if the chart cannot be exported, so the caller can 
skip
+    it and note it in the email.
+    """
+    json_body = json.loads(chart.query_context)
+    # Override any stale saved values: we always want full JSON results.
+    json_body["result_format"] = ChartDataResultFormat.JSON
+    json_body["result_type"] = ChartDataResultType.FULL
+    json_body.pop("force", None)
+
+    filter_context = get_dashboard_filter_context(
+        dashboard_id=dashboard_id,
+        chart_id=chart.id,
+        active_data_mask=active_data_mask,
+    )
+    if filter_context.extra_form_data:
+        apply_dashboard_filter_context(json_body, 
filter_context.extra_form_data)
+
+    # Jinja macros resolve form data from g.form_data; expose the saved 
context.
+    g.form_data = json_body
+
+    query_context = ChartDataQueryContextSchema().load(json_body)
+    command = ChartDataCommand(query_context)
+    command.validate()
+    result = command.run()
+
+    for index, query in enumerate(result["queries"]):
+        colnames = query.get("colnames") or []
+        data = query.get("data") or []
+        if index == 0:
+            name = f"{chart.id} - {chart.slice_name or ''}"
+        else:
+            name = f"{chart.id}.{index} - {chart.slice_name or ''}"
+        writer.add_sheet(
+            name,
+            colnames,
+            (_record_to_row(record, colnames) for record in data),
+        )
+
+
+def _build_workbook(
+    path: str,
+    dashboard: Any,
+    active_data_mask: dict[str, Any],
+    job_id: str,
+) -> list[str]:
+    """Build the workbook on disk; return the list of skipped chart labels."""
+    skipped: list[str] = []
+    writer = StreamingXlsxWriter(path)
+    try:
+        for chart in get_charts_in_layout_order(dashboard):
+            if not chart.query_context:
+                skipped.append(_chart_label(chart))
+                continue
+            try:
+                _write_chart_sheets(writer, chart, dashboard.id, 
active_data_mask)
+            except Exception:  # pylint: disable=broad-except
+                logger.exception(
+                    "Skipping chart %s in dashboard export %s", chart.id, 
job_id
+                )
+                skipped.append(_chart_label(chart))

Review Comment:
   **Suggestion:** This catch-all marks every chart exception as a generic 
"skipped" chart, but the success email text says skipped charts were omitted 
because they had no saved query context. That creates incorrect user-facing 
diagnostics when a chart actually failed due to query/runtime/access errors. 
Track skip reasons separately (e.g., no context vs execution failure) and 
surface the correct reason in the email. [logic error]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ⚠️ Success emails misreport why charts were omitted.
   - ⚠️ Users receive incorrect guidance for fixing missing data.
   - ⚠️ Runtime chart errors obscured behind misleading skip message.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. A client calls `POST /api/v1/dashboard/<pk>/export_xlsx/`, triggering
   `DashboardRestApi.export_xlsx` in `superset/dashboards/api.py:1398-88`, 
which validates
   the request, checks dashboard access at lines 61-67, verifies the dashboard 
has slices at
   lines 75-76, and enqueues the `export_dashboard_excel` Celery task at lines 
78-87.
   
   2. The Celery worker runs `export_dashboard_excel` in
   `superset/tasks/export_dashboard_excel.py:168-243`, enters 
`_build_workbook(tmp_path,
   dashboard, active_data_mask, job_id)` at line 205, and `_build_workbook` 
iterates charts
   via `get_charts_in_layout_order(dashboard)` at line 124, skipping only those 
with empty
   `chart.query_context` at lines 125-127.
   
   3. For each chart with a non-empty `query_context`, `_build_workbook` calls
   `_write_chart_sheets(writer, chart, dashboard.id, active_data_mask)` at line 
129; if
   `_write_chart_sheets` raises any exception (for example, from 
`ChartDataCommand.run()` at
   line 98 due to a query or access error), the broad `except Exception` block 
at lines
   130-134 logs the error and appends `_chart_label(chart)` to the `skipped` 
list, but does
   not re-raise, so the task continues and the export is treated as a success 
even if all
   charts failed.
   
   4. After building and uploading the workbook, `export_dashboard_excel` sends 
a success
   email via `email.build_success_email(..., skipped_charts=skipped)` at lines 
220-229;
   `build_success_email` in `superset/dashboards/excel_export/email.py:65-88` 
calls
   `_skipped_section(skipped_charts)` at lines 54-62, which always renders the 
message "the
   following charts were omitted because they have no saved query context" for 
each label in
   `skipped_charts`, so any chart that actually failed due to 
runtime/query/access errors is
   misreported to the user as lacking saved query context.
   ```
   </details>
   
   [![Fix in 
Cursor](https://new-codeant-butcket.s3.us-west-1.amazonaws.com/badges/fix-in-cursor-flat.svg)](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=5d7550373aea4e839e17f6bcf6058049&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
 [![Fix in VSCode 
Claude](https://new-codeant-butcket.s3.us-west-1.amazonaws.com/badges/fix-in-vscode-claude-flat.svg)](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=5d7550373aea4e839e17f6bcf6058049&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
   
   *(Use Cmd/Ctrl + Click for best experience)*
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/tasks/export_dashboard_excel.py
   **Line:** 128:134
   **Comment:**
        *Logic Error: This catch-all marks every chart exception as a generic 
"skipped" chart, but the success email text says skipped charts were omitted 
because they had no saved query context. That creates incorrect user-facing 
diagnostics when a chart actually failed due to query/runtime/access errors. 
Track skip reasons separately (e.g., no context vs execution failure) and 
surface the correct reason in the email.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   Once fix is implemented, also check other comments on the same PR, and ask 
user if the user wants to fix the rest of the comments as well. if said yes, 
then fetch all the comments validate the correctness and implement a minimal fix
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41133&comment_hash=d29b7a6fa4bf0808d508aceb65512acd4dc4f2fa896db84e964f2839e4311ad7&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41133&comment_hash=d29b7a6fa4bf0808d508aceb65512acd4dc4f2fa896db84e964f2839e4311ad7&reaction=dislike'>👎</a>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] feat(dashboard): export dashboard chart data to Excel (async, S3 + email) [superset]

Reply via email to