codeant-ai-for-open-source[bot] commented on code in PR #41133: URL: https://github.com/apache/superset/pull/41133#discussion_r3436918327
########## superset/tasks/export_dashboard_excel.py: ########## @@ -0,0 +1,247 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +""" +Celery task that exports every chart on a dashboard to a single multi-sheet +``.xlsx`` file, uploads it to S3, and emails the requesting user a pre-signed +download link. + +The task re-runs each chart's saved query context under the requesting user, +applies the live dashboard filter state, and streams the results row-by-row into +a constant-memory workbook so large dashboards never load all data at once. +""" + +from __future__ import annotations + +import logging +import os +import tempfile +from datetime import datetime, timedelta +from typing import Any + +from celery.exceptions import SoftTimeLimitExceeded +from flask import current_app, g + +from superset import db, security_manager +from superset.charts.data.dashboard_filter_context import ( + apply_extra_form_data_to_query_context_json, + get_dashboard_filter_context, +) +from superset.charts.schemas import ChartDataQueryContextSchema +from superset.commands.chart.data.get_data_command import ChartDataCommand +from superset.common.chart_data import ChartDataResultFormat, ChartDataResultType +from superset.dashboards.excel_export import email +from superset.dashboards.excel_export.layout import get_charts_in_layout_order +from superset.extensions import celery_app +from superset.utils import json, s3 +from superset.utils.core import override_user +from superset.utils.excel_streaming import StreamingXlsxWriter + +logger = logging.getLogger(__name__) + + +def _chart_label(chart: Any) -> str: + """Human-readable label for a chart in the skipped-charts list.""" + return f"{chart.id} - {chart.slice_name or ''}".strip() + + +def _record_to_row(record: dict[str, Any], colnames: list[str]) -> list[Any]: + return [record.get(col) for col in colnames] + + +def _write_chart_sheets( + writer: StreamingXlsxWriter, + chart: Any, + dashboard_id: int, + active_data_mask: dict[str, Any], +) -> None: + """ + Run a single chart's query and stream its result(s) into the workbook. + + Charts may yield more than one query (e.g. mixed-series charts); each becomes + its own sheet. Raises if the chart cannot be exported, so the caller can skip + it and note it in the email. + """ + json_body = json.loads(chart.query_context) + # Override any stale saved values: we always want full JSON results. + json_body["result_format"] = ChartDataResultFormat.JSON + json_body["result_type"] = ChartDataResultType.FULL + json_body.pop("force", None) + + filter_context = get_dashboard_filter_context( + dashboard_id=dashboard_id, + chart_id=chart.id, + active_data_mask=active_data_mask, + ) + apply_extra_form_data_to_query_context_json( + json_body, filter_context.extra_form_data + ) + + # Jinja macros resolve form data from g.form_data; expose the saved context. + g.form_data = json_body + + query_context = ChartDataQueryContextSchema().load(json_body) + command = ChartDataCommand(query_context) + command.validate() + result = command.run() + + for index, query in enumerate(result["queries"]): + colnames = query.get("colnames") or [] + data = query.get("data") or [] + if index == 0: + name = f"{chart.id} - {chart.slice_name or ''}" + else: + name = f"{chart.id}.{index} - {chart.slice_name or ''}" + writer.add_sheet( + name, + colnames, + (_record_to_row(record, colnames) for record in data), + ) + + +def _build_workbook( + path: str, + dashboard: Any, + active_data_mask: dict[str, Any], + job_id: str, +) -> list[str]: + """Build the workbook on disk; return the list of skipped chart labels.""" + skipped: list[str] = [] + writer = StreamingXlsxWriter(path) + try: + for chart in get_charts_in_layout_order(dashboard): + if not chart.query_context: + skipped.append(_chart_label(chart)) + continue + try: + _write_chart_sheets(writer, chart, dashboard.id, active_data_mask) + except Exception: # pylint: disable=broad-except + logger.exception( + "Skipping chart %s in dashboard export %s", chart.id, job_id + ) + skipped.append(_chart_label(chart)) Review Comment: **Suggestion:** The broad per-chart `except Exception` also catches `SoftTimeLimitExceeded`, so Celery soft timeouts are swallowed and the task keeps running until hard-killed. Re-raise `SoftTimeLimitExceeded` inside this block (or catch it separately before the broad handler) so timeout handling and failure-email logic in the outer task can execute. [logic error] <details> <summary><b>Severity Level:</b> Major ⚠️</summary> ```mdx - ❌ Celery soft timeouts swallowed inside per-chart export loop. - ⚠️ Timeout failure emails never sent for long-running exports. ``` </details> <details> <summary><b>Steps of Reproduction ✅ </b></summary> ```mdx 1. Trigger the dashboard Excel export endpoint `export_xlsx` in `superset/dashboards/api.py` (lines 6-55 of the shown snippet) by sending `POST /api/v1/dashboard/<pk>/export_xlsx/` for a dashboard with many or slow charts so that processing can exceed the Celery soft time limit. 2. The view calls `export_dashboard_excel.apply_async(...)` (in `superset/dashboards/api.py`, lines 45-54), which enqueues the Celery task `export_dashboard_excel` defined in `superset/tasks/export_dashboard_excel.py` at lines 162-169 with `soft_time_limit=600` and `time_limit=660`. 3. In the worker, `export_dashboard_excel` loads the dashboard and calls `_build_workbook(tmp_path, dashboard, active_data_mask, job_id)` (lines 192-207 in `superset/tasks/export_dashboard_excel.py`), which loops through charts and calls `_write_chart_sheets(writer, chart, dashboard.id, active_data_mask)` (line 130) for each chart. 4. When the soft time limit is exceeded while executing `_write_chart_sheets` or its internals (e.g. `ChartDataCommand.run()` at lines 96-99), Celery raises `SoftTimeLimitExceeded`, but this exception is caught by the broad `except Exception` in `_build_workbook` (lines 131-135) and treated as a skipped chart instead of being re-raised, so the outer `except SoftTimeLimitExceeded` in `export_dashboard_excel` (lines 237-240) never executes and the timeout-specific failure email path `_send_failure_email` (lines 147-159) is not triggered. ``` </details> [](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=8ba1dcdc111b4a2d9b1e22668a99649f&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset) [](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=8ba1dcdc111b4a2d9b1e22668a99649f&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset) *(Use Cmd/Ctrl + Click for best experience)* <details> <summary><b>Prompt for AI Agent 🤖 </b></summary> ```mdx This is a comment left during a code review. **Path:** superset/tasks/export_dashboard_excel.py **Line:** 129:135 **Comment:** *Logic Error: The broad per-chart `except Exception` also catches `SoftTimeLimitExceeded`, so Celery soft timeouts are swallowed and the task keeps running until hard-killed. Re-raise `SoftTimeLimitExceeded` inside this block (or catch it separately before the broad handler) so timeout handling and failure-email logic in the outer task can execute. Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise. Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix ``` </details> <a href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41133&comment_hash=7d367e7a26d8effef70e9e3fb7aaa800a7a38b439bed59b18ef913f39ab08193&reaction=like'>👍</a> | <a href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41133&comment_hash=7d367e7a26d8effef70e9e3fb7aaa800a7a38b439bed59b18ef913f39ab08193&reaction=dislike'>👎</a> ########## superset/tasks/export_dashboard_excel.py: ########## @@ -0,0 +1,247 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +""" +Celery task that exports every chart on a dashboard to a single multi-sheet +``.xlsx`` file, uploads it to S3, and emails the requesting user a pre-signed +download link. + +The task re-runs each chart's saved query context under the requesting user, +applies the live dashboard filter state, and streams the results row-by-row into +a constant-memory workbook so large dashboards never load all data at once. +""" + +from __future__ import annotations + +import logging +import os +import tempfile +from datetime import datetime, timedelta +from typing import Any + +from celery.exceptions import SoftTimeLimitExceeded +from flask import current_app, g + +from superset import db, security_manager +from superset.charts.data.dashboard_filter_context import ( + apply_extra_form_data_to_query_context_json, + get_dashboard_filter_context, +) +from superset.charts.schemas import ChartDataQueryContextSchema +from superset.commands.chart.data.get_data_command import ChartDataCommand +from superset.common.chart_data import ChartDataResultFormat, ChartDataResultType +from superset.dashboards.excel_export import email +from superset.dashboards.excel_export.layout import get_charts_in_layout_order +from superset.extensions import celery_app +from superset.utils import json, s3 +from superset.utils.core import override_user +from superset.utils.excel_streaming import StreamingXlsxWriter + +logger = logging.getLogger(__name__) + + +def _chart_label(chart: Any) -> str: + """Human-readable label for a chart in the skipped-charts list.""" + return f"{chart.id} - {chart.slice_name or ''}".strip() + + +def _record_to_row(record: dict[str, Any], colnames: list[str]) -> list[Any]: + return [record.get(col) for col in colnames] + + +def _write_chart_sheets( + writer: StreamingXlsxWriter, + chart: Any, + dashboard_id: int, + active_data_mask: dict[str, Any], +) -> None: + """ + Run a single chart's query and stream its result(s) into the workbook. + + Charts may yield more than one query (e.g. mixed-series charts); each becomes + its own sheet. Raises if the chart cannot be exported, so the caller can skip + it and note it in the email. + """ + json_body = json.loads(chart.query_context) + # Override any stale saved values: we always want full JSON results. + json_body["result_format"] = ChartDataResultFormat.JSON + json_body["result_type"] = ChartDataResultType.FULL + json_body.pop("force", None) + + filter_context = get_dashboard_filter_context( + dashboard_id=dashboard_id, + chart_id=chart.id, + active_data_mask=active_data_mask, + ) Review Comment: **Suggestion:** `get_dashboard_filter_context` is called once per chart and internally reloads the dashboard and re-runs access validation each time, creating an avoidable N+1 pattern on large dashboards. Compute shared dashboard filter metadata once per export (or pass preloaded dashboard context) to avoid repeated DB and security checks. [performance] <details> <summary><b>Severity Level:</b> Major ⚠️</summary> ```mdx - ⚠️ Each chart recomputes dashboard filters and access checks. - ⚠️ Large dashboard exports execute redundant dashboard database queries. ``` </details> <details> <summary><b>Steps of Reproduction ✅ </b></summary> ```mdx 1. Invoke the dashboard Excel export endpoint `export_xlsx` in `superset/dashboards/api.py` (lines 6-55 of the snippet) via `POST /api/v1/dashboard/<pk>/export_xlsx/`, which schedules the `export_dashboard_excel` Celery task (lines 45-54). 2. In `superset/tasks/export_dashboard_excel.py`, `export_dashboard_excel` loads the `Dashboard` once from the database with `db.session.query(Dashboard).filter_by(id=dashboard_id).one_or_none()` (lines 193-196) before calling `_build_workbook(tmp_path, dashboard, active_data_mask, job_id)` (line 206). 3. `_build_workbook` (lines 115-144) iterates through all charts in layout order using `get_charts_in_layout_order(dashboard)` (line 125) and for each chart with a saved `query_context` calls `_write_chart_sheets(writer, chart, dashboard.id, active_data_mask)` (line 130). 4. Inside `_write_chart_sheets` (lines 65-88), every chart export calls `get_dashboard_filter_context(dashboard_id=dashboard_id, chart_id=chart.id, active_data_mask=active_data_mask)` (lines 84-88); the implementation in `superset/charts/data/dashboard_filter_context.py` (lines 47-55) re-queries the `Dashboard` from `db.session` and re-runs `_check_dashboard_access(dashboard)`/`security_manager.raise_for_access` for each chart, so exporting a dashboard with N charts performs N extra dashboard queries and access checks even though the same dashboard was already loaded once in `export_dashboard_excel`. ``` </details> [](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=c571de4bcffc40dfbd42314aac6f81b4&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset) [](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=c571de4bcffc40dfbd42314aac6f81b4&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset) *(Use Cmd/Ctrl + Click for best experience)* <details> <summary><b>Prompt for AI Agent 🤖 </b></summary> ```mdx This is a comment left during a code review. **Path:** superset/tasks/export_dashboard_excel.py **Line:** 84:88 **Comment:** *Performance: `get_dashboard_filter_context` is called once per chart and internally reloads the dashboard and re-runs access validation each time, creating an avoidable N+1 pattern on large dashboards. Compute shared dashboard filter metadata once per export (or pass preloaded dashboard context) to avoid repeated DB and security checks. Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise. Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix ``` </details> <a href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41133&comment_hash=b91322ae8610d63b799dac4080380825156b852e5b6215a243d17d057db6d697&reaction=like'>👍</a> | <a href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41133&comment_hash=b91322ae8610d63b799dac4080380825156b852e5b6215a243d17d057db6d697&reaction=dislike'>👎</a> -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
