codeant-ai-for-open-source[bot] commented on code in PR #40008:
URL: https://github.com/apache/superset/pull/40008#discussion_r3216453764
##########
superset/commands/dashboard/export.py:
##########
@@ -146,6 +146,27 @@ def _file_content(model: Dashboard) -> str:
if dataset:
target["datasetUuid"] = str(dataset.uuid)
+ # Replace display control dataset references with uuid.
+ # datasetId is intentionally preserved alongside datasetUuid so that
+ # bundles remain importable by older versions that do not yet
understand
+ # datasetUuid for display-control targets.
+ for customization in (
+ payload.get("metadata", {}).get("chart_customization_config") or []
+ ):
+ for target in customization.get("targets") or []:
+ dataset_id = target.get("datasetId")
+ if dataset_id is not None:
+ dataset = DatasetDAO.find_by_id(dataset_id)
+ if dataset:
+ target["datasetUuid"] = str(dataset.uuid)
+ else:
+ logger.warning(
+ "Dashboard '%s': display control target references
"
+ "missing dataset %s; datasetUuid will not be set",
+ model.dashboard_title,
+ dataset_id,
+ )
Review Comment:
**Suggestion:** Dataset UUID resolution for display controls performs a
database lookup per target (`DatasetDAO.find_by_id`) with no caching.
Dashboards with many controls targeting the same dataset will trigger redundant
repeated queries and degrade export latency. Cache lookups by dataset ID (or
prefetch in bulk) before rewriting targets. [performance]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ⚠️ Dashboard export issues redundant dataset lookups per target.
- ⚠️ Large dashboards incur extra latency when exporting.
```
</details>
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Create or identify a dashboard whose `json_metadata` contains
`chart_customization_config` entries with multiple display controls or
targets sharing the
same `datasetId`, then export it via `GET /api/v1/dashboard/export/`
(implemented in
`superset/dashboards/api.py:12-25`), which invokes
`ExportDashboardsCommand(requested_ids).run()`.
2. `ExportDashboardsCommand.run()` delegates to
`ExportDashboardsCommand._file_content(model)` for each dashboard model via
the
`_file_content` static method at
`superset/commands/dashboard/export.py:119-196`, which
first materializes the dashboard payload from `model.export_to_dict(...)`
and decodes JSON
fields into the `payload` dictionary.
3. In `_file_content`, after handling native filter dataset UUID rewrites,
the
display-control rewrite block at
`superset/commands/dashboard/export.py:149-168` iterates
over `payload.get("metadata", {}).get("chart_customization_config")` and then
`customization.get("targets") or []`, retrieving `dataset_id =
target.get("datasetId")`
and calling `DatasetDAO.find_by_id(dataset_id)` for every target that has a
`datasetId`.
4. Because this loop performs `DatasetDAO.find_by_id(dataset_id)`
unconditionally per
target without any in-memory caching of results or bulk prefetch, dashboards
with many
display controls pointing to the same dataset will issue repeated identical
DAO lookups
(ultimately database queries via the SQLAlchemy session in `DatasetDAO`,
defined in
`superset/daos/dataset.py:52-113`). Each export of such a dashboard
therefore exhibits an
N+1-style pattern where the number of dataset lookups scales with the number
of
display-control targets rather than the number of distinct datasets.
```
</details>
[Fix in
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt=This%20is%20a%20comment%20left%20during%20a%20code%20review.%0A%0A%2A%2APath%3A%2A%2A%20superset%2Fcommands%2Fdashboard%2Fexport.py%0A%2A%2ALine%3A%2A%2A%20153%3A168%0A%2A%2AComment%3A%2A%2A%0A%09%2APerformance%3A%20Dataset%20UUID%20resolution%20for%20display%20controls%20performs%20a%20database%20lookup%20per%20target%20%28%60DatasetDAO.find_by_id%60%29%20with%20no%20caching.%20Dashboards%20with%20many%20controls%20targeting%20the%20same%20dataset%20will%20trigger%20redundant%20repeated%20queries%20and%20degrade%20export%20latency.%20Cache%20lookups%20by%20dataset%20ID%20%28or%20prefetch%20in%20bulk%29%20before%20rewriting%20targets.%0A%0AValidate%20the%20correctness%20of%20the%20flagged%20issue.%20If%20correct%2C%20How%20can%20I%20resolve%20this%3F%20If%20you%20propose%20a%20fix%2C%20implement%20it%20and%20please%20make%20it%20concise.%0AOnce%20fix%20is%20implemented%2C%20also%20check%20other%20comments%20on%20
the%20same%20PR%2C%20and%20ask%20user%20if%20the%20user%20wants%20to%20fix%20the%20rest%20of%20the%20comments%20as%20well.%20if%20said%20yes%2C%20then%20fetch%20all%20the%20comments%20validate%20the%20correctness%20and%20implement%20a%20minimal%20fix%0A)
| [Fix in VSCode
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt=This%20is%20a%20comment%20left%20during%20a%20code%20review.%0A%0A%2A%2APath%3A%2A%2A%20superset%2Fcommands%2Fdashboard%2Fexport.py%0A%2A%2ALine%3A%2A%2A%20153%3A168%0A%2A%2AComment%3A%2A%2A%0A%09%2APerformance%3A%20Dataset%20UUID%20resolution%20for%20display%20controls%20performs%20a%20database%20lookup%20per%20target%20%28%60DatasetDAO.find_by_id%60%29%20with%20no%20caching.%20Dashboards%20with%20many%20controls%20targeting%20the%20same%20dataset%20will%20trigger%20redundant%20repeated%20queries%20and%20degrade%20export%20latency.%20Cache%20lookups%20by%20dataset%20ID%20%28or%20prefetch%20in%20bulk%29%20before%20rewriting%20targets.%0A%0AValidate%
20the%20correctness%20of%20the%20flagged%20issue.%20If%20correct%2C%20How%20can%20I%20resolve%20this%3F%20If%20you%20propose%20a%20fix%2C%20implement%20it%20and%20please%20make%20it%20concise.%0AOnce%20fix%20is%20implemented%2C%20also%20check%20other%20comments%20on%20the%20same%20PR%2C%20and%20ask%20user%20if%20the%20user%20wants%20to%20fix%20the%20rest%20of%20the%20comments%20as%20well.%20if%20said%20yes%2C%20then%20fetch%20all%20the%20comments%20validate%20the%20correctness%20and%20implement%20a%20minimal%20fix%0A)
*(Use Cmd/Ctrl + Click for best experience)*
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/commands/dashboard/export.py
**Line:** 153:168
**Comment:**
*Performance: Dataset UUID resolution for display controls performs a
database lookup per target (`DatasetDAO.find_by_id`) with no caching.
Dashboards with many controls targeting the same dataset will trigger redundant
repeated queries and degrade export latency. Cache lookups by dataset ID (or
prefetch in bulk) before rewriting targets.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask
user if the user wants to fix the rest of the comments as well. if said yes,
then fetch all the comments validate the correctness and implement a minimal fix
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40008&comment_hash=23b9ba8deae43103d8133105af21ad4adcec47dc08118f6cc5845160670a7fcd&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40008&comment_hash=23b9ba8deae43103d8133105af21ad4adcec47dc08118f6cc5845160670a7fcd&reaction=dislike'>👎</a>
##########
superset/commands/dashboard/export.py:
##########
@@ -230,3 +251,14 @@ def _export(
dataset = DatasetDAO.find_by_id(dataset_id)
if dataset:
yield from
ExportDatasetsCommand([dataset_id]).run()
+
+ # Export datasets referenced by display controls
+ for customization in (
+ payload.get("metadata", {}).get("chart_customization_config")
or []
+ ):
+ for target in customization.get("targets") or []:
+ dataset_id = target.get("datasetId")
+ if dataset_id is not None:
+ dataset = DatasetDAO.find_by_id(dataset_id)
+ if dataset:
Review Comment:
**Suggestion:** The display-control dataset export loop runs
`ExportDatasetsCommand([dataset_id]).run()` once per target without
deduplicating dataset IDs. If multiple controls/targets reference the same
dataset, this repeatedly rebuilds dataset export payloads and re-runs command
logic for identical files, causing avoidable O(n) command overhead during
dashboard export. Collect unique dataset IDs first and export each dataset
once. [performance]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ⚠️ Dashboard export re-runs dataset export for identical targets.
- ⚠️ Assets export runs redundant dataset exports for dashboards.
```
</details>
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Trigger a dashboard export via the API endpoint `GET
/api/v1/dashboard/export/`
implemented in `superset/dashboards/api.py:12-25`, passing a dashboard ID
whose JSON
metadata contains multiple display controls in `chart_customization_config`
all
referencing the same dataset ID in their `targets[*].datasetId`.
2. The API constructs `ExportDashboardsCommand(requested_ids)` and calls
`.run()`, which
invokes `ExportModelsCommand.run()` in
`superset/commands/export/models.py:58-73`; this,
after validation, iterates over each dashboard model and delegates to
`ExportDashboardsCommand._export(model, export_related=True)` in
`superset/commands/dashboard/export.py:200-225`.
3. Inside `_export`, after chart and theme export, the method rebuilds a
`payload` and,
when `export_related` is true, executes the native-filter dataset export
loop followed by
the display-control dataset export loop at
`superset/commands/dashboard/export.py:243-255`. In the display-control loop
(lines
255-263 in the PR hunk), it iterates every customization and every `target`
in
`chart_customization_config`, calling
`ExportDatasetsCommand([dataset_id]).run()` once per
target when `datasetId` is set and `DatasetDAO.find_by_id(dataset_id)`
returns a dataset.
4. Because there is no deduplication of `dataset_id` within this
display-control loop, if
two or more targets reference the same dataset ID, `_export` will
instantiate and run
`ExportDatasetsCommand([dataset_id])` multiple times.
`ExportModelsCommand.run()`
deduplicates only on the resulting `file_name` (using the `seen` set at
`export/models.py:68-73`), so duplicate dataset YAMLs are skipped from the
final bundle
but each `ExportDatasetsCommand` still performs its work (building payloads,
related
database export, etc.), leading to avoidable O(n_targets) repeated work for
the same
dataset on every dashboard export that has repeated dataset IDs in display
controls.
```
</details>
[Fix in
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt=This%20is%20a%20comment%20left%20during%20a%20code%20review.%0A%0A%2A%2APath%3A%2A%2A%20superset%2Fcommands%2Fdashboard%2Fexport.py%0A%2A%2ALine%3A%2A%2A%20255%3A263%0A%2A%2AComment%3A%2A%2A%0A%09%2APerformance%3A%20The%20display-control%20dataset%20export%20loop%20runs%20%60ExportDatasetsCommand%28%5Bdataset_id%5D%29.run%28%29%60%20once%20per%20target%20without%20deduplicating%20dataset%20IDs.%20If%20multiple%20controls%2Ftargets%20reference%20the%20same%20dataset%2C%20this%20repeatedly%20rebuilds%20dataset%20export%20payloads%20and%20re-runs%20command%20logic%20for%20identical%20files%2C%20causing%20avoidable%20O%28n%29%20command%20overhead%20during%20dashboard%20export.%20Collect%20unique%20dataset%20IDs%20first%20and%20export%20each%20dataset%20once.%0A%0AValidate%20the%20correctness%20of%20the%20flagged%20issue.%20If%20correct%2C%20How%20can%20I%20resolve%20this%3F%20If%20you%20propose%20a%20fix%2C%20impleme
nt%20it%20and%20please%20make%20it%20concise.%0AOnce%20fix%20is%20implemented%2C%20also%20check%20other%20comments%20on%20the%20same%20PR%2C%20and%20ask%20user%20if%20the%20user%20wants%20to%20fix%20the%20rest%20of%20the%20comments%20as%20well.%20if%20said%20yes%2C%20then%20fetch%20all%20the%20comments%20validate%20the%20correctness%20and%20implement%20a%20minimal%20fix%0A)
| [Fix in VSCode
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt=This%20is%20a%20comment%20left%20during%20a%20code%20review.%0A%0A%2A%2APath%3A%2A%2A%20superset%2Fcommands%2Fdashboard%2Fexport.py%0A%2A%2ALine%3A%2A%2A%20255%3A263%0A%2A%2AComment%3A%2A%2A%0A%09%2APerformance%3A%20The%20display-control%20dataset%20export%20loop%20runs%20%60ExportDatasetsCommand%28%5Bdataset_id%5D%29.run%28%29%60%20once%20per%20target%20without%20deduplicating%20dataset%20IDs.%20If%20multiple%20controls%2Ftargets%20reference%20the%20same%20dataset%2C%20this%20repeatedly%20rebuilds%20dataset%20export%20payloads%2
0and%20re-runs%20command%20logic%20for%20identical%20files%2C%20causing%20avoidable%20O%28n%29%20command%20overhead%20during%20dashboard%20export.%20Collect%20unique%20dataset%20IDs%20first%20and%20export%20each%20dataset%20once.%0A%0AValidate%20the%20correctness%20of%20the%20flagged%20issue.%20If%20correct%2C%20How%20can%20I%20resolve%20this%3F%20If%20you%20propose%20a%20fix%2C%20implement%20it%20and%20please%20make%20it%20concise.%0AOnce%20fix%20is%20implemented%2C%20also%20check%20other%20comments%20on%20the%20same%20PR%2C%20and%20ask%20user%20if%20the%20user%20wants%20to%20fix%20the%20rest%20of%20the%20comments%20as%20well.%20if%20said%20yes%2C%20then%20fetch%20all%20the%20comments%20validate%20the%20correctness%20and%20implement%20a%20minimal%20fix%0A)
*(Use Cmd/Ctrl + Click for best experience)*
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/commands/dashboard/export.py
**Line:** 255:263
**Comment:**
*Performance: The display-control dataset export loop runs
`ExportDatasetsCommand([dataset_id]).run()` once per target without
deduplicating dataset IDs. If multiple controls/targets reference the same
dataset, this repeatedly rebuilds dataset export payloads and re-runs command
logic for identical files, causing avoidable O(n) command overhead during
dashboard export. Collect unique dataset IDs first and export each dataset once.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask
user if the user wants to fix the rest of the comments as well. if said yes,
then fetch all the comments validate the correctness and implement a minimal fix
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40008&comment_hash=bff615e75296b0fc08ea6ff122a148c2d0662a0efd39e1064c6aa9ca043c9d1f&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40008&comment_hash=bff615e75296b0fc08ea6ff122a148c2d0662a0efd39e1064c6aa9ca043c9d1f&reaction=dislike'>👎</a>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]