codeant-ai-for-open-source[bot] commented on code in PR #40008:
URL: https://github.com/apache/superset/pull/40008#discussion_r3216453764


##########
superset/commands/dashboard/export.py:
##########
@@ -146,6 +146,27 @@ def _file_content(model: Dashboard) -> str:
                     if dataset:
                         target["datasetUuid"] = str(dataset.uuid)
 
+        # Replace display control dataset references with uuid.
+        # datasetId is intentionally preserved alongside datasetUuid so that
+        # bundles remain importable by older versions that do not yet 
understand
+        # datasetUuid for display-control targets.
+        for customization in (
+            payload.get("metadata", {}).get("chart_customization_config") or []
+        ):
+            for target in customization.get("targets") or []:
+                dataset_id = target.get("datasetId")
+                if dataset_id is not None:
+                    dataset = DatasetDAO.find_by_id(dataset_id)
+                    if dataset:
+                        target["datasetUuid"] = str(dataset.uuid)
+                    else:
+                        logger.warning(
+                            "Dashboard '%s': display control target references 
"
+                            "missing dataset %s; datasetUuid will not be set",
+                            model.dashboard_title,
+                            dataset_id,
+                        )

Review Comment:
   **Suggestion:** Dataset UUID resolution for display controls performs a 
database lookup per target (`DatasetDAO.find_by_id`) with no caching. 
Dashboards with many controls targeting the same dataset will trigger redundant 
repeated queries and degrade export latency. Cache lookups by dataset ID (or 
prefetch in bulk) before rewriting targets. [performance]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ⚠️ Dashboard export issues redundant dataset lookups per target.
   - ⚠️ Large dashboards incur extra latency when exporting.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Create or identify a dashboard whose `json_metadata` contains
   `chart_customization_config` entries with multiple display controls or 
targets sharing the
   same `datasetId`, then export it via `GET /api/v1/dashboard/export/` 
(implemented in
   `superset/dashboards/api.py:12-25`), which invokes
   `ExportDashboardsCommand(requested_ids).run()`.
   
   2. `ExportDashboardsCommand.run()` delegates to
   `ExportDashboardsCommand._file_content(model)` for each dashboard model via 
the
   `_file_content` static method at 
`superset/commands/dashboard/export.py:119-196`, which
   first materializes the dashboard payload from `model.export_to_dict(...)` 
and decodes JSON
   fields into the `payload` dictionary.
   
   3. In `_file_content`, after handling native filter dataset UUID rewrites, 
the
   display-control rewrite block at 
`superset/commands/dashboard/export.py:149-168` iterates
   over `payload.get("metadata", {}).get("chart_customization_config")` and then
   `customization.get("targets") or []`, retrieving `dataset_id = 
target.get("datasetId")`
   and calling `DatasetDAO.find_by_id(dataset_id)` for every target that has a 
`datasetId`.
   
   4. Because this loop performs `DatasetDAO.find_by_id(dataset_id)` 
unconditionally per
   target without any in-memory caching of results or bulk prefetch, dashboards 
with many
   display controls pointing to the same dataset will issue repeated identical 
DAO lookups
   (ultimately database queries via the SQLAlchemy session in `DatasetDAO`, 
defined in
   `superset/daos/dataset.py:52-113`). Each export of such a dashboard 
therefore exhibits an
   N+1-style pattern where the number of dataset lookups scales with the number 
of
   display-control targets rather than the number of distinct datasets.
   ```
   </details>
   
   [Fix in 
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt=This%20is%20a%20comment%20left%20during%20a%20code%20review.%0A%0A%2A%2APath%3A%2A%2A%20superset%2Fcommands%2Fdashboard%2Fexport.py%0A%2A%2ALine%3A%2A%2A%20153%3A168%0A%2A%2AComment%3A%2A%2A%0A%09%2APerformance%3A%20Dataset%20UUID%20resolution%20for%20display%20controls%20performs%20a%20database%20lookup%20per%20target%20%28%60DatasetDAO.find_by_id%60%29%20with%20no%20caching.%20Dashboards%20with%20many%20controls%20targeting%20the%20same%20dataset%20will%20trigger%20redundant%20repeated%20queries%20and%20degrade%20export%20latency.%20Cache%20lookups%20by%20dataset%20ID%20%28or%20prefetch%20in%20bulk%29%20before%20rewriting%20targets.%0A%0AValidate%20the%20correctness%20of%20the%20flagged%20issue.%20If%20correct%2C%20How%20can%20I%20resolve%20this%3F%20If%20you%20propose%20a%20fix%2C%20implement%20it%20and%20please%20make%20it%20concise.%0AOnce%20fix%20is%20implemented%2C%20also%20check%20other%20comments%20on%20
 
the%20same%20PR%2C%20and%20ask%20user%20if%20the%20user%20wants%20to%20fix%20the%20rest%20of%20the%20comments%20as%20well.%20if%20said%20yes%2C%20then%20fetch%20all%20the%20comments%20validate%20the%20correctness%20and%20implement%20a%20minimal%20fix%0A)
 | [Fix in VSCode 
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt=This%20is%20a%20comment%20left%20during%20a%20code%20review.%0A%0A%2A%2APath%3A%2A%2A%20superset%2Fcommands%2Fdashboard%2Fexport.py%0A%2A%2ALine%3A%2A%2A%20153%3A168%0A%2A%2AComment%3A%2A%2A%0A%09%2APerformance%3A%20Dataset%20UUID%20resolution%20for%20display%20controls%20performs%20a%20database%20lookup%20per%20target%20%28%60DatasetDAO.find_by_id%60%29%20with%20no%20caching.%20Dashboards%20with%20many%20controls%20targeting%20the%20same%20dataset%20will%20trigger%20redundant%20repeated%20queries%20and%20degrade%20export%20latency.%20Cache%20lookups%20by%20dataset%20ID%20%28or%20prefetch%20in%20bulk%29%20before%20rewriting%20targets.%0A%0AValidate%
 
20the%20correctness%20of%20the%20flagged%20issue.%20If%20correct%2C%20How%20can%20I%20resolve%20this%3F%20If%20you%20propose%20a%20fix%2C%20implement%20it%20and%20please%20make%20it%20concise.%0AOnce%20fix%20is%20implemented%2C%20also%20check%20other%20comments%20on%20the%20same%20PR%2C%20and%20ask%20user%20if%20the%20user%20wants%20to%20fix%20the%20rest%20of%20the%20comments%20as%20well.%20if%20said%20yes%2C%20then%20fetch%20all%20the%20comments%20validate%20the%20correctness%20and%20implement%20a%20minimal%20fix%0A)
   
   *(Use Cmd/Ctrl + Click for best experience)*
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/commands/dashboard/export.py
   **Line:** 153:168
   **Comment:**
        *Performance: Dataset UUID resolution for display controls performs a 
database lookup per target (`DatasetDAO.find_by_id`) with no caching. 
Dashboards with many controls targeting the same dataset will trigger redundant 
repeated queries and degrade export latency. Cache lookups by dataset ID (or 
prefetch in bulk) before rewriting targets.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   Once fix is implemented, also check other comments on the same PR, and ask 
user if the user wants to fix the rest of the comments as well. if said yes, 
then fetch all the comments validate the correctness and implement a minimal fix
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40008&comment_hash=23b9ba8deae43103d8133105af21ad4adcec47dc08118f6cc5845160670a7fcd&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40008&comment_hash=23b9ba8deae43103d8133105af21ad4adcec47dc08118f6cc5845160670a7fcd&reaction=dislike'>👎</a>



##########
superset/commands/dashboard/export.py:
##########
@@ -230,3 +251,14 @@ def _export(
                         dataset = DatasetDAO.find_by_id(dataset_id)
                         if dataset:
                             yield from 
ExportDatasetsCommand([dataset_id]).run()
+
+            # Export datasets referenced by display controls
+            for customization in (
+                payload.get("metadata", {}).get("chart_customization_config") 
or []
+            ):
+                for target in customization.get("targets") or []:
+                    dataset_id = target.get("datasetId")
+                    if dataset_id is not None:
+                        dataset = DatasetDAO.find_by_id(dataset_id)
+                        if dataset:

Review Comment:
   **Suggestion:** The display-control dataset export loop runs 
`ExportDatasetsCommand([dataset_id]).run()` once per target without 
deduplicating dataset IDs. If multiple controls/targets reference the same 
dataset, this repeatedly rebuilds dataset export payloads and re-runs command 
logic for identical files, causing avoidable O(n) command overhead during 
dashboard export. Collect unique dataset IDs first and export each dataset 
once. [performance]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ⚠️ Dashboard export re-runs dataset export for identical targets.
   - ⚠️ Assets export runs redundant dataset exports for dashboards.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Trigger a dashboard export via the API endpoint `GET 
/api/v1/dashboard/export/`
   implemented in `superset/dashboards/api.py:12-25`, passing a dashboard ID 
whose JSON
   metadata contains multiple display controls in `chart_customization_config` 
all
   referencing the same dataset ID in their `targets[*].datasetId`.
   
   2. The API constructs `ExportDashboardsCommand(requested_ids)` and calls 
`.run()`, which
   invokes `ExportModelsCommand.run()` in 
`superset/commands/export/models.py:58-73`; this,
   after validation, iterates over each dashboard model and delegates to
   `ExportDashboardsCommand._export(model, export_related=True)` in
   `superset/commands/dashboard/export.py:200-225`.
   
   3. Inside `_export`, after chart and theme export, the method rebuilds a 
`payload` and,
   when `export_related` is true, executes the native-filter dataset export 
loop followed by
   the display-control dataset export loop at
   `superset/commands/dashboard/export.py:243-255`. In the display-control loop 
(lines
   255-263 in the PR hunk), it iterates every customization and every `target` 
in
   `chart_customization_config`, calling 
`ExportDatasetsCommand([dataset_id]).run()` once per
   target when `datasetId` is set and `DatasetDAO.find_by_id(dataset_id)` 
returns a dataset.
   
   4. Because there is no deduplication of `dataset_id` within this 
display-control loop, if
   two or more targets reference the same dataset ID, `_export` will 
instantiate and run
   `ExportDatasetsCommand([dataset_id])` multiple times. 
`ExportModelsCommand.run()`
   deduplicates only on the resulting `file_name` (using the `seen` set at
   `export/models.py:68-73`), so duplicate dataset YAMLs are skipped from the 
final bundle
   but each `ExportDatasetsCommand` still performs its work (building payloads, 
related
   database export, etc.), leading to avoidable O(n_targets) repeated work for 
the same
   dataset on every dashboard export that has repeated dataset IDs in display 
controls.
   ```
   </details>
   
   [Fix in 
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt=This%20is%20a%20comment%20left%20during%20a%20code%20review.%0A%0A%2A%2APath%3A%2A%2A%20superset%2Fcommands%2Fdashboard%2Fexport.py%0A%2A%2ALine%3A%2A%2A%20255%3A263%0A%2A%2AComment%3A%2A%2A%0A%09%2APerformance%3A%20The%20display-control%20dataset%20export%20loop%20runs%20%60ExportDatasetsCommand%28%5Bdataset_id%5D%29.run%28%29%60%20once%20per%20target%20without%20deduplicating%20dataset%20IDs.%20If%20multiple%20controls%2Ftargets%20reference%20the%20same%20dataset%2C%20this%20repeatedly%20rebuilds%20dataset%20export%20payloads%20and%20re-runs%20command%20logic%20for%20identical%20files%2C%20causing%20avoidable%20O%28n%29%20command%20overhead%20during%20dashboard%20export.%20Collect%20unique%20dataset%20IDs%20first%20and%20export%20each%20dataset%20once.%0A%0AValidate%20the%20correctness%20of%20the%20flagged%20issue.%20If%20correct%2C%20How%20can%20I%20resolve%20this%3F%20If%20you%20propose%20a%20fix%2C%20impleme
 
nt%20it%20and%20please%20make%20it%20concise.%0AOnce%20fix%20is%20implemented%2C%20also%20check%20other%20comments%20on%20the%20same%20PR%2C%20and%20ask%20user%20if%20the%20user%20wants%20to%20fix%20the%20rest%20of%20the%20comments%20as%20well.%20if%20said%20yes%2C%20then%20fetch%20all%20the%20comments%20validate%20the%20correctness%20and%20implement%20a%20minimal%20fix%0A)
 | [Fix in VSCode 
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt=This%20is%20a%20comment%20left%20during%20a%20code%20review.%0A%0A%2A%2APath%3A%2A%2A%20superset%2Fcommands%2Fdashboard%2Fexport.py%0A%2A%2ALine%3A%2A%2A%20255%3A263%0A%2A%2AComment%3A%2A%2A%0A%09%2APerformance%3A%20The%20display-control%20dataset%20export%20loop%20runs%20%60ExportDatasetsCommand%28%5Bdataset_id%5D%29.run%28%29%60%20once%20per%20target%20without%20deduplicating%20dataset%20IDs.%20If%20multiple%20controls%2Ftargets%20reference%20the%20same%20dataset%2C%20this%20repeatedly%20rebuilds%20dataset%20export%20payloads%2
 
0and%20re-runs%20command%20logic%20for%20identical%20files%2C%20causing%20avoidable%20O%28n%29%20command%20overhead%20during%20dashboard%20export.%20Collect%20unique%20dataset%20IDs%20first%20and%20export%20each%20dataset%20once.%0A%0AValidate%20the%20correctness%20of%20the%20flagged%20issue.%20If%20correct%2C%20How%20can%20I%20resolve%20this%3F%20If%20you%20propose%20a%20fix%2C%20implement%20it%20and%20please%20make%20it%20concise.%0AOnce%20fix%20is%20implemented%2C%20also%20check%20other%20comments%20on%20the%20same%20PR%2C%20and%20ask%20user%20if%20the%20user%20wants%20to%20fix%20the%20rest%20of%20the%20comments%20as%20well.%20if%20said%20yes%2C%20then%20fetch%20all%20the%20comments%20validate%20the%20correctness%20and%20implement%20a%20minimal%20fix%0A)
   
   *(Use Cmd/Ctrl + Click for best experience)*
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/commands/dashboard/export.py
   **Line:** 255:263
   **Comment:**
        *Performance: The display-control dataset export loop runs 
`ExportDatasetsCommand([dataset_id]).run()` once per target without 
deduplicating dataset IDs. If multiple controls/targets reference the same 
dataset, this repeatedly rebuilds dataset export payloads and re-runs command 
logic for identical files, causing avoidable O(n) command overhead during 
dashboard export. Collect unique dataset IDs first and export each dataset once.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   Once fix is implemented, also check other comments on the same PR, and ask 
user if the user wants to fix the rest of the comments as well. if said yes, 
then fetch all the comments validate the correctness and implement a minimal fix
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40008&comment_hash=bff615e75296b0fc08ea6ff122a148c2d0662a0efd39e1064c6aa9ca043c9d1f&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40008&comment_hash=bff615e75296b0fc08ea6ff122a148c2d0662a0efd39e1064c6aa9ca043c9d1f&reaction=dislike'>👎</a>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to