codeant-ai-for-open-source[bot] commented on code in PR #39502:
URL: https://github.com/apache/superset/pull/39502#discussion_r3197131017


##########
superset/commands/importers/v1/assets.py:
##########
@@ -206,7 +211,48 @@ def _import(  # noqa: C901
     )
     def run(self) -> None:
         self.validate()
-        self._import(self._configs, self.sparse, self.contents)
+        self._import(self._configs, self.sparse, self.contents, self.overwrite)
+
+    # Maps asset file prefixes to the model class used to look up UUIDs for
+    # the "already exists" validation check when ``overwrite`` is ``False``.
+    _MODEL_BY_PREFIX: dict[str, Any] = {
+        "databases/": Database,
+        "datasets/": SqlaTable,
+        "charts/": Slice,
+        "dashboards/": Dashboard,
+        "queries/": SavedQuery,
+    }
+
+    def _prevent_overwrite_existing_assets(
+        self, exceptions: list[ValidationError]
+    ) -> None:
+        """
+        When ``overwrite`` is ``False``, raise a clear validation error for any
+        asset in the bundle whose UUID already exists in the database.
+        """
+        if self.overwrite:
+            return
+
+        for prefix, model_cls in self._MODEL_BY_PREFIX.items():
+            existing_uuids = {
+                str(uuid) for (uuid,) in db.session.query(model_cls.uuid).all()
+            }

Review Comment:
   **Suggestion:** When `overwrite` is `False`, this validation loads every UUID
from each asset table (`Database`, `SqlaTable`, `Slice`, `Dashboard`,
`SavedQuery`) on every import, even if the bundle contains only a few files. On
large instances this can cause significant latency and memory pressure. Build a
per-prefix set of incoming UUIDs and query only matching rows with `IN (...)`
instead of loading all UUIDs from each table. [possible bug]
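
As a rough illustration of the first half of that suggestion, the sketch below groups the UUIDs found in a bundle by asset prefix. It is not the PR's code: `ASSET_PREFIXES` and `group_incoming_uuids` are hypothetical names, and it assumes the loaded configs form a mapping from file name to a dict that stores the asset's UUID under a `"uuid"` key.

```python
from collections import defaultdict
from typing import Any

# Hypothetical stand-in for the keys of _MODEL_BY_PREFIX in the diff.
ASSET_PREFIXES = ("databases/", "datasets/", "charts/", "dashboards/", "queries/")


def group_incoming_uuids(configs: dict[str, dict[str, Any]]) -> dict[str, set[str]]:
    """Collect the UUIDs present in the bundle, keyed by asset prefix."""
    incoming: dict[str, set[str]] = defaultdict(set)
    for file_name, config in configs.items():
        for prefix in ASSET_PREFIXES:
            # Assumption: each asset config carries its UUID under "uuid".
            if file_name.startswith(prefix) and config.get("uuid"):
                incoming[prefix].add(str(config["uuid"]))
                break  # a file belongs to at most one prefix
    return dict(incoming)
```

Prefixes with no files in the bundle never appear in the result, so a caller can skip the corresponding tables entirely.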
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ⚠️ /api/v1/assets/import overwrite=false always scans all asset tables.
   - ⚠️ Asset import latency grows linearly with total stored assets.
   - ⚠️ Full-table reads add load on the metadata database and memory pressure on the worker.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Call the bulk import API by POSTing to `/api/v1/assets/import/` (implemented in
   `superset/importexport/api.py:95-201`) with a valid ZIP bundle and form field
   `overwrite=false`, so that `ImportExportRestApi.import_` constructs
   `ImportAssetsCommand(..., overwrite=False)` (`importexport/api.py:195-237`) and calls
   `command.run()`.
   
   2. Inside `ImportAssetsCommand.run` (`superset/commands/importers/v1/assets.py:205-215`),
   `self.validate()` is invoked. `validate()` loads the bundle configs via `load_configs`
   (`assets.py:59-21`) into `self._configs` and then calls
   `_prevent_overwrite_existing_assets(exceptions)` (`assets.py:22`).
   
   3. `_prevent_overwrite_existing_assets` (`assets.py:17-35`) first checks `if
   self.overwrite: return` and, since `overwrite=False`, iterates over `_MODEL_BY_PREFIX`
   (`assets.py:7-15`), which maps `"databases/"` → `Database`, `"datasets/"` → `SqlaTable`,
   `"charts/"` → `Slice`, `"dashboards/"` → `Dashboard`, and `"queries/"` → `SavedQuery`.
   
   4. For each prefix/model pair, it executes `db.session.query(model_cls.uuid).all()` and
   builds `existing_uuids = {str(uuid) for (uuid,) in ...}` (`assets.py:27-30`), pulling
   every UUID from each of the five tables into Python sets, regardless of how many files of
   that type are actually present in `self._configs`. These full-table UUID scans run on
   every import with `overwrite=false`, giving O(total stored assets) database and memory
   work per import, rather than O(assets in the bundle).
   ```
   </details>
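
The difference between the two cost profiles can be sketched with a bounded lookup. The example below is only an illustration: it uses the standard-library `sqlite3` module instead of Superset's SQLAlchemy session so that it stays self-contained, and `find_existing_uuids`, `ALLOWED_TABLES`, and the table names are hypothetical. With SQLAlchemy, the same idea would presumably be expressed as a filter using the column's `in_()` operator.

```python
import sqlite3
from collections.abc import Iterable

# Hypothetical table names for this demo. SQL identifiers cannot be bound as
# parameters, so only allow-listed names are interpolated into the query.
ALLOWED_TABLES = {"slices", "dashboards"}


def find_existing_uuids(
    conn: sqlite3.Connection,
    table: str,
    incoming: Iterable[str],
    chunk_size: int = 500,
) -> set[str]:
    """Return the subset of ``incoming`` UUIDs already stored in ``table``."""
    if table not in ALLOWED_TABLES:
        raise ValueError(f"unexpected table: {table}")

    wanted = sorted(set(incoming))
    existing: set[str] = set()
    # Chunk the IN list so the number of bound parameters stays bounded.
    for start in range(0, len(wanted), chunk_size):
        chunk = wanted[start : start + chunk_size]
        placeholders = ", ".join("?" for _ in chunk)
        rows = conn.execute(
            f"SELECT uuid FROM {table} WHERE uuid IN ({placeholders})", chunk
        )
        existing.update(row[0] for row in rows)
    return existing
```

An empty `incoming` collection issues no query at all, and the work per import scales with the number of UUIDs in the bundle rather than with the size of the table.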
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

