codeant-ai-for-open-source[bot] commented on code in PR #37120:
URL: https://github.com/apache/superset/pull/37120#discussion_r2689232871


##########
superset/commands/dashboard/export.py:
##########
@@ -189,7 +193,8 @@ def _export(
             dashboard_ids = model.id
             command = ExportChartsCommand(chart_ids)
             command.disable_tag_export()
-            yield from command.run()
+            # Pass the shared seen set to the chart export command
+            yield from command.run(seen=seen)
             command.enable_tag_export()
             if feature_flag_manager.is_feature_enabled("TAGGING_SYSTEM"):
                 yield from ExportTagsCommand.export(

Review Comment:
   **Suggestion:** The tags export is invoked via `yield from 
ExportTagsCommand.export(...)` but this call does not participate in the shared 
`seen` deduplication; that can cause duplicate tag file exports across nested 
exports. Check/mark the tags file in the shared `seen` set and yield the tags 
file only if it hasn't been seen yet. [possible bug]
   
   **Severity Level:** Critical 🚨
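   A minimal standalone sketch of the proposed guard, in plain Python (the `tags.yaml` filename and the `dedup_tags` wrapper are illustrative assumptions, not Superset's actual API):

   ```python
   from typing import Callable, Iterator


   def dedup_tags(
       seen: set[str],
       files: Iterator[tuple[str, Callable[[], str]]],
       tags_file: str = "tags.yaml",  # assumed filename, for illustration only
   ) -> Iterator[tuple[str, Callable[[], str]]]:
       """Yield export files, skipping the tags file when it was already exported."""
       for path, content in files:
           if path == tags_file:
               if path in seen:
                   continue  # a nested export already produced the tags file
               seen.add(path)
           yield path, content


   seen: set[str] = set()
   first = list(dedup_tags(seen, iter([("tags.yaml", lambda: "tags: []")])))
   second = list(dedup_tags(seen, iter([("tags.yaml", lambda: "tags: []")])))
   ```

   `first` contains the tags file once and `second` is empty, which is the deduplication behavior the shared `seen` set should enforce across nested exports.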
   



##########
superset/commands/dashboard/export.py:
##########
@@ -189,7 +193,8 @@ def _export(
             dashboard_ids = model.id
             command = ExportChartsCommand(chart_ids)
             command.disable_tag_export()
-            yield from command.run()
+            # Pass the shared seen set to the chart export command
+            yield from command.run(seen=seen)
             command.enable_tag_export()

Review Comment:
   **Suggestion:** The code disables tag export by mutating a class-level flag 
but does not preserve or restore the original value on error; if 
`command.run(...)` raises, `enable_tag_export()` may never be called leaving 
the class in a disabled state (affecting subsequent exports). Use a try/finally 
and restore the original class-level value to ensure the flag is always 
restored. [logic error]
   
   **Severity Level:** Minor ⚠️
   ```suggestion
            # Preserve the original class-level flag and restore it even if the run raises
            original_include_tags = getattr(ExportChartsCommand, "_include_tags", True)
               ExportChartsCommand._include_tags = False
               try:
                   # Pass the shared seen set to the chart export command
                   yield from command.run(seen=seen)
               finally:
                   ExportChartsCommand._include_tags = original_include_tags
   ```
   <details>
   <summary><b>Why it matters? ⭐ </b></summary>
   
   The suggestion correctly identifies a real issue: disable_tag_export mutates 
a class-level flag and if command.run raises, enable_tag_export won't be called 
leaving the class in an inconsistent state for subsequent exports. Wrapping the 
run in a try/finally (or otherwise preserving/restoring the original value) is 
the correct defensive fix. The improved code restores the original value even 
on exceptions.
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/commands/dashboard/export.py
   **Line:** 195:198
   **Comment:**
        *Logic Error: The code disables tag export by mutating a class-level 
flag but does not preserve or restore the original value on error; if 
`command.run(...)` raises, `enable_tag_export()` may never be called leaving 
the class in a disabled state (affecting subsequent exports). Use a try/finally 
and restore the original class-level value to ensure the flag is always 
restored.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



##########
superset/commands/export/models.py:
##########
@@ -51,23 +51,34 @@ def _file_content(model: Model) -> str:
 

Review Comment:
   **Suggestion:** Incorrect error message: the `_file_content` stub raises 
NotImplementedError telling implementers to "implement _export" even though the 
method is `_file_content`; this is misleading and may cause confusion when 
debugging—raise a message that references the correct method name. [logic error]
   
   **Severity Level:** Minor ⚠️
   ```suggestion
           raise NotImplementedError("Subclasses MUST implement _file_content")
   ```
   <details>
   <summary><b>Why it matters? ⭐ </b></summary>
   
   The message is clearly a copy/paste typo and is misleading when 
`_file_content` is unimplemented. Fixing it to reference `_file_content` 
improves debuggability without changing behavior.
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/commands/export/models.py
   **Line:** 51:51
   **Comment:**
        *Logic Error: Incorrect error message: the `_file_content` stub raises 
NotImplementedError telling implementers to "implement _export" even though the 
method is `_file_content`; this is misleading and may cause confusion when 
debugging—raise a message that references the correct method name.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



##########
superset/commands/theme/export.py:
##########
@@ -67,8 +67,12 @@ def _file_content(model: Theme) -> str:
 
     @staticmethod
     def _export(
-        model: Theme, export_related: bool = True
+        model: Theme, export_related: bool = True, seen: set[str] | None = None
     ) -> Iterator[tuple[str, Callable[[], str]]]:
+        # Initialize seen set if not provided (for consistency)
+        if seen is None:
+            seen = set()
+
         yield (
             ExportThemesCommand._file_name(model),
             lambda: ExportThemesCommand._file_content(model),

Review Comment:
   **Suggestion:** The lambda passed as the file content factory closes over 
`model` which can lead to late-binding issues if the callable is evaluated 
later; bind `model` into the lambda's default arguments to capture its value at 
definition time. [possible bug]
   
   **Severity Level:** Critical 🚨
   ```suggestion
               lambda _m=model: ExportThemesCommand._file_content(_m),
   ```
   <details>
   <summary><b>Why it matters? ⭐ </b></summary>
   
   Defensive and harmless improvement: capturing `model` as a default argument 
avoids potential late-binding
   surprises if the callable is evaluated later or if the function evolves to 
yield multiple items in a loop.
   While in the current implementation it's unlikely to cause a bug (single 
yield), the change is cheap and safe.
   </details>
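   For reference, a standalone demonstration of the late-binding behavior described above (generic Python, unrelated to Superset internals):

   ```python
   # Closures capture variables, not values: every lambda here sees the final `i`.
   late = [lambda: i for i in range(3)]

   # Binding the current value as a default argument captures it at definition time.
   bound = [lambda i=i: i for i in range(3)]

   late_results = [f() for f in late]    # every call returns 2
   bound_results = [f() for f in bound]  # calls return 0, 1, 2
   ```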
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/commands/theme/export.py
   **Line:** 78:78
   **Comment:**
        *Possible Bug: The lambda passed as the file content factory closes 
over `model` which can lead to late-binding issues if the callable is evaluated 
later; bind `model` into the lambda's default arguments to capture its value at 
definition time.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



##########
superset/commands/dashboard/export.py:
##########
@@ -177,8 +177,12 @@ def _file_content(model: Dashboard) -> str:
     @staticmethod
     # ruff: noqa: C901
     def _export(
-        model: Dashboard, export_related: bool = True
+        model: Dashboard, export_related: bool = True, seen: set[str] | None = None
     ) -> Iterator[tuple[str, Callable[[], str]]]:
+        # Initialize seen set if not provided
+        if seen is None:
+            seen = set()

Review Comment:
   **Suggestion:** The function accepts a `seen` parameter but only checks `if 
seen is None: seen = set()`; if a caller passes a mutable iterable (e.g. list) 
the code later expects set operations like `add` and will raise an 
AttributeError. Coerce non-set iterables to a set at start to ensure `seen` 
supports set semantics. [type error]
   
   **Severity Level:** Minor ⚠️
   ```suggestion
        # Initialize/coerce seen set if not provided or if an iterable was passed
           if seen is None:
               seen = set()
           elif not isinstance(seen, set):
            # Coerce other iterables (lists, tuples) into a set so callers can pass any iterable
               seen = set(seen)
   ```
   <details>
   <summary><b>Why it matters? ⭐ </b></summary>
   
   Coercing non-set iterables to a set is sensible defensive programming: the 
type annotation isn't enforced at runtime and callers might pass a list/tuple. 
If downstream code expects set semantics (e.g., .add, membership checks), 
converting here avoids surprising AttributeError. The change is simple and safe.
   </details>
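   A self-contained sketch of the coercion (the `normalize_seen` helper name is an illustration, not the suggested patch verbatim):

   ```python
   def normalize_seen(seen):
       """Return a set supporting .add and membership, from None or any iterable."""
       if seen is None:
           return set()
       if not isinstance(seen, set):
           return set(seen)  # lists/tuples have no .add method
       return seen


   fresh = normalize_seen(None)
   coerced = normalize_seen(["a.yaml", "a.yaml", "b.yaml"])
   existing = {"x.yaml"}
   reused = normalize_seen(existing)
   ```

   `fresh` is an empty set, `coerced` deduplicates to `{"a.yaml", "b.yaml"}`, and an existing set is returned as-is so mutations stay visible to the caller.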
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/commands/dashboard/export.py
   **Line:** 182:184
   **Comment:**
        *Type Error: The function accepts a `seen` parameter but only checks 
`if seen is None: seen = set()`; if a caller passes a mutable iterable (e.g. 
list) the code later expects set operations like `add` and will raise an 
AttributeError. Coerce non-set iterables to a set at start to ensure `seen` 
supports set semantics.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



##########
superset/commands/database/export.py:
##########
@@ -104,8 +104,12 @@ def _file_content(model: Database) -> str:
 
     @staticmethod
     def _export(
-        model: Database, export_related: bool = True
+        model: Database, export_related: bool = True, seen: set[str] | None = None
     ) -> Iterator[tuple[str, Callable[[], str]]]:
+        # Initialize seen set if not provided
+        if seen is None:
+            seen = set()
+

Review Comment:
   **Suggestion:** The new code initializes a `seen` set but never adds the 
database file name to it; as a result, the shared deduplication context won't 
record that this database has been exported and nested/parallel exporters won't 
be able to avoid re-exporting the same database file. Add the database file 
path into `seen` immediately after initialization so downstream code and nested 
exporters can observe the exported database. [logic error]
   
   **Severity Level:** Minor ⚠️
   ```suggestion
        # Record this database's export filename in the shared deduplication set
           seen.add(ExportDatabasesCommand._file_name(model))
   ```
   <details>
   <summary><b>Why it matters? ⭐ </b></summary>
   
   This suggestion points out a real omission: this exporter yields the 
database file but never records that filename in the shared `seen` set. Other 
exporters that rely on `seen` to avoid re-exporting the same database will not 
see that this DB has already been exported. Adding the file name (using the 
same key this function yields) to `seen` fixes the deduplication gap and 
directly addresses the bug described in the PR.
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/commands/database/export.py
   **Line:** 112:112
   **Comment:**
        *Logic Error: The new code initializes a `seen` set but never adds the 
database file name to it; as a result, the shared deduplication context won't 
record that this database has been exported and nested/parallel exporters won't 
be able to avoid re-exporting the same database file. Add the database file 
path into `seen` immediately after initialization so downstream code and nested 
exporters can observe the exported database.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



##########
superset/commands/dataset/export.py:
##########
@@ -103,29 +107,33 @@ def _export(
             )
             file_path = f"databases/{db_file_name}.yaml"
 
-            payload = model.database.export_to_dict(
-                recursive=False,
-                include_parent_ref=False,
-                include_defaults=True,
-                export_uuids=True,
-            )
-            # TODO (betodealmeida): move this logic to export_to_dict once this
-            # becomes the default export endpoint
-            if payload.get("extra"):
-                try:
-                    payload["extra"] = json.loads(payload["extra"])
-                except json.JSONDecodeError:
-                    logger.info("Unable to decode `extra` field: %s", payload["extra"])
-
-            if ssh_tunnel := model.database.ssh_tunnel:
-                ssh_tunnel_payload = ssh_tunnel.export_to_dict(
+            # Only yield the database file if not already seen
+            # This is critical to fix the issue where databases were being duplicated
+            # and potentially overwritten when charts from different databases were exported
+            if file_path not in seen:
+                payload = model.database.export_to_dict(
                     recursive=False,
                     include_parent_ref=False,
                     include_defaults=True,
-                    export_uuids=False,
+                    export_uuids=True,
                 )
-                payload["ssh_tunnel"] = mask_password_info(ssh_tunnel_payload)
+                # TODO (betodealmeida): move this logic to export_to_dict once this
+                # becomes the default export endpoint
+                if payload.get("extra"):
+                    try:
+                        payload["extra"] = json.loads(payload["extra"])
+                    except json.JSONDecodeError:
+                        logger.info("Unable to decode `extra` field: %s", payload["extra"])
+
+                if ssh_tunnel := model.database.ssh_tunnel:
+                    ssh_tunnel_payload = ssh_tunnel.export_to_dict(
+                        recursive=False,
+                        include_parent_ref=False,
+                        include_defaults=True,
+                        export_uuids=False,
+                    )
+                    payload["ssh_tunnel"] = mask_password_info(ssh_tunnel_payload)
 
-            payload["version"] = EXPORT_VERSION
+                payload["version"] = EXPORT_VERSION
 
-            yield file_path, lambda: yaml.safe_dump(payload, sort_keys=False)
+                yield file_path, lambda: yaml.safe_dump(payload, sort_keys=False)

Review Comment:
   **Suggestion:** The lambda used for the database file's content closes over 
the mutable local `payload` variable; if multiple payloads are created in a 
loop, all yielded lambdas will reference the final `payload` value at call 
time—bind `payload` as a default argument in the lambda to capture its current 
value. [possible bug]
   
   **Severity Level:** Critical 🚨
   ```suggestion
                    # Bind payload to the lambda default to avoid late-binding closure capturing
                    yield file_path, (lambda payload=payload: yaml.safe_dump(payload, sort_keys=False))
   ```
   <details>
   <summary><b>Why it matters? ⭐ </b></summary>
   
   The yielded lambda closes over the local `payload` variable. If multiple 
payloads are produced
   in the same frame (e.g., via looped logic in the future), the lambdas will 
exhibit late-binding
   behavior and may serialize the wrong payload when invoked later. Binding 
`payload` as a default
   argument in the lambda is a cheap, defensive fix that avoids this class of 
subtle bugs.
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/commands/dataset/export.py
   **Line:** 139:139
   **Comment:**
        *Possible Bug: The lambda used for the database file's content closes 
over the mutable local `payload` variable; if multiple payloads are created in 
a loop, all yielded lambdas will reference the final `payload` value at call 
time—bind `payload` as a default argument in the lambda to capture its current 
value.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



##########
superset/commands/query/export.py:
##########
@@ -67,8 +67,12 @@ def _file_content(model: SavedQuery) -> str:
 
     @staticmethod
     def _export(
-        model: SavedQuery, export_related: bool = True
+        model: SavedQuery, export_related: bool = True, seen: set[str] | None = None
     ) -> Iterator[tuple[str, Callable[[], str]]]:
+        # Initialize seen set if not provided
+        if seen is None:
+            seen = set()
+
         yield (
             ExportSavedQueriesCommand._file_name(model),

Review Comment:
   **Suggestion:** The initial query file yielded by the command is never 
recorded in the shared `seen` set, so other parts of the export flow can still 
re-yield the same file and cause duplicates; add a check and record the query 
filename in `seen`. [logic error]
   
   **Severity Level:** Minor ⚠️
   ```suggestion
           query_file = ExportSavedQueriesCommand._file_name(model)
           if query_file not in seen:
               seen.add(query_file)
               yield (
                   query_file,
   ```
   <details>
   <summary><b>Why it matters? ⭐ </b></summary>
   
   The command currently yields the query file unconditionally but the shared 
`seen`
   set is intended to deduplicate exports across nested commands. Not recording 
the
   yielded query filename allows duplicate query files to be produced by other
   nested exports. The proposed change prevents duplicates and aligns with the 
PR's
   shared-deduplication approach. It's a targeted, correct fix.
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/commands/query/export.py
   **Line:** 76:77
   **Comment:**
        *Logic Error: The initial query file yielded by the command is never 
recorded in the shared `seen` set, so other parts of the export flow can still 
re-yield the same file and cause duplicates; add a check and record the query 
filename in `seen`.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



##########
superset/commands/query/export.py:
##########
@@ -79,21 +83,23 @@ def _export(
             database_slug = secure_filename(model.database.database_name)
             file_name = f"databases/{database_slug}.yaml"
 
-            payload = model.database.export_to_dict(
-                recursive=False,
-                include_parent_ref=False,
-                include_defaults=True,
-                export_uuids=True,
-            )
-            # TODO (betodealmeida): move this logic to export_to_dict once this
-            # becomes the default export endpoint
-            if "extra" in payload:
-                try:
-                    payload["extra"] = json.loads(payload["extra"])
-                except json.JSONDecodeError:
+                    logger.info("Unable to decode `extra` field: %s", payload["extra"])
+            # Only yield if not already seen (similar to dataset export)
+            if file_name not in seen:
+                payload = model.database.export_to_dict(
+                    recursive=False,
+                    include_parent_ref=False,
+                    include_defaults=True,
+                    export_uuids=True,
+                )
+                # TODO (betodealmeida): move this logic to export_to_dict once this
+                # becomes the default export endpoint
+                if "extra" in payload:
+                    try:
+                        payload["extra"] = json.loads(payload["extra"])
+                    except json.JSONDecodeError:

Review Comment:
   **Suggestion:** The code attempts to json.loads the `payload["extra"]` field 
without ensuring it's a string/bytes; if `extra` is already a dict (or another 
non-string), json.loads will raise a TypeError which is not caught—only 
JSONDecodeError is caught—so validate the type before calling `json.loads` or 
catch TypeError as well. [type error]
   
   **Severity Level:** Minor ⚠️
   ```suggestion
                if "extra" in payload and isinstance(payload["extra"], (str, bytes)):
                       try:
                           payload["extra"] = json.loads(payload["extra"])
                       except (json.JSONDecodeError, TypeError):
   ```
   <details>
   <summary><b>Why it matters? ⭐ </b></summary>
   
   If payload["extra"] is already a dict or another non-string, calling 
json.loads on it raises a TypeError
   which is not caught by the current except block (only JSONDecodeError is 
handled). Guarding with an
   isinstance check or including TypeError in the except tuple prevents an 
unexpected exception and is
   a safe, defensive improvement.
   </details>
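   A quick standalone check of the failure mode (plain Python, with a hypothetical payload value):

   ```python
   import json

   payload = {"extra": {"engine_params": {}}}  # already parsed, not a string

   # json.loads only accepts str/bytes/bytearray; passing a dict raises TypeError,
   # which the current `except json.JSONDecodeError` clause would not catch.
   raised = False
   try:
       json.loads(payload["extra"])
   except TypeError:
       raised = True

   # Guarded version: non-string values are left untouched.
   if isinstance(payload["extra"], (str, bytes)):
       payload["extra"] = json.loads(payload["extra"])
   ```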
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/commands/query/export.py
   **Line:** 96:99
   **Comment:**
        *Type Error: The code attempts to json.loads the `payload["extra"]` 
field without ensuring it's a string/bytes; if `extra` is already a dict (or 
another non-string), json.loads will raise a TypeError which is not caught—only 
JSONDecodeError is caught—so validate the type before calling `json.loads` or 
catch TypeError as well.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

