codeant-ai-for-open-source[bot] commented on code in PR #37120:
URL: https://github.com/apache/superset/pull/37120#discussion_r2689232871
##########
superset/commands/dashboard/export.py:
##########
@@ -189,7 +193,8 @@ def _export(
dashboard_ids = model.id
command = ExportChartsCommand(chart_ids)
command.disable_tag_export()
- yield from command.run()
+ # Pass the shared seen set to the chart export command
+ yield from command.run(seen=seen)
command.enable_tag_export()
if feature_flag_manager.is_feature_enabled("TAGGING_SYSTEM"):
yield from ExportTagsCommand.export(
Review Comment:
**Suggestion:** The tags export is invoked via `yield from
ExportTagsCommand.export(...)` but this call does not participate in the shared
`seen` deduplication; that can cause duplicate tag file exports across nested
exports. Check/mark the tags file in the shared `seen` set and yield the tags
file only if it hasn't been seen yet. [possible bug]
**Severity Level:** Critical 🚨
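The deduplication described above could be sketched as a generic filter over the yielded `(path, factory)` pairs; the helper name `dedup_files` and its shape are assumptions for illustration, not Superset's API:

```python
from typing import Callable, Iterator

FileEntry = tuple[str, Callable[[], str]]

def dedup_files(entries: Iterator[FileEntry], seen: set[str]) -> Iterator[FileEntry]:
    # Yield each (path, content_factory) pair at most once per shared
    # `seen` set, so nested exports skip files already emitted.
    for path, factory in entries:
        if path not in seen:
            seen.add(path)
            yield path, factory
```

Wrapping the output of `ExportTagsCommand.export(...)` in such a filter would mark the tags file in `seen` and skip re-yields across nested exports.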
##########
superset/commands/dashboard/export.py:
##########
@@ -189,7 +193,8 @@ def _export(
dashboard_ids = model.id
command = ExportChartsCommand(chart_ids)
command.disable_tag_export()
- yield from command.run()
+ # Pass the shared seen set to the chart export command
+ yield from command.run(seen=seen)
command.enable_tag_export()
Review Comment:
**Suggestion:** The code disables tag export by mutating a class-level flag
but does not preserve or restore the original value on error; if
`command.run(...)` raises, `enable_tag_export()` may never be called, leaving
the class in a disabled state that affects subsequent exports. Use a try/finally
to restore the original class-level value so the flag cannot stay disabled.
[logic error]
**Severity Level:** Minor ⚠️
```suggestion
# Preserve the original class-level flag and restore it even if the run raises
original_include_tags = getattr(ExportChartsCommand, "_include_tags", True)
ExportChartsCommand._include_tags = False
try:
    # Pass the shared seen set to the chart export command
    yield from command.run(seen=seen)
finally:
    ExportChartsCommand._include_tags = original_include_tags
```
<details>
<summary><b>Why it matters? ⭐ </b></summary>
The suggestion correctly identifies a real issue: disable_tag_export mutates
a class-level flag, and if command.run raises, enable_tag_export won't be
called, leaving the class in an inconsistent state for subsequent exports.
Wrapping the run in a try/finally (or otherwise preserving/restoring the
original value) is the correct defensive fix. The improved code restores the
original value even on exceptions.
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/commands/dashboard/export.py
**Line:** 195:198
**Comment:**
*Logic Error: The code disables tag export by mutating a class-level
flag but does not preserve or restore the original value on error; if
`command.run(...)` raises, `enable_tag_export()` may never be called leaving
the class in a disabled state (affecting subsequent exports). Use a try/finally
and restore the original class-level value to ensure the flag is always
restored.
Validate the correctness of the flagged issue. If correct, how can I resolve
this? If you propose a fix, implement it and keep it concise.
```
</details>
##########
superset/commands/export/models.py:
##########
@@ -51,23 +51,34 @@ def _file_content(model: Model) -> str:
Review Comment:
**Suggestion:** Incorrect error message: the `_file_content` stub raises
NotImplementedError telling implementers to "implement _export" even though the
method is `_file_content`; this is misleading and may cause confusion when
debugging—raise a message that references the correct method name. [logic error]
**Severity Level:** Minor ⚠️
```suggestion
raise NotImplementedError("Subclasses MUST implement _file_content")
```
<details>
<summary><b>Why it matters? ⭐ </b></summary>
The message is clearly a copy/paste typo and is misleading when
`_file_content` is unimplemented. Fixing it to reference `_file_content`
improves debuggability without changing behavior.
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/commands/export/models.py
**Line:** 51:51
**Comment:**
*Logic Error: Incorrect error message: the `_file_content` stub raises
NotImplementedError telling implementers to "implement _export" even though the
method is `_file_content`; this is misleading and may cause confusion when
debugging—raise a message that references the correct method name.
Validate the correctness of the flagged issue. If correct, how can I resolve
this? If you propose a fix, implement it and keep it concise.
```
</details>
##########
superset/commands/theme/export.py:
##########
@@ -67,8 +67,12 @@ def _file_content(model: Theme) -> str:
@staticmethod
def _export(
- model: Theme, export_related: bool = True
+ model: Theme, export_related: bool = True, seen: set[str] | None = None
) -> Iterator[tuple[str, Callable[[], str]]]:
+ # Initialize seen set if not provided (for consistency)
+ if seen is None:
+ seen = set()
+
yield (
ExportThemesCommand._file_name(model),
lambda: ExportThemesCommand._file_content(model),
Review Comment:
**Suggestion:** The lambda passed as the file content factory closes over
`model` which can lead to late-binding issues if the callable is evaluated
later; bind `model` into the lambda's default arguments to capture its value at
definition time. [possible bug]
**Severity Level:** Critical 🚨
```suggestion
lambda _m=model: ExportThemesCommand._file_content(_m),
```
<details>
<summary><b>Why it matters? ⭐ </b></summary>
Defensive and harmless improvement: capturing `model` as a default argument
avoids potential late-binding surprises if the callable is evaluated later or
if the function evolves to yield multiple items in a loop. While in the current
implementation it's unlikely to cause a bug (single yield), the change is cheap
and safe.
</details>
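The late-binding behavior described above can be reproduced in isolation; this standalone sketch (not Superset code) shows why binding the value as a default argument matters:

```python
# All three lambdas close over the same loop variable `i`,
# so each one sees its final value when called later.
late = [lambda: i for i in range(3)]
late_results = [f() for f in late]  # → [2, 2, 2]

# Binding `i` as a default argument captures its value at definition time.
bound = [lambda i=i: i for i in range(3)]
bound_results = [f() for f in bound]  # → [0, 1, 2]
```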
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/commands/theme/export.py
**Line:** 78:78
**Comment:**
*Possible Bug: The lambda passed as the file content factory closes
over `model` which can lead to late-binding issues if the callable is evaluated
later; bind `model` into the lambda's default arguments to capture its value at
definition time.
Validate the correctness of the flagged issue. If correct, how can I resolve
this? If you propose a fix, implement it and keep it concise.
```
</details>
##########
superset/commands/dashboard/export.py:
##########
@@ -177,8 +177,12 @@ def _file_content(model: Dashboard) -> str:
@staticmethod
# ruff: noqa: C901
def _export(
- model: Dashboard, export_related: bool = True
+ model: Dashboard, export_related: bool = True, seen: set[str] | None = None
) -> Iterator[tuple[str, Callable[[], str]]]:
+ # Initialize seen set if not provided
+ if seen is None:
+ seen = set()
Review Comment:
**Suggestion:** The function accepts a `seen` parameter but only checks `if
seen is None: seen = set()`; if a caller passes another mutable iterable (e.g.
a list), later set operations like `add` will raise an AttributeError. Coerce
non-set iterables to a set at the start so that `seen` supports set semantics.
[type error]
**Severity Level:** Minor ⚠️
```suggestion
# Initialize/coerce seen set if not provided or if an iterable was passed
if seen is None:
    seen = set()
elif not isinstance(seen, set):
    # Coerce other iterables (lists, tuples) into a set so callers can pass any iterable
    seen = set(seen)
```
<details>
<summary><b>Why it matters? ⭐ </b></summary>
Coercing non-set iterables to a set is sensible defensive programming: the
type annotation isn't enforced at runtime and callers might pass a list/tuple.
If downstream code expects set semantics (e.g., .add, membership checks),
converting here avoids surprising AttributeError. The change is simple and safe.
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/commands/dashboard/export.py
**Line:** 182:184
**Comment:**
*Type Error: The function accepts a `seen` parameter but only checks
`if seen is None: seen = set()`; if a caller passes a mutable iterable (e.g.
list) the code later expects set operations like `add` and will raise an
AttributeError. Coerce non-set iterables to a set at start to ensure `seen`
supports set semantics.
Validate the correctness of the flagged issue. If correct, how can I resolve
this? If you propose a fix, implement it and keep it concise.
```
</details>
##########
superset/commands/database/export.py:
##########
@@ -104,8 +104,12 @@ def _file_content(model: Database) -> str:
@staticmethod
def _export(
- model: Database, export_related: bool = True
+ model: Database, export_related: bool = True, seen: set[str] | None = None
) -> Iterator[tuple[str, Callable[[], str]]]:
+ # Initialize seen set if not provided
+ if seen is None:
+ seen = set()
+
Review Comment:
**Suggestion:** The new code initializes a `seen` set but never adds the
database file name to it; as a result, the shared deduplication context won't
record that this database has been exported and nested/parallel exporters won't
be able to avoid re-exporting the same database file. Add the database file
path into `seen` immediately after initialization so downstream code and nested
exporters can observe the exported database. [logic error]
**Severity Level:** Minor ⚠️
```suggestion
# Record this database's export filename in the shared deduplication set
seen.add(ExportDatabasesCommand._file_name(model))
```
<details>
<summary><b>Why it matters? ⭐ </b></summary>
This suggestion points out a real omission: this exporter yields the
database file but never records that filename in the shared `seen` set. Other
exporters that rely on `seen` to avoid re-exporting the same database will not
see that this DB has already been exported. Adding the file name (using the
same key this function yields) to `seen` fixes the deduplication gap and
directly addresses the bug described in the PR.
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/commands/database/export.py
**Line:** 112:112
**Comment:**
*Logic Error: The new code initializes a `seen` set but never adds the
database file name to it; as a result, the shared deduplication context won't
record that this database has been exported and nested/parallel exporters won't
be able to avoid re-exporting the same database file. Add the database file
path into `seen` immediately after initialization so downstream code and nested
exporters can observe the exported database.
Validate the correctness of the flagged issue. If correct, how can I resolve
this? If you propose a fix, implement it and keep it concise.
```
</details>
##########
superset/commands/dataset/export.py:
##########
@@ -103,29 +107,33 @@ def _export(
)
file_path = f"databases/{db_file_name}.yaml"
- payload = model.database.export_to_dict(
- recursive=False,
- include_parent_ref=False,
- include_defaults=True,
- export_uuids=True,
- )
- # TODO (betodealmeida): move this logic to export_to_dict once this
- # becomes the default export endpoint
- if payload.get("extra"):
- try:
- payload["extra"] = json.loads(payload["extra"])
- except json.JSONDecodeError:
- logger.info("Unable to decode `extra` field: %s", payload["extra"])
-
- if ssh_tunnel := model.database.ssh_tunnel:
- ssh_tunnel_payload = ssh_tunnel.export_to_dict(
+ # Only yield the database file if not already seen
+ # This is critical to fix the issue where databases were being duplicated
+ # and potentially overwritten when charts from different databases were exported
+ if file_path not in seen:
+ payload = model.database.export_to_dict(
recursive=False,
include_parent_ref=False,
include_defaults=True,
- export_uuids=False,
+ export_uuids=True,
)
- payload["ssh_tunnel"] = mask_password_info(ssh_tunnel_payload)
+ # TODO (betodealmeida): move this logic to export_to_dict once this
+ # becomes the default export endpoint
+ if payload.get("extra"):
+ try:
+ payload["extra"] = json.loads(payload["extra"])
+ except json.JSONDecodeError:
+ logger.info("Unable to decode `extra` field: %s", payload["extra"])
+
+ if ssh_tunnel := model.database.ssh_tunnel:
+ ssh_tunnel_payload = ssh_tunnel.export_to_dict(
+ recursive=False,
+ include_parent_ref=False,
+ include_defaults=True,
+ export_uuids=False,
+ )
+ payload["ssh_tunnel"] = mask_password_info(ssh_tunnel_payload)
- payload["version"] = EXPORT_VERSION
+ payload["version"] = EXPORT_VERSION
- yield file_path, lambda: yaml.safe_dump(payload, sort_keys=False)
+ yield file_path, lambda: yaml.safe_dump(payload, sort_keys=False)
Review Comment:
**Suggestion:** The lambda used for the database file's content closes over
the mutable local `payload` variable; if multiple payloads are created in a
loop, all yielded lambdas will reference the final `payload` value at call
time—bind `payload` as a default argument in the lambda to capture its current
value. [possible bug]
**Severity Level:** Critical 🚨
```suggestion
# Bind payload to the lambda default to avoid late-binding closure capturing
yield file_path, (lambda payload=payload: yaml.safe_dump(payload, sort_keys=False))
```
<details>
<summary><b>Why it matters? ⭐ </b></summary>
The yielded lambda closes over the local `payload` variable. If multiple
payloads are produced in the same frame (e.g., via looped logic in the future),
the lambdas will exhibit late-binding behavior and may serialize the wrong
payload when invoked later. Binding `payload` as a default argument in the
lambda is a cheap, defensive fix that avoids this class of subtle bugs.
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/commands/dataset/export.py
**Line:** 139:139
**Comment:**
*Possible Bug: The lambda used for the database file's content closes
over the mutable local `payload` variable; if multiple payloads are created in
a loop, all yielded lambdas will reference the final `payload` value at call
time—bind `payload` as a default argument in the lambda to capture its current
value.
Validate the correctness of the flagged issue. If correct, how can I resolve
this? If you propose a fix, implement it and keep it concise.
```
</details>
##########
superset/commands/query/export.py:
##########
@@ -67,8 +67,12 @@ def _file_content(model: SavedQuery) -> str:
@staticmethod
def _export(
- model: SavedQuery, export_related: bool = True
+ model: SavedQuery, export_related: bool = True, seen: set[str] | None = None
) -> Iterator[tuple[str, Callable[[], str]]]:
+ # Initialize seen set if not provided
+ if seen is None:
+ seen = set()
+
yield (
ExportSavedQueriesCommand._file_name(model),
Review Comment:
**Suggestion:** The initial query file yielded by the command is never
recorded in the shared `seen` set, so other parts of the export flow can still
re-yield the same file and cause duplicates; add a check and record the query
filename in `seen`. [logic error]
**Severity Level:** Minor ⚠️
```suggestion
query_file = ExportSavedQueriesCommand._file_name(model)
if query_file not in seen:
seen.add(query_file)
yield (
query_file,
```
<details>
<summary><b>Why it matters? ⭐ </b></summary>
The command currently yields the query file unconditionally, but the shared
`seen` set is intended to deduplicate exports across nested commands. Not
recording the yielded query filename allows duplicate query files to be
produced by other nested exports. The proposed change prevents duplicates and
aligns with the PR's shared-deduplication approach. It's a targeted, correct
fix.
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/commands/query/export.py
**Line:** 76:77
**Comment:**
*Logic Error: The initial query file yielded by the command is never
recorded in the shared `seen` set, so other parts of the export flow can still
re-yield the same file and cause duplicates; add a check and record the query
filename in `seen`.
Validate the correctness of the flagged issue. If correct, how can I resolve
this? If you propose a fix, implement it and keep it concise.
```
</details>
##########
superset/commands/query/export.py:
##########
@@ -79,21 +83,23 @@ def _export(
database_slug = secure_filename(model.database.database_name)
file_name = f"databases/{database_slug}.yaml"
- payload = model.database.export_to_dict(
- recursive=False,
- include_parent_ref=False,
- include_defaults=True,
- export_uuids=True,
- )
- # TODO (betodealmeida): move this logic to export_to_dict once this
- # becomes the default export endpoint
- if "extra" in payload:
- try:
- payload["extra"] = json.loads(payload["extra"])
- except json.JSONDecodeError:
- logger.info("Unable to decode `extra` field: %s", payload["extra"])
+ # Only yield if not already seen (similar to dataset export)
+ if file_name not in seen:
+ payload = model.database.export_to_dict(
+ recursive=False,
+ include_parent_ref=False,
+ include_defaults=True,
+ export_uuids=True,
+ )
+ # TODO (betodealmeida): move this logic to export_to_dict once this
+ # becomes the default export endpoint
+ if "extra" in payload:
+ try:
+ payload["extra"] = json.loads(payload["extra"])
+ except json.JSONDecodeError:
Review Comment:
**Suggestion:** The code attempts to json.loads the `payload["extra"]` field
without ensuring it's a string/bytes; if `extra` is already a dict (or another
non-string), json.loads will raise a TypeError which is not caught—only
JSONDecodeError is caught—so validate the type before calling `json.loads` or
catch TypeError as well. [type error]
**Severity Level:** Minor ⚠️
```suggestion
if "extra" in payload and isinstance(payload["extra"], (str, bytes)):
    try:
        payload["extra"] = json.loads(payload["extra"])
    except (json.JSONDecodeError, TypeError):
```
<details>
<summary><b>Why it matters? ⭐ </b></summary>
If payload["extra"] is already a dict or another non-string, calling
json.loads on it raises a TypeError which is not caught by the current except
block (only JSONDecodeError is handled). Guarding with an isinstance check or
including TypeError in the except tuple prevents an unexpected exception and
is a safe, defensive improvement.
</details>
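A minimal standalone check confirms the failure mode: `json.loads` only accepts str, bytes, or bytearray, so guarding on the value's type keeps already-decoded values from raising. The `parse_extra` helper below is a hypothetical sketch for illustration, not Superset's API:

```python
import json

def parse_extra(extra):
    # Decode only raw JSON strings/bytes; pass through values that are
    # already deserialized (e.g. a dict) or that fail to decode.
    if isinstance(extra, (str, bytes)):
        try:
            return json.loads(extra)
        except json.JSONDecodeError:
            return extra
    return extra
```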
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/commands/query/export.py
**Line:** 96:99
**Comment:**
*Type Error: The code attempts to json.loads the `payload["extra"]`
field without ensuring it's a string/bytes; if `extra` is already a dict (or
another non-string), json.loads will raise a TypeError which is not caught—only
JSONDecodeError is caught—so validate the type before calling `json.loads` or
catch TypeError as well.
Validate the correctness of the flagged issue. If correct, how can I resolve
this? If you propose a fix, implement it and keep it concise.
```
</details>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]