Re: [PR] feat(datasets): soft-delete and restore [superset]

via GitHub Thu, 04 Jun 2026 08:29:38 -0700


codeant-ai-for-open-source[bot] commented on code in PR #40130:
URL: https://github.com/apache/superset/pull/40130#discussion_r3357115338



##########
tests/integration_tests/dashboard_utils.py:
##########
@@ -35,10 +36,23 @@ def get_table(
     schema: Optional[str] = None,
 ):
     schema = schema or get_example_default_schema()
+    # Bypass the soft-delete listener so the helper finds rows previously
+    # soft-deleted by other tests in the same session. Without the
+    # bypass, the listener hides them and a subsequent INSERT collides
+    # with the underlying ``(database_id, catalog, schema, table_name)``
+    # unique constraint that survives soft-delete. Prefer live rows over
+    # soft-deleted ones via ``ORDER BY deleted_at IS NULL DESC, id`` —
+    # the filter doesn't include ``catalog``, so it can match more than
+    # one row when a soft-deleted row coexists with a live row that
+    # differs only by catalog (e.g. NULL vs ``"public"``). ``.first()``
+    # plus the live-first ordering keeps the behaviour predictable
+    # without surfacing a ``MultipleResultsFound`` for that case.
     return (
         db.session.query(SqlaTable)
+        .execution_options(**{SKIP_VISIBILITY_FILTER_CLASSES: {SqlaTable}})
         .filter_by(database_id=database.id, schema=schema, 
table_name=table_name)
-        .one_or_none()
+        .order_by(SqlaTable.deleted_at.is_(None).desc(), SqlaTable.id)
+        .first()

Review Comment:
   **Suggestion:** This lookup now silently resolves ambiguous matches by 
taking the first row after ordering, which can return a dataset from a 
different catalog when multiple rows share the same 
`database_id/schema/table_name`. That causes test setup to mutate/reuse the 
wrong dataset instead of failing fast. Add `catalog` to the lookup criteria (or 
raise when multiple live matches exist) so setup remains correct and 
deterministic. [logic error]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ⚠️ Dashboard fixtures can bind to wrong catalog-specific dataset.
   - ⚠️ Integration tests may mutate unexpected SqlaTable records.
   - ⚠️ Soft-delete restores may target unintended catalog variant.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Observe in `tests/unit_tests/migrations/shared/catalogs_test.py:13-20` 
and `:80-88`
   that the test suite explicitly creates two `SqlaTable` rows with the same
   `database`/`schema`/`table_name` (`"my_table"` in `"public"`) but different 
`catalog`
   values (`None` and `"other_catalog"`), proving the ORM and schema allow such 
duplicates
   distinguished only by `catalog`.
   
   2. In an integration test using the dashboard fixtures (for example,
   `tests/integration_tests/fixtures/world_bank_dashboard.py:29-35` or
   `energy_dashboard.py:13-18`), call `create_table_metadata(table_name, 
database)` from
   `tests/integration_tests/dashboard_utils.py:59-69` for a 
`table_name`/`schema`/`database`
   combination where two live `SqlaTable` rows exist that differ only by 
`catalog`, analogous
   to the setup in `catalogs_test.py`.
   
   3. `create_table_metadata()` calls `get_table()` at
   `tests/integration_tests/dashboard_utils.py:68`, which executes a query 
built at lines
   50-55: it filters only on `database_id`, `schema`, and `table_name` (line 
53) and then
   orders by `SqlaTable.deleted_at.is_(None).desc(), SqlaTable.id` and calls 
`.first()`
   (lines 54-55), without including `catalog` in the filter.
   
   4. With two live rows matching the filter but with different `catalog` 
values, both have
   `deleted_at IS NULL`, so the ordering reduces to `id` and `get_table()` 
deterministically
   returns whichever row was inserted first, regardless of `catalog`.
   `create_table_metadata()` then restores/mutates this potentially wrong 
dataset (lines
   69-82), so the fixture can attach metrics and descriptions to a dataset in 
the wrong
   catalog instead of failing fast or selecting the catalog-specific one.
   ```
   </details>
   
   [Fix in 
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=45a4615ac8de41959e860aee44000158&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
 | [Fix in VSCode 
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=45a4615ac8de41959e860aee44000158&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
   
   *(Use Cmd/Ctrl + Click for best experience)*
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** tests/integration_tests/dashboard_utils.py
   **Line:** 52:55
   **Comment:**
        *Logic Error: This lookup now silently resolves ambiguous matches by 
taking the first row after ordering, which can return a dataset from a 
different catalog when multiple rows share the same 
`database_id/schema/table_name`. That causes test setup to mutate/reuse the 
wrong dataset instead of failing fast. Add `catalog` to the lookup criteria (or 
raise when multiple live matches exist) so setup remains correct and 
deterministic.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   Once fix is implemented, also check other comments on the same PR, and ask 
user if the user wants to fix the rest of the comments as well. if said yes, 
then fetch all the comments validate the correctness and implement a minimal fix
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40130&comment_hash=181065d026ceef465a50cc05984ded44702dfa4790659c4253ec5bcb66db99cb&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40130&comment_hash=181065d026ceef465a50cc05984ded44702dfa4790659c4253ec5bcb66db99cb&reaction=dislike'>👎</a>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] feat(datasets): soft-delete and restore [superset]

Reply via email to