codeant-ai-for-open-source[bot] commented on code in PR #37932:
URL: https://github.com/apache/superset/pull/37932#discussion_r2799467494


##########
superset/commands/dataset/duplicate.py:
##########
@@ -66,6 +66,7 @@ def run(self) -> Model:
         table = SqlaTable(table_name=table_name, owners=owners)
         table.database = database
         table.schema = self._base_model.schema
+        table.catalog = self._base_model.catalog

Review Comment:
   **Suggestion:** When duplicating a virtual dataset whose `catalog` is not 
explicitly set, validation uses the database's default catalog to check 
uniqueness, but the duplicated dataset itself is saved with a `None` catalog, 
which is inconsistent with the uniqueness logic and the CreateDatasetCommand 
behavior and can lead to later duplicate-name checks misbehaving for 
catalog-aware databases. [logic error]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ⚠️ Dataset duplication may overlook existing duplicates in catalogs.
   - ⚠️ Catalog-aware DBs get inconsistent uniqueness vs stored metadata.
   - ⚠️ Dataset management UI may allow logically duplicate datasets.
   ```
   </details>
   
   ```suggestion
           table.catalog = self._base_model.catalog or 
database.get_default_catalog()
   ```
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Configure a catalog-aware database (e.g. Trino) in Superset with a 
non-null default
   catalog (via Database model at `superset/models/core.py`, then referenced in
   `Database.get_default_catalog()` used in 
`superset/commands/dataset/duplicate.py:121`).
   
   2. Create a virtual dataset on this database so that its `SqlaTable` record 
has `catalog`
   unset/NULL (this dataset later becomes `self._base_model` inside 
`DuplicateDatasetCommand`
   in `superset/commands/dataset/duplicate.py`).
   
   3. From the Datasets list UI, click "Duplicate" for that virtual dataset and 
submit a new
   name; this triggers `DuplicateDatasetCommand.run()` at
   `superset/commands/dataset/duplicate.py:51`, which calls `validate()` at 
line 105.
   
   4. In `validate()`, uniqueness is checked using `catalog = 
self._base_model.catalog or
   database.get_default_catalog()` at line 121 (i.e. the database default 
catalog), but in
   `run()` the new `SqlaTable` is persisted with `table.catalog = 
self._base_model.catalog`
   at line 69 (NULL), so the stored catalog for the duplicated dataset does not 
match the
   catalog used for uniqueness checks. This mismatch can cause later 
duplicate-name
   validation (using the resolved catalog) to overlook this dataset for 
catalog-aware
   databases, allowing logically duplicate datasets pointing at the same 
catalog/schema/table
   to be created.
   ```
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/commands/dataset/duplicate.py
   **Line:** 69:69
   **Comment:**
        *Logic Error: When duplicating a virtual dataset whose `catalog` is not 
explicitly set, validation uses the database's default catalog to check 
uniqueness, but the duplicated dataset itself is saved with a `None` catalog, 
which is inconsistent with the uniqueness logic and the CreateDatasetCommand 
behavior and can lead to later duplicate-name checks misbehaving for 
catalog-aware databases.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F37932&comment_hash=133252f12f563e5a75f2df5fb845e805439b8efa2b7123896b28cb861f58c64f&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F37932&comment_hash=133252f12f563e5a75f2df5fb845e805439b8efa2b7123896b28cb861f58c64f&reaction=dislike'>👎</a>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to