mikebridge commented on code in PR #40130:
URL: https://github.com/apache/superset/pull/40130#discussion_r3344696424


##########
superset/commands/dataset/importers/v1/utils.py:
##########
@@ -140,21 +141,93 @@ def import_dataset(  # noqa: C901
     force_data: bool = False,
     ignore_permissions: bool = False,
 ) -> SqlaTable:
+    """Import a dataset from a config dict, handling existing matches.
+
+    Permission model for an existing UUID match:
+
+    +--------------+---------------+---------------------+-----------------+
+    | Existing row | overwrite arg | Caller has perms?   | Outcome         |
+    +==============+===============+=====================+=================+
+    | alive        | False         | (n/a)               | return existing |
+    +--------------+---------------+---------------------+-----------------+
+    | alive        | True          | can_write + owner   | UPDATE in place |
+    +--------------+---------------+---------------------+-----------------+
+    | alive        | True          | can_write,          | raise           |
+    |              |               | not owner/admin     |                 |
+    +--------------+---------------+---------------------+-----------------+
+    | soft-deleted | False or True | can_write + owner   | restore + UPDATE|
+    +--------------+---------------+---------------------+-----------------+
+    | soft-deleted | False or True | can_write,          | raise           |
+    |              |               | not owner/admin     |                 |
+    +--------------+---------------+---------------------+-----------------+
+    | soft-deleted | False or True | not can_write       | raise (Case B)  |
+    +--------------+---------------+---------------------+-----------------+
+
+    Re-importing a soft-deleted UUID is implicitly a restore-with-update:
+    the user is bringing the dataset back by uploading it again. We apply
+    the same ownership check as the explicit overwrite path so non-owners
+    cannot resurrect via re-import, and we raise rather than silently
+    returning a soft-deleted row to callers without write permission
+    (which would let them reattach charts/dashboards to a deleted dataset).
+    """
     can_write = ignore_permissions or security_manager.can_access(
         "can_write",
         "Dataset",
     )
-    existing = 
db.session.query(SqlaTable).filter_by(uuid=config["uuid"]).first()
+    from superset.commands.importers.v1.utils import find_existing_for_import

Review Comment:
   Fixed in 1ee9bbdc1a — moved `find_existing_for_import` to module-top 
imports. Verified no cycle: `superset.commands.importers.v1.utils` does not 
reach `superset.commands.dataset.importers.v1.utils` via any import chain.



##########
superset/commands/dataset/restore.py:
##########
@@ -0,0 +1,43 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Command to restore a soft-deleted dataset."""
+
+import logging
+
+from superset.commands.dataset.exceptions import (
+    DatasetForbiddenError,
+    DatasetNotFoundError,
+    DatasetRestoreFailedError,
+)
+from superset.commands.restore import BaseRestoreCommand
+from superset.connectors.sqla.models import SqlaTable
+from superset.daos.dataset import DatasetDAO
+
+logger = logging.getLogger(__name__)
+
+
+class RestoreDatasetCommand(BaseRestoreCommand[SqlaTable]):
+    """Restore a soft-deleted dataset by clearing its ``deleted_at`` field.
+
+    All transactional wrapping and validation lives in
+    ``BaseRestoreCommand``; this subclass is purely declarative.
+    """
+
+    dao = DatasetDAO
+    not_found_exc = DatasetNotFoundError
+    forbidden_exc = DatasetForbiddenError
+    restore_failed_exc = DatasetRestoreFailedError

Review Comment:
   Headline functional fix in 0591115434 — `RestoreDatasetCommand.validate()` 
now overrides the base with an `_has_active_logical_duplicate(model)` check 
that raises a new `DatasetLogicalDuplicateError` (HTTP 422) when another active 
dataset references the same `(database_id, catalog, schema, table_name)`. 
Defense-in-depth: `DatasetDAO.validate_uniqueness` and 
`validate_update_uniqueness` also bypass the visibility filter so a 
soft-deleted twin blocks creates from the API side. Importer 
(`importers/v1/utils.py`) has the same check before clearing `deleted_at`. 
`build_dataset_query` Core-select gap also closed with explicit 
`ds_table.c.deleted_at.is_(None)`. Integration tests at 
`tests/integration_tests/datasets/soft_delete_tests.py:200-289` cover 
restore-blocked and create-blocked flows. Richard flagged the same shape 
independently — both findings driving the same fix.



##########
superset/commands/dataset/exceptions.py:
##########
@@ -170,6 +170,10 @@ class DatasetUpdateFailedError(UpdateFailedError):
     message = _("Dataset could not be updated.")
 
 
+class DatasetRestoreFailedError(CommandException):
+    message = _("Dataset could not be restored.")

Review Comment:
   Fixed in 0591115434 — `DatasetRestoreFailedError` now inherits 
`UpdateFailedError` (restore semantically clears `deleted_at`, so it's an 
UPDATE). Inline comment cross-references the cross-entity follow-up: a 
dedicated `RestoreFailedError` in already-merged 
`superset/commands/exceptions.py` would be more precise across the three entity 
rollouts but lives in #39977. Charts (`ChartRestoreFailedError`) and dashboards 
(`DashboardRestoreFailedError`) got the same normalization in companion commits 
48efb0b0de and ab41d0c84c.



##########
tests/unit_tests/datasets/commands/importers/v1/import_test.py:
##########
@@ -1014,6 +1015,331 @@ def test_import_dataset_access_check(
         import_dataset(config)
 
 
+def test_import_soft_deleted_dataset_overwrite_restores_in_place(
+    mocker: MockerFixture,
+    session: Session,
+) -> None:
+    """
+    Overwrite-importing a soft-deleted dataset must restore the row in
+    place rather than hard-delete-and-replace. A hard delete would
+    cascade through the chart back-reference and table_columns /
+    sql_metrics rows; in-place restore preserves them.
+    """
+    mocker.patch.object(security_manager, "can_access", return_value=True)
+
+    engine = db.session.get_bind()
+    SqlaTable.metadata.create_all(engine)  # pylint: disable=no-member
+
+    database = Database(database_name="my_database", 
sqlalchemy_uri="sqlite://")
+    db.session.add(database)
+    db.session.flush()
+
+    config = copy.deepcopy(dataset_fixture)
+    config["database_id"] = database.id
+
+    initial = import_dataset(config)
+    original_id = initial.id
+
+    existing = db.session.query(SqlaTable).filter_by(uuid=config["uuid"]).one()
+    existing.deleted_at = datetime(2026, 1, 1, 12, 0, 0)
+    db.session.flush()
+
+    admin = User(
+        first_name="Alice",
+        last_name="Doe",
+        email="[email protected]",
+        username="admin",
+        roles=[Role(name="Admin")],
+    )
+
+    with override_user(admin):
+        restored = import_dataset(config, overwrite=True)
+
+    assert restored.id == original_id
+    assert restored.deleted_at is None
+
+
+def test_import_soft_deleted_dataset_non_overwrite_restores_for_owner(
+    mocker: MockerFixture,
+    session: Session,
+) -> None:
+    """
+    Non-overwrite re-import of a soft-deleted UUID is implicitly a
+    restore-and-update: the user is bringing the dataset back by
+    uploading it again. The same ownership rule as the overwrite path
+    applies, so an owner (or admin) succeeds without setting
+    overwrite=True.
+    """
+    mocker.patch.object(security_manager, "can_access", return_value=True)
+
+    engine = db.session.get_bind()
+    SqlaTable.metadata.create_all(engine)  # pylint: disable=no-member
+
+    database = Database(database_name="my_database", 
sqlalchemy_uri="sqlite://")
+    db.session.add(database)
+    db.session.flush()
+
+    config = copy.deepcopy(dataset_fixture)
+    config["database_id"] = database.id
+
+    initial = import_dataset(config)
+    original_id = initial.id
+
+    existing = db.session.query(SqlaTable).filter_by(uuid=config["uuid"]).one()
+    existing.deleted_at = datetime(2026, 1, 1, 12, 0, 0)
+    db.session.flush()
+
+    admin = User(
+        first_name="Alice",
+        last_name="Doe",
+        email="[email protected]",
+        username="admin",
+        roles=[Role(name="Admin")],
+    )
+
+    with override_user(admin):
+        restored = import_dataset(config, overwrite=False)
+
+    assert restored.id == original_id
+    assert restored.deleted_at is None
+
+
+def test_import_soft_deleted_dataset_non_overwrite_raises_for_non_owner(
+    mocker: MockerFixture,
+    session: Session,
+) -> None:
+    """
+    Non-overwrite re-import that would resurrect a soft-deleted dataset
+    must respect ownership: a non-owner without admin role cannot
+    restore-via-import. Mirrors the explicit /restore endpoint's check.
+    """
+    mocker.patch.object(security_manager, "can_access", return_value=True)
+
+    engine = db.session.get_bind()
+    SqlaTable.metadata.create_all(engine)  # pylint: disable=no-member
+
+    database = Database(database_name="my_database", 
sqlalchemy_uri="sqlite://")
+    db.session.add(database)
+    db.session.flush()
+
+    config = copy.deepcopy(dataset_fixture)
+    config["database_id"] = database.id
+
+    import_dataset(config)
+    existing = db.session.query(SqlaTable).filter_by(uuid=config["uuid"]).one()
+    existing.deleted_at = datetime(2026, 1, 1, 12, 0, 0)
+    db.session.flush()
+
+    non_owner = User(
+        first_name="Bob",
+        last_name="Roe",
+        email="[email protected]",
+        username="bob",
+        roles=[Role(name="Gamma")],
+    )
+
+    with override_user(non_owner):
+        with pytest.raises(ImportFailedError) as excinfo:
+            import_dataset(config, overwrite=False)
+    assert "permissions to restore" in str(excinfo.value)
+

Review Comment:
   Fixed in 18399ce910 — added post-exception assertions that 
`existing.deleted_at` is still set after `pytest.raises(ImportFailedError)`. A 
regression that cleared `deleted_at` before raising would silently pass the 
message check alone; this pins the no-mutation-before-raise contract.



##########
tests/unit_tests/datasets/commands/importers/v1/import_test.py:
##########
@@ -1014,6 +1015,331 @@ def test_import_dataset_access_check(
         import_dataset(config)
 
 
+def test_import_soft_deleted_dataset_overwrite_restores_in_place(
+    mocker: MockerFixture,
+    session: Session,
+) -> None:
+    """
+    Overwrite-importing a soft-deleted dataset must restore the row in
+    place rather than hard-delete-and-replace. A hard delete would
+    cascade through the chart back-reference and table_columns /
+    sql_metrics rows; in-place restore preserves them.
+    """
+    mocker.patch.object(security_manager, "can_access", return_value=True)
+
+    engine = db.session.get_bind()
+    SqlaTable.metadata.create_all(engine)  # pylint: disable=no-member
+
+    database = Database(database_name="my_database", 
sqlalchemy_uri="sqlite://")
+    db.session.add(database)
+    db.session.flush()
+
+    config = copy.deepcopy(dataset_fixture)
+    config["database_id"] = database.id
+
+    initial = import_dataset(config)
+    original_id = initial.id
+
+    existing = db.session.query(SqlaTable).filter_by(uuid=config["uuid"]).one()
+    existing.deleted_at = datetime(2026, 1, 1, 12, 0, 0)
+    db.session.flush()
+
+    admin = User(
+        first_name="Alice",
+        last_name="Doe",
+        email="[email protected]",
+        username="admin",
+        roles=[Role(name="Admin")],
+    )
+
+    with override_user(admin):
+        restored = import_dataset(config, overwrite=True)
+
+    assert restored.id == original_id
+    assert restored.deleted_at is None
+
+
+def test_import_soft_deleted_dataset_non_overwrite_restores_for_owner(
+    mocker: MockerFixture,
+    session: Session,
+) -> None:
+    """
+    Non-overwrite re-import of a soft-deleted UUID is implicitly a
+    restore-and-update: the user is bringing the dataset back by
+    uploading it again. The same ownership rule as the overwrite path
+    applies, so an owner (or admin) succeeds without setting
+    overwrite=True.
+    """
+    mocker.patch.object(security_manager, "can_access", return_value=True)
+
+    engine = db.session.get_bind()
+    SqlaTable.metadata.create_all(engine)  # pylint: disable=no-member
+
+    database = Database(database_name="my_database", 
sqlalchemy_uri="sqlite://")
+    db.session.add(database)
+    db.session.flush()
+
+    config = copy.deepcopy(dataset_fixture)
+    config["database_id"] = database.id
+
+    initial = import_dataset(config)
+    original_id = initial.id
+
+    existing = db.session.query(SqlaTable).filter_by(uuid=config["uuid"]).one()
+    existing.deleted_at = datetime(2026, 1, 1, 12, 0, 0)
+    db.session.flush()
+
+    admin = User(
+        first_name="Alice",
+        last_name="Doe",
+        email="[email protected]",
+        username="admin",
+        roles=[Role(name="Admin")],
+    )
+
+    with override_user(admin):
+        restored = import_dataset(config, overwrite=False)
+
+    assert restored.id == original_id
+    assert restored.deleted_at is None
+
+
+def test_import_soft_deleted_dataset_non_overwrite_raises_for_non_owner(
+    mocker: MockerFixture,
+    session: Session,
+) -> None:
+    """
+    Non-overwrite re-import that would resurrect a soft-deleted dataset
+    must respect ownership: a non-owner without admin role cannot
+    restore-via-import. Mirrors the explicit /restore endpoint's check.
+    """
+    mocker.patch.object(security_manager, "can_access", return_value=True)
+
+    engine = db.session.get_bind()
+    SqlaTable.metadata.create_all(engine)  # pylint: disable=no-member
+
+    database = Database(database_name="my_database", 
sqlalchemy_uri="sqlite://")
+    db.session.add(database)
+    db.session.flush()
+
+    config = copy.deepcopy(dataset_fixture)
+    config["database_id"] = database.id
+
+    import_dataset(config)
+    existing = db.session.query(SqlaTable).filter_by(uuid=config["uuid"]).one()
+    existing.deleted_at = datetime(2026, 1, 1, 12, 0, 0)
+    db.session.flush()
+
+    non_owner = User(
+        first_name="Bob",
+        last_name="Roe",
+        email="[email protected]",
+        username="bob",
+        roles=[Role(name="Gamma")],
+    )
+
+    with override_user(non_owner):
+        with pytest.raises(ImportFailedError) as excinfo:
+            import_dataset(config, overwrite=False)
+    assert "permissions to restore" in str(excinfo.value)
+
+
+def test_import_soft_deleted_dataset_raises_when_caller_lacks_can_write(
+    mocker: MockerFixture,
+    session: Session,
+) -> None:
+    """
+    Case B: re-import of a soft-deleted UUID by a caller without
+    can_write must raise, not silently return the soft-deleted row.
+
+    Real-world scenario: a user has can_write Dashboard but not
+    can_write Dataset, and they import a dashboard zip that references
+    a soft-deleted dataset. Silently returning the row would let the
+    dashboard importer wire the dashboard's charts to a deleted dataset
+    and produce broken chart loads.
+    """
+    mocker.patch.object(security_manager, "can_access", return_value=False)
+
+    engine = db.session.get_bind()
+    SqlaTable.metadata.create_all(engine)  # pylint: disable=no-member
+
+    database = Database(database_name="my_database", 
sqlalchemy_uri="sqlite://")
+    db.session.add(database)
+    db.session.flush()
+
+    config = copy.deepcopy(dataset_fixture)
+    config["database_id"] = database.id
+
+    # Seed a soft-deleted dataset with the matching UUID directly, so the
+    # test doesn't need to flip permissions mid-test.
+    existing = SqlaTable(
+        table_name="soft_deleted_dataset",
+        database_id=database.id,
+        uuid=config["uuid"],
+        deleted_at=datetime(2026, 1, 1, 12, 0, 0),
+    )
+    db.session.add(existing)
+    db.session.flush()
+
+    with pytest.raises(ImportFailedError) as excinfo:
+        import_dataset(config, overwrite=False)
+    assert "can_write" in str(excinfo.value)
+

Review Comment:
   Same fix as above in 18399ce910 — Case B test now pins that 
`existing.deleted_at` is unchanged when the can_write check fires. Uses 
`db.session.refresh(existing)` to defeat any in-session caching that might mask 
a regression.



##########
tests/unit_tests/commands/dataset/restore_test.py:
##########
@@ -0,0 +1,95 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Unit tests for RestoreDatasetCommand."""
+
+from __future__ import annotations
+
+from datetime import datetime
+from unittest.mock import MagicMock, patch
+
+import pytest
+
+
+def test_restore_dataset_clears_deleted_at(app_context: None) -> None:
+    """RestoreDatasetCommand.run() restores a soft-deleted dataset."""
+    from superset.commands.dataset.restore import RestoreDatasetCommand
+
+    dataset = MagicMock()
+    dataset.deleted_at = datetime(2026, 1, 1)
+    dataset.id = 1
+
+    with (
+        patch(
+            "superset.daos.dataset.DatasetDAO.find_by_id", return_value=dataset
+        ) as mock_find,
+        patch("superset.commands.restore.security_manager") as mock_sec,
+    ):
+        mock_sec.raise_for_ownership.return_value = None
+
+        cmd = RestoreDatasetCommand("1")
+        cmd.run()

Review Comment:
   Fixed in 18399ce910 — `restore_test.py` rewritten with realistic UUIDs: 
`_DATASET_UUID = str(uuid.uuid4())` at module top, used at every 
`RestoreDatasetCommand(...)` call site. Documents the expected argument type 
even though the DAO is mocked.



##########
tests/integration_tests/datasets/api_tests.py:
##########
@@ -152,8 +151,11 @@ def insert_database(self, name: str, allow_multi_catalog: 
bool = False) -> Datab
         return db_connection
 
     def get_fixture_datasets(self) -> list[SqlaTable]:
+        from superset.constants import SKIP_VISIBILITY_FILTER_CLASSES

Review Comment:
   Fixed in 18399ce910 — `SKIP_VISIBILITY_FILTER_CLASSES` moved to the 
module-level import block. All three inline import sites in `api_tests.py` 
(`get_fixture_datasets`, `create_datasets` teardown, 
`test_delete_dataset_item`) now use the top-level import.



##########
superset/datasets/api.py:
##########
@@ -706,8 +717,9 @@ def refresh(self, pk: int) -> Response:
     @safe
     @statsd_metrics
     @event_logger.log_this_with_context(

Review Comment:
   Reverted in 18399ce910 — all four cosmetic lambda reformats backed out 
(`detect_datetime_formats`, `related_objects`, `get_or_create_dataset`, 
`get_drill_info`). Only the genuinely-new `restore` lambda remains in the diff 
vs master. Diff hygiene point well taken.



##########
superset/migrations/versions/2026-05-08_12-10_3a8e6f2c1b95_add_deleted_at_to_tables.py:
##########
@@ -0,0 +1,52 @@
+# Licensed to the Apache Software Foundation (ASF) under one

Review Comment:
   Declining — `2026-05-08_12-10_3a8e6f2c1b95_add_deleted_at_to_tables.py` 
follows Alembic's date-prefixed file naming convention; the hyphens in the date 
prefix are standard and won't cause import issues (alembic loads migrations by 
revision id, not module name).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to