mikebridge commented on PR #39977:
URL: https://github.com/apache/superset/pull/39977#issuecomment-4501477281
@aminghadersohi thanks for the careful pass. Most items addressed in
`06a64f02c1`; two deferred with reasoning below.
### HIGH
**H1 — uncached subclass walk → cached registry via `__init_subclass__`.
Fixed.**
You're right that retrofitting under live traffic is harder than getting it
right now. `SoftDeleteMixin` now maintains a sorted `_registered_subclasses:
ClassVar[list[type[SoftDeleteMixin]]]` updated by `__init_subclass__` at
class-definition time; `_all_soft_delete_subclasses()` is now a `return
SoftDeleteMixin._registered_subclasses` — no walk, no `sorted()` allocation per
call. Sort happens once per registration (rare event at app init), not per
query.
I chose the `__init_subclass__` registry over `functools.lru_cache` because
the cache would freeze at first call — tests that define synthetic subclasses
(we have ~6 of them) would never be picked up by the listener, breaking the
test suite. The registry hook handles class-definition events naturally and
works for both production and test contexts.
**H2 — non-restorable / GDPR hook. Deferred.**
The concern is valid but I'd push back on landing the interface now. Two
reasons:
1. **No concrete use case yet**: SIP-208 V1 doesn't include compliance
flows. Designing a `deletion_policy` or `locked_at` interface without a real
GDPR/audit/admin-purge workflow risks getting the API wrong — and once entity
migrations land, the migration to a richer interface becomes the expensive
change you're trying to avoid.
2. **The natural retrofit isn't catastrophic**: if compliance becomes V2
scope, the cheap path is a separate `LockedDeletion` table (UUID-keyed, points
at the soft-deleted row) checked in `BaseRestoreCommand.validate()`. That's a
one-migration add — no column on the mixin, no change to the existing entity
migrations. The expense Amin is anticipating ("five entity rollouts already
landed") doesn't actually accrue, because the lock lives outside the entity
table.
If GDPR work gets scheduled, I'll do the design then with the actual
constraints in hand. Tracking as a follow-up rather than a speculative
interface here.
### MEDIUM
**M3 — `SoftDeleteMixin` inline import in `BaseDAO.delete`. Fixed.** Moved
to top-level (no cycle); `SKIP_VISIBILITY_FILTER_CLASSES` from the same module
was already top-level so adding `SoftDeleteMixin` to the existing import line
was the right cleanup.
**M4 — `raise_for_ownership` pylint-disable. Documented (kept inline).**
Tried moving to top-level — it does cause a real circular: `security.manager` →
`models.helpers` → `models.core` → lazy `superset.feature_flag_manager`. Kept
the inline import but added a comment naming the specific cycle so a future
maintainer doesn't try to "fix" it and re-trip the issue:
```python
# Inline import: ``superset.models.helpers`` transitively imports
# ``superset.models.core``, which depends on lazily-initialised
# ``superset.feature_flag_manager``. A top-level import here would
# create a circular dependency (security ↔ models.core ↔ superset).
```
**M5 — `BaseDeletedStateFilter.model: Any` → typed. Fixed.**
```python
# Subclasses bind ``model`` to a concrete ``SoftDeleteMixin``
# subclass. Typed as ``type[SoftDeleteMixin]`` so a subclass that
# accidentally binds to a non-soft-deletable entity fails mypy
# rather than crashing at runtime on ``.deleted_at``.
model: ClassVar[type[SoftDeleteMixin]]
```
**M6 — listener placement. Documented.** Investigated and confirmed the
placement is intentional — the `do_orm_execute` hook attaches to the `Session`
*class* (process-wide), not a Session instance, so it has no Flask-app-context
dependency. `setup_db()` runs earlier in `init_app` so the `Session` import is
already initialised. Added a six-line comment in `init_app` explaining the
placement so the rationale doesn't have to be re-derived from review history.
**M7 — no tests for `BaseDeletedStateFilter` / `SoftDeleteApiMixin`.
Fixed.** New `tests/unit_tests/views/test_soft_delete_filter.py` with 10 tests:
- Filter: value-absent / include / only / case-insensitivity / unknown-value
— including assertions that the session bypass set and the `g` augmentation
flag are both correctly populated.
- Mixin: no-op-without-flag / inject-with-flag / **consume-flag-on-call**
(the read-and-clear semantics you specifically called out as subtle enough to
warrant a dedicated test) / `_serialize_deleted_at` handling of `None` and
`datetime`.
**M8 — `deleted_at` schema mutation undocumented. Acknowledged.** This is
intentional — `deleted_state=include` augments rows with `deleted_at`, and
that's the point of the filter. I'll add a "ADDITIONAL INFORMATION" callout to
each entity rollout PR body (#40128 / #40129 / #40130) noting the response
schema additions, and call out the OpenAPI response-schema update as a
per-entity-PR responsibility so it's explicit on the checklist. Not changing
the infrastructure PR for this — the augmentation logic itself is correct.
### Nits
- **Explicit `orig_resource is None` guard in `raise_for_ownership`.
Fixed.** The fallthrough was correct in effect (`hasattr(None, "owners") →
False → owners=[] → MISSING_OWNERSHIP_ERROR`) but the error reads "you don't
own this" when the actual cause is "this row was hard-deleted between the
caller's load and the re-query." Now raises the precise cause:
```python
if orig_resource is None:
raise SupersetSecurityException(...
message=_("Resource was removed before ownership could be
verified"),
...)
```
- **Unit tests for `find_existing_for_import` /
`clear_soft_deleted_for_import`. Fixed.** New
`tests/unit_tests/commands/importers/v1/test_find_existing_for_import.py` (5
tests) — pure-lookup behaviour for live + soft-deleted + missing rows, `clear`
hard-deletes via the ORM (fires `after_delete` listeners), and the composed
find-then-clear contract that's the documented caller sequence.
- **Feature-flag kill switch (`ENABLE_SOFT_DELETE`)**: deliberately not
adding. SIP-208 explicitly discusses this: a global flag risks confusing
partial states (some entities filtered, others not, depending on which adopters
have the mixin at the time the flag flips). The rollback path is "revert the
entity rollout PR" rather than a runtime kill — the infrastructure is a no-op
without an adopter, so a problem in production means a specific entity rollout
PR misbehaves, and reverting that single PR is the right escape hatch.
- **`Query.get()` → `session.get(Model, pk)` (SQLAlchemy 2.0 idiom)**:
skipping. The pattern is pre-existing in `raise_for_ownership` and unrelated to
soft-delete; doing it here would mix scopes. Worth a separate cleanup PR for
the codebase once the SQLAlchemy 1.4 → 2.x migration is on the roadmap (the
`Mapped[]` adoption thread already touches this area).
- **`register_soft_delete_listener()` public function**: skipping. The
inline `from superset.models.helpers import _add_soft_delete_filter` in
`setup_soft_delete_listener` is one line in one place; the indirection doesn't
pay for itself.
### Architecture notes
- **Cascade decision per entity**: agreed, documenting as a per-entity
checklist item. Updating the three entity rollout PR bodies (#40128 / #40129 /
#40130) to call out the cascade-neutral choice for each (charts: dashboards
keep their reference, render a placeholder; dashboards: child slices stay live;
datasets: dependent charts render an error at chart-load).
- **`owners` M2M constraint**: noted. Logging as a documented constraint on
future soft-deletable additions — if `User` or `Role` were ever to become
soft-deletable, `raise_for_ownership`'s `.owners` lookup would silently return
empty for legitimate restores. Not a current risk (neither model is on the
SIP-208 roadmap), but worth pinning in the mixin docstring as a future-proofing
note. Will add in a follow-up tidy.
### Re: the praise
Thanks. The dual-bypass mechanism in particular went through ~5 design
iterations — per-request `g` flag → moved into `raise_for_ownership` →
per-query `execution_options` → discovered FAB's outer/inner reconstruction
strips them → session-scoped `session.info` → per-class scoping. The "design
doc woven into the code" effect is partly because each iteration's reasoning
*had* to land in docstrings to keep the next reviewer (and future-me) from
re-deriving the same trade-offs.
---
Infrastructure tip: `06a64f02c1`. Entity branches rebased: `sc-106189` →
`a3ac63be0b`, `sc-106190` → `8bff0145b5`, `sc-106191` → `0a56e67842`. CI
re-running across all four PRs.
40 unit tests pass (was 26; added 14 covering filter, mixin, importer
helpers). Let me know if anything else surfaces.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]