sanket bhor created ATLAS-5312:
----------------------------------
Summary: Handle Delete Propagation between Related entities
Key: ATLAS-5312
URL: https://issues.apache.org/jira/browse/ATLAS-5312
Project: Atlas
Issue Type: New Feature
Reporter: sanket bhor
Assignee: sanket bhor
*Problem Statement*
When a container entity is deleted in Atlas, entities linked via AGGREGATION or
ASSOCIATION relationships are NOT deleted. They become orphaned, stale metadata
that remains visible in the UI, search, lineage, and governance policies.
*Two gaps exist:*
1. Delete Cascade (AGGREGATION): When a trino_schema is deleted (e.g., DROP
SCHEMA sales CASCADE), trino_table entities linked via AGGREGATION
(trino_table_schema) are NOT deleted. Only COMPOSITION-owned children are
cascaded today via DeleteHandlerV1.getOwnedVertices().
2. Delete Propagation (Cross-system aliases): When a source entity is deleted
(e.g., hive_table or hive_db), alias entities in other systems (e.g.,
trino_table via trino_table_hive_table, trino_schema via trino_schema_hive_db)
remain as stale metadata pointing to non-existent source entities.
*Current behavior:*
- DeleteHandlerV1.getOwnedVertices() only follows isOwnedRef=true attributes
(injected only for COMPOSITION relationships)
- AGGREGATION/ASSOCIATION edges: only the relationship edge is removed; the
child/alias entity persists
- Result: orphaned tables, columns, schemas visible in Atlas after source
deletion
*Proposed Solution (High Level)*
Add a typedef-driven propagateDelete boolean flag on AtlasRelationshipEndDef
(mirrors existing propagateRename pattern):
- Flag is configured via model patches (SET_PROPAGATE_DELETE action) — no
hook-side changes required
- At typedef resolution time, AtlasEntityType pre-computes
deletePropagationTargets list
- At runtime, DeleteHandlerV1 traverses propagateDelete-marked edges after
getOwnedVertices() and adds connected entities to the deletion set
- Multi-hop propagation supported via recursion (e.g., hive_db → trino_schema
→ trino_table → trino_column)
- Idempotent: skip already-DELETED entities; visited-set prevents cycles
- All propagated deletes happen within the same @GraphTransaction — atomic
commit/rollback
- Supports both soft delete and hard delete (propagation targets inherit
parent's delete type)
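The traversal described above can be illustrated with a minimal, self-contained sketch. It is a toy in-memory model, not Atlas code: the class DeletePropagationSketch, its Edge type, and all method names are hypothetical, and only the two flags (owned reference and propagateDelete) are modelled. The sketch uses an explicit stack instead of the recursion mentioned in the proposal, and it does not model transactions or the soft/hard delete distinction.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy in-memory model, NOT Atlas code: every class and method name here is hypothetical.
public class DeletePropagationSketch {
    enum Status { ACTIVE, DELETED }

    // Directed edge to a neighbour, carrying the two flags that matter for deletion.
    static final class Edge {
        final String target;
        final boolean ownedRef;        // stands in for isOwnedRef=true (COMPOSITION)
        final boolean propagateDelete; // stands in for the proposed flag

        Edge(String target, boolean ownedRef, boolean propagateDelete) {
            this.target = target;
            this.ownedRef = ownedRef;
            this.propagateDelete = propagateDelete;
        }
    }

    final Map<String, Status> status = new HashMap<>();
    final Map<String, List<Edge>> edges = new HashMap<>();

    void addEntity(String id) {
        status.put(id, Status.ACTIVE);
        edges.put(id, new ArrayList<>());
    }

    void link(String from, String to, boolean ownedRef, boolean propagateDelete) {
        edges.get(from).add(new Edge(to, ownedRef, propagateDelete));
    }

    // Collects the root plus every entity reachable over owned or propagateDelete edges.
    Set<String> collectDeletionSet(String root) {
        Set<String> visited = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            String current = pending.pop();
            if (status.get(current) == Status.DELETED) {
                continue; // idempotent: already-deleted entities are skipped
            }
            if (!visited.add(current)) {
                continue; // visited set breaks cycles
            }
            for (Edge edge : edges.get(current)) {
                if (edge.ownedRef || edge.propagateDelete) {
                    pending.push(edge.target);
                }
            }
        }
        return visited;
    }

    // Marks the whole deletion set as DELETED in one pass.
    void delete(String root) {
        for (String id : collectDeletionSet(root)) {
            status.put(id, Status.DELETED);
        }
    }

    // Builds the multi-hop chain from the proposal, with unflagged reverse edges.
    static DeletePropagationSketch sampleGraph() {
        DeletePropagationSketch g = new DeletePropagationSketch();
        for (String id : new String[] {"hive_db", "trino_schema", "trino_table", "trino_column"}) {
            g.addEntity(id);
        }
        g.link("hive_db", "trino_schema", false, true);      // cross-system alias, flag set
        g.link("trino_schema", "hive_db", false, false);     // reverse direction, not followed
        g.link("trino_schema", "trino_table", false, true);  // AGGREGATION with the flag set
        g.link("trino_table", "trino_schema", false, false); // back-reference, not followed
        g.link("trino_table", "trino_column", true, false);  // COMPOSITION-owned child
        return g;
    }

    public static void main(String[] args) {
        DeletePropagationSketch g = sampleGraph();
        g.delete("hive_db");
        System.out.println(g.status);
    }
}
```

In this model, deleting hive_db marks all four entities as deleted, while deleting trino_schema leaves hive_db active because the reverse edge carries no flag.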
*Steps to Reproduce*
Scenario A — AGGREGATION orphan (trino_schema → trino_table):
1. Create trino_schema entity (qualifiedName=cat1.sales@inst1) with 3
trino_table entities linked via trino_table_schema relationship
2. Send ENTITY_DELETE_V2 event for trino_schema:
{"type":"ENTITY_DELETE_V2","user":"trino","entities":[{"typeName":"trino_schema","uniqueAttributes":{"qualifiedName":"cat1.sales@inst1"}}]}
3. Observe: trino_schema is deleted, but all 3 trino_table entities and their
trino_column entities remain in Atlas (orphaned)
Scenario B — Cross-system alias orphan (hive_table → trino_table):
1. Create hive_table entity (qualifiedName=default.orders@cluster) linked to
trino_table (qualifiedName=cat1.schema1.orders@inst1) via
trino_table_hive_table relationship
2. Send ENTITY_DELETE_V2 event for hive_table:
{"type":"ENTITY_DELETE_V2","user":"hive","entities":[{"typeName":"hive_table","uniqueAttributes":{"qualifiedName":"default.orders@cluster"}}]}
3. Observe: hive_table and its hive_column entities are deleted, but
trino_table and its trino_column entities remain (stale alias)
Scenario C — Cross-system schema orphan (hive_db → trino_schema):
1. Create hive_db (qualifiedName=sales@cluster) linked to trino_schema
(qualifiedName=cat1.sales@inst1) via trino_schema_hive_db relationship
2. Send ENTITY_DELETE_V2 event for hive_db:
{"type":"ENTITY_DELETE_V2","user":"hive","entities":[{"typeName":"hive_db","uniqueAttributes":{"qualifiedName":"sales@cluster"}}]}
3. Observe: hive_db deleted, but trino_schema, its trino_table entities, and
trino_column entities all remain (stale)
*Acceptance Criteria*
Functional
- [ ] Deleting a trino_schema cascades deletion to all trino_table entities
linked via trino_table_schema (AGGREGATION) and their trino_column entities
(COMPOSITION)
- [ ] Deleting a hive_table propagates deletion to linked trino_table (via
trino_table_hive_table) and its trino_column entities
- [ ] Deleting a hive_db propagates deletion to linked trino_schema (via
trino_schema_hive_db), which further cascades to all its trino_table and
trino_column entities
- [ ] Propagation is unidirectional: deleting trino_table does NOT delete
hive_table; deleting trino_schema does NOT delete hive_db
- [ ] Multi-hop propagation works: hive_db → trino_schema → trino_table →
trino_column (full chain)
- [ ] Both soft delete and hard delete modes are supported (propagation targets
inherit parent's delete type)
- [ ] Feature is opt-in via model patches — no behavior change without explicit
SET_PROPAGATE_DELETE patch enablement
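The unidirectional and opt-in criteria above can be illustrated by a sketch of how a per-type target list might be derived from per-end flags. This is a simplified stand-in, not the Atlas typedef classes: EndDef, RelationshipDef, and computeTargets are hypothetical names, the end1/end2 ordering of the sample relationships is illustrative, and the sketch assumes that a flag on one end means "deleting the entity at this end also deletes the entity at the opposite end". The ticket does not specify which end carries the flag.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-ins for relationship typedefs; NOT the Atlas classes.
public class PropagationTargetsSketch {
    static final class EndDef {
        final String typeName;
        final boolean propagateDelete; // false unless a patch sets it (opt-in)

        EndDef(String typeName, boolean propagateDelete) {
            this.typeName = typeName;
            this.propagateDelete = propagateDelete;
        }
    }

    static final class RelationshipDef {
        final String name;
        final EndDef end1;
        final EndDef end2;

        RelationshipDef(String name, EndDef end1, EndDef end2) {
            this.name = name;
            this.end1 = end1;
            this.end2 = end2;
        }
    }

    // For each entity type, lists the entity types its deletion propagates to.
    static Map<String, List<String>> computeTargets(List<RelationshipDef> defs) {
        Map<String, List<String>> targets = new HashMap<>();
        for (RelationshipDef def : defs) {
            // Assumed semantics: a flag on an end means that deleting the entity
            // at that end also deletes the entity at the opposite end.
            if (def.end1.propagateDelete) {
                targets.computeIfAbsent(def.end1.typeName, k -> new ArrayList<>()).add(def.end2.typeName);
            }
            if (def.end2.propagateDelete) {
                targets.computeIfAbsent(def.end2.typeName, k -> new ArrayList<>()).add(def.end1.typeName);
            }
        }
        return targets;
    }

    // The three relationships named in this ticket, flagged on the source side only.
    static List<RelationshipDef> sampleDefs() {
        return Arrays.asList(
            new RelationshipDef("trino_schema_hive_db",
                new EndDef("trino_schema", false), new EndDef("hive_db", true)),
            new RelationshipDef("trino_table_hive_table",
                new EndDef("trino_table", false), new EndDef("hive_table", true)),
            new RelationshipDef("trino_table_schema",
                new EndDef("trino_table", false), new EndDef("trino_schema", true)));
    }

    public static void main(String[] args) {
        System.out.println(computeTargets(sampleDefs()));
    }
}
```

Because only one end of each relationship is flagged, trino_table never appears as a source type in the result, which is the unidirectional behaviour the criteria ask for. With no flags set, the map is empty and delete behaviour is unchanged.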