hjadm opened a new issue, #41496:
URL: https://github.com/apache/superset/issues/41496

   ## [Feature] Dataset Relationship Model β€” declare and manage relationships 
between datasets
   
   > Reference implementation: https://github.com/apache/superset/pull/40981
   
   ---
   
   ### 🎯 Problem
   
   Apache Superset has no first-class concept of relationships between 
datasets. In normalized databases (star schemas, relational models), users must 
manually create denormalized views or virtual datasets with JOINs every time 
they need to reference columns from a related table.
   
   This creates friction:
   - **Duplication:** multiple virtual datasets that are just thin JOIN wrappers
   - **No navigability:** no way to see how datasets connect to each other
   - **No infrastructure:** features like cross-filtering, drill-down, and 
dataset merging lack a standardized metadata layer to consume
   
   ### βœ… Proposed Solution
   
   Introduce a data model + API + UI for declaring **typed, directed 
relationships** between any pair of Superset datasets.
   
   ```
   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
   β”‚              FEATURE FLAG               β”‚
   β”‚        DATASET_RELATIONSHIPS (bool)     β”‚
   β”‚         Default: False (opt-in)         β”‚
   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
   
   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
   β”‚   SQLAlchemy    β”‚    β”‚   Alembic       β”‚
   β”‚   Models        │───▢│   Migration    β”‚
   β”‚                 β”‚    β”‚                 β”‚
   β”‚ - source/target β”‚    β”‚ create table    β”‚
   β”‚ - relationship  β”‚    β”‚   dataset_      β”‚
   β”‚   type (enum)   β”‚    β”‚   relationships β”‚
   β”‚ - join_type     β”‚    β”‚ create table    β”‚
   β”‚ - column pairs  β”‚    β”‚   dataset_      β”‚
   β”‚ - soft delete   β”‚    β”‚   relationship_ β”‚
   β”‚ - audit cols    β”‚    β”‚   columns       β”‚
   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
   
   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
   β”‚            REST API                    β”‚
   β”‚                                        β”‚
   β”‚ GET/POST/PUT/DELETE /api/v1/           β”‚
   β”‚   dataset_relationship/                β”‚
   β”‚ GET /api/v1/dataset_relationship/      β”‚
   β”‚   dataset/{id}                         β”‚
   β”‚ GET /api/v1/dataset/{id} β†’ includes    β”‚
   β”‚   relationships (feature-gated)        β”‚
   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
   
   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
   β”‚          FRONTEND (React Flow)         β”‚
   β”‚                                        β”‚
   β”‚ RelationshipCanvas β†’ Graph editor      β”‚
   β”‚ DatasetNode β†’ Custom node              β”‚
   β”‚ RelationshipEdge β†’ Edge with label     β”‚
   β”‚ ColumnPickerModal β†’ Map columns        β”‚
   β”‚ RelationshipBadge β†’ Explore sidebar    β”‚
   β”‚ DrillDownConfig β†’ Drill-down config    β”‚
   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
   
   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
   β”‚         CROSS-DATASET FILTERS          β”‚
   β”‚                                        β”‚
   β”‚ useCrossDatasetFilters hook            β”‚
   β”‚ β†’ translates active chart filters      β”‚
   β”‚   from source to target WHERE clauses  β”‚
   β”‚ β†’ cross-database merge engine          β”‚
   β”‚   (Redis cache, schema drift detect)   β”‚
   β”‚ β†’ hierarchical drill-down across       β”‚
   β”‚   related datasets                     β”‚
   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
   ```
   
   ### πŸ‘€ User Stories
   
   **US1 β€” Visual navigation**
   > "As an analyst, I want to see a graph of how my datasets relate to each 
other, so I can understand the data model without hunting through 
documentation."
   
   **US2 β€” Cross-dataset cross-filtering**
   > "As an analyst, I want a filter applied to an Orders chart to 
automatically filter a related Customers chart, without manually creating 
JOINs."
   
   **US3 β€” Hierarchical drill-down**
   > "As an analyst, I want to click a country in a Sales chart and drill down 
to state-level detail, navigating through a many-to-one relationship."
   
   **US4 β€” Cross-database merge**
   > "As an admin, I want to relate a PostgreSQL dataset with a MySQL dataset 
and let Superset merge them automatically, without an intermediate ETL."
   
   ### πŸ“ Technical Design
   
   #### Models
   
   **DatasetRelationship**
   
   | Field | Type | Description |
   |-------|------|-------------|
   | id | Integer PK | Auto |
   | source_dataset_id | FK β†’ tables.id | Source dataset |
   | target_dataset_id | FK β†’ tables.id | Target dataset |
   | relationship_type | Enum | one_to_one, one_to_many, many_to_one, 
many_to_many |
   | join_type | Enum | INNER, LEFT, RIGHT, FULL |
   | is_cross_database | Bool | Auto-detected |
   | is_active | Bool | Soft toggle |
   | name | String(255) | Human-readable name (optional) |
   | description | Text | Description (optional) |
   | created_on / changed_on | DateTime | Audit |
   | created_by_fk / changed_by_fk | FK | Audit |
   
   **DatasetRelationshipColumn**
   
   | Field | Type | Description |
   |-------|------|-------------|
   | id | Integer PK | Auto |
   | relationship_id | FK β†’ dataset_relationships | Cascade delete |
   | source_column_name | String(255) | Source column |
   | target_column_name | String(255) | Target column |
   | operator | String(10) | Default "=" |
   | ordinal | Integer | Order for multi-column join keys |
   
   Constraints:
   - Unique on `(source_dataset_id, target_dataset_id)`
   - At least one column pair (Python validation)
   - Cascade delete: removing a dataset removes its relationships
   
   #### REST API
   
   | Method | Endpoint | Description |
   |--------|----------|-------------|
   | GET | /api/v1/dataset_relationship/ | List (filterable, paginated) |
   | POST | /api/v1/dataset_relationship/ | Create |
   | GET | /api/v1/dataset_relationship/{id} | Get by ID |
   | PUT | /api/v1/dataset_relationship/{id} | Update |
   | DELETE | /api/v1/dataset_relationship/{id} | Delete |
   | DELETE | /api/v1/dataset_relationship/?q=[ids] | Bulk delete |
   | GET | /api/v1/dataset_relationship/dataset/{id} | List all for a dataset |
   
   The dataset detail endpoint (`GET /api/v1/dataset/{id}`) gains an optional 
`relationships` field (feature-gated).
   
   #### Frontend (React Flow)
   
   - **RelationshipCanvas** β€” interactive graph with zoom, pan, and auto-layout
   - **DatasetNode** β€” custom node showing name, database icon, key columns
   - **RelationshipEdge** β€” custom edge with directional arrow and label 
(cardinality + join type)
   - **ColumnPickerModal** β€” modal for mapping source β†’ target columns
   - **DrillDownConfig** β€” hierarchical drill-down configuration panel
   - **RelationshipBadge** β€” badge in Explore sidebar showing relationship count
   
   #### Cross-Dataset Filters
   
   - `useCrossDatasetFilters` hook that translates active chart filters into 
WHERE clauses targeting the related dataset
   - Cross-database merge engine with Redis caching
   - Automatic JOIN injection for related datasets
   - Schema drift detection via scheduled Celery task
   
   ### πŸ”§ Feature Flag
   
   **`DATASET_RELATIONSHIPS = False`** in `config.py`.
   
   When disabled:
   - API returns 403 on relationship endpoints
   - Relationship fields do not appear in responses
   - Frontend does not render the canvas or badges
   - Zero impact on performance or existing behavior
   
   ### πŸ“¦ Dependencies
   
   | Package | Version | License | Purpose |
   |---------|---------|---------|---------|
   | @xyflow/react | ^12.x | MIT | Graph canvas rendering |
   
   Zero new Python dependencies.
   
   ### πŸ”„ Migration
   
   One Alembic revision creating:
   - `dataset_relationships`
   - `dataset_relationship_columns`
   
   Safe rollback via `DROP TABLE ... CASCADE`.
   
   ### ❓ Open Questions
   
   1. **Cross-database merge engine**: should it ship in v1 or as a follow-up 
feature?
   2. **Drill-down config**: how to expose it in the UI without cluttering the 
chart builder?
   3. **Performance**: is Redis caching sufficient for frequent cross-database 
merges, or is something more needed?
   4. **Schema drift**: ideal frequency for the detection Celery task? Daily? 
Weekly?
   5. **Naming**: is "Dataset Relationship Model" clear enough, or are there 
better suggestions?
   
   ### πŸ”— Links
   
   - **Implementation PR:** https://github.com/apache/superset/pull/40981
   - **SIP Document:** 
https://github.com/hjadm/hibi/blob/main/SIP-DATASET-RELATIONSHIPS.md
   
   ### πŸ—οΈ Implementation Status
   
   | Block | Status |
   |-------|--------|
   | Models + Migration | βœ… Complete |
   | REST API | βœ… Complete |
   | Feature Flag | βœ… Complete |
   | Frontend Canvas | βœ… Complete |
   | Cross-DB Merge Engine | βœ… Complete |
   | Cross-Dataset Filters | βœ… Complete |
   | Drill-Down Navigation | βœ… Complete |
   | Tests (156 passing) | βœ… Complete |
   | Schema Drift Detection | βœ… Complete |
   
   ### 🚧 Blockers for Upstream
   
   - Decision on whether cross-database merge should ship in v1
   - Security review of automatic JOIN injection
   - Performance benchmarks for cross-database queries
   - Schema drift periodicity definition
   - SIP approval by the community
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to