hjadm opened a new issue, #41496: URL: https://github.com/apache/superset/issues/41496
## [Feature] Dataset Relationship Model β declare and manage relationships between datasets > Reference implementation: https://github.com/apache/superset/pull/40981 --- ### π― Problem Apache Superset has no first-class concept of relationships between datasets. In normalized databases (star schemas, relational models), users must manually create denormalized views or virtual datasets with JOINs every time they need to reference columns from a related table. This creates friction: - **Duplication:** multiple virtual datasets that are just thin JOIN wrappers - **No navigability:** no way to see how datasets connect to each other - **No infrastructure:** features like cross-filtering, drill-down, and dataset merging lack a standardized metadata layer to consume ### β Proposed Solution Introduce a data model + API + UI for declaring **typed, directed relationships** between any pair of Superset datasets. ``` βββββββββββββββββββββββββββββββββββββββββββ β FEATURE FLAG β β DATASET_RELATIONSHIPS (bool) β β Default: False (opt-in) β βββββββββββββββββββββββββββββββββββββββββββ βββββββββββββββββββ βββββββββββββββββββ β SQLAlchemy β β Alembic β β Models βββββΆβ Migration β β β β β β - source/target β β create table β β - relationship β β dataset_ β β type (enum) β β relationships β β - join_type β β create table β β - column pairs β β dataset_ β β - soft delete β β relationship_ β β - audit cols β β columns β βββββββββββββββββββ βββββββββββββββββββ ββββββββββββββββββββββββββββββββββββββββββ β REST API β β β β GET/POST/PUT/DELETE /api/v1/ β β dataset_relationship/ β β GET /api/v1/dataset_relationship/ β β dataset/{id} β β GET /api/v1/dataset/{id} β includes β β relationships (feature-gated) β ββββββββββββββββββββββββββββββββββββββββββ ββββββββββββββββββββββββββββββββββββββββββ β FRONTEND (React Flow) β β β β RelationshipCanvas β Graph editor β β DatasetNode β Custom node β β RelationshipEdge β Edge with label β β ColumnPickerModal β Map columns β β RelationshipBadge β Explore sidebar β β DrillDownConfig β Drill-down config β ββββββββββββββββββββββββββββββββββββββββββ ββββββββββββββββββββββββββββββββββββββββββ β CROSS-DATASET FILTERS β β β β useCrossDatasetFilters hook β β β translates active chart filters β β from source to target WHERE clauses β β β cross-database merge engine β β (Redis cache, schema drift detect) β β β hierarchical drill-down across β β related datasets β ββββββββββββββββββββββββββββββββββββββββββ ``` ### π€ User Stories **US1 β Visual navigation** > "As an analyst, I want to see a graph of how my datasets relate to each other, so I can understand the data model without hunting through documentation." **US2 β Cross-dataset cross-filtering** > "As an analyst, I want a filter applied to an Orders chart to automatically filter a related Customers chart, without manually creating JOINs." **US3 β Hierarchical drill-down** > "As an analyst, I want to click a country in a Sales chart and drill down to state-level detail, navigating through a many-to-one relationship." **US4 β Cross-database merge** > "As an admin, I want to relate a PostgreSQL dataset with a MySQL dataset and let Superset merge them automatically, without an intermediate ETL." ### π Technical Design #### Models **DatasetRelationship** | Field | Type | Description | |-------|------|-------------| | id | Integer PK | Auto | | source_dataset_id | FK β tables.id | Source dataset | | target_dataset_id | FK β tables.id | Target dataset | | relationship_type | Enum | one_to_one, one_to_many, many_to_one, many_to_many | | join_type | Enum | INNER, LEFT, RIGHT, FULL | | is_cross_database | Bool | Auto-detected | | is_active | Bool | Soft toggle | | name | String(255) | Human-readable name (optional) | | description | Text | Description (optional) | | created_on / changed_on | DateTime | Audit | | created_by_fk / changed_by_fk | FK | Audit | **DatasetRelationshipColumn** | Field | Type | Description | |-------|------|-------------| | id | Integer PK | Auto | | relationship_id | FK β dataset_relationships | Cascade delete | | source_column_name | String(255) | Source column | | target_column_name | String(255) | Target column | | operator | String(10) | Default "=" | | ordinal | Integer | Order for multi-column join keys | Constraints: - Unique on `(source_dataset_id, target_dataset_id)` - At least one column pair (Python validation) - Cascade delete: removing a dataset removes its relationships #### REST API | Method | Endpoint | Description | |--------|----------|-------------| | GET | /api/v1/dataset_relationship/ | List (filterable, paginated) | | POST | /api/v1/dataset_relationship/ | Create | | GET | /api/v1/dataset_relationship/{id} | Get by ID | | PUT | /api/v1/dataset_relationship/{id} | Update | | DELETE | /api/v1/dataset_relationship/{id} | Delete | | DELETE | /api/v1/dataset_relationship/?q=[ids] | Bulk delete | | GET | /api/v1/dataset_relationship/dataset/{id} | List all for a dataset | The dataset detail endpoint (`GET /api/v1/dataset/{id}`) gains an optional `relationships` field (feature-gated). #### Frontend (React Flow) - **RelationshipCanvas** β interactive graph with zoom, pan, and auto-layout - **DatasetNode** β custom node showing name, database icon, key columns - **RelationshipEdge** β custom edge with directional arrow and label (cardinality + join type) - **ColumnPickerModal** β modal for mapping source β target columns - **DrillDownConfig** β hierarchical drill-down configuration panel - **RelationshipBadge** β badge in Explore sidebar showing relationship count #### Cross-Dataset Filters - `useCrossDatasetFilters` hook that translates active chart filters into WHERE clauses targeting the related dataset - Cross-database merge engine with Redis caching - Automatic JOIN injection for related datasets - Schema drift detection via scheduled Celery task ### π§ Feature Flag **`DATASET_RELATIONSHIPS = False`** in `config.py`. When disabled: - API returns 403 on relationship endpoints - Relationship fields do not appear in responses - Frontend does not render the canvas or badges - Zero impact on performance or existing behavior ### π¦ Dependencies | Package | Version | License | Purpose | |---------|---------|---------|---------| | @xyflow/react | ^12.x | MIT | Graph canvas rendering | Zero new Python dependencies. ### π Migration One Alembic revision creating: - `dataset_relationships` - `dataset_relationship_columns` Safe rollback via `DROP TABLE ... CASCADE`. ### β Open Questions 1. **Cross-database merge engine**: should it ship in v1 or as a follow-up feature? 2. **Drill-down config**: how to expose it in the UI without cluttering the chart builder? 3. **Performance**: is Redis caching sufficient for frequent cross-database merges, or is something more needed? 4. **Schema drift**: ideal frequency for the detection Celery task? Daily? Weekly? 5. **Naming**: is "Dataset Relationship Model" clear enough, or are there better suggestions? ### π Links - **Implementation PR:** https://github.com/apache/superset/pull/40981 - **SIP Document:** https://github.com/hjadm/hibi/blob/main/SIP-DATASET-RELATIONSHIPS.md ### ποΈ Implementation Status | Block | Status | |-------|--------| | Models + Migration | β Complete | | REST API | β Complete | | Feature Flag | β Complete | | Frontend Canvas | β Complete | | Cross-DB Merge Engine | β Complete | | Cross-Dataset Filters | β Complete | | Drill-Down Navigation | β Complete | | Tests (156 passing) | β Complete | | Schema Drift Detection | β Complete | ### π§ Blockers for Upstream - Decision on whether cross-database merge should ship in v1 - Security review of automatic JOIN injection - Performance benchmarks for cross-database queries - Schema drift periodicity definition - SIP approval by the community -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
