michael-s-molina commented on issue #35003:
URL: https://github.com/apache/superset/issues/35003#issuecomment-3855663159
Thanks for the great discussion in the Extensions meeting @betodealmeida!
I'm pasting here what we aligned on to make semantic layers extensible:
# Semantic Layers: Extension System Integration
## Goal
Transition from Python entry points to Superset's extension system with
`extension.json` metadata, enabling:
- Declarative contribution registration
- Lazy loading / activation events
- Marketplace discovery without code execution
## Architecture Changes
### 1. Move Interfaces to superset-core
Protocols and types move to superset-core (the shared pip package):
```
superset-core/
└── semantic_layers/
    ├── types.py           # Dimension, Metric, Filter, SemanticResult, etc.
    ├── semantic_view.py
    └── semantic_layer.py
```
Implementations remain in superset or extensions.
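As a minimal sketch, the stub-style module in superset-core might look like the following; `Dimension`, `Metric`, and the method names are illustrative, not the final API:

```python
# Hypothetical sketch of superset-core/semantic_layers: shared types plus a
# stub base class. Field and method names are assumptions for illustration.
from dataclasses import dataclass


@dataclass(frozen=True)
class Dimension:
    """A queryable attribute exposed by a semantic layer."""
    id: str
    name: str


@dataclass(frozen=True)
class Metric:
    """An aggregated measure exposed by a semantic layer."""
    id: str
    name: str


class SemanticLayer:
    """Stub base class: methods raise until an implementation replaces them."""

    def get_dimensions(self) -> list[Dimension]:
        raise NotImplementedError

    def get_metrics(self) -> list[Metric]:
        raise NotImplementedError
```

Concrete subclasses in superset or in extensions would override both methods.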
### 2. Add Backend Contribution Types
**Current state:** Frontend has structured contributions (views, commands,
menus, editors). The backend only has entry points.
**Proposed:** Add contribution support to the backend. This would also formalize
existing superset-core APIs (REST APIs, MCP tools) that are currently registered
via Python calls.
```json
// extension.json
{
"backend": {
"contributions": {
"semanticLayers": [
{
"id": "snowflake",
"name": "Snowflake Semantic Layer",
"description": "Connect to Snowflake's semantic layer",
"module": "my_extension.snowflake.SnowflakeSemanticLayer"
}
],
"restApis": [
{
"id": "my_api",
"name": "My Extension API",
"module": "my_extension.api.MyExtensionAPI"
}
],
"mcpTools": [
{
"id": "query_database",
"name": "Query Database",
"description": "Execute a SQL query against a database",
"module": "my_extension.mcp.QueryDatabaseTool"
}
],
"mcpPrompts": [
{
"id": "analyze_data",
"name": "Analyze Data",
"description": "Generate analysis for a dataset",
"module": "my_extension.mcp.AnalyzeDataPrompt"
}
]
}
}
}
```
**Benefits of declarative metadata:**
- **Activation events:** As in VS Code, extensions load lazily when their
contribution type is first needed
- **Marketplace/Discovery:** Display available semantic layers without
importing code
- **Dependency resolution:** Understand what extensions provide before
loading
- **Security review:** Admins can review contributions before enabling
This pattern could also be applied to other existing Superset features like
database engines, auth providers, cache backends, and alert handlers - allowing
them to be provided as extensions with the same benefits.
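To illustrate the lazy-loading benefit, the `module` field in the manifest above can be resolved with `importlib` only when a contribution is first requested. The `load_contribution` helper below is an assumption for illustration (it uses `collections.OrderedDict` as a stand-in for a real extension class), not Superset's actual loader:

```python
# Hedged sketch: resolving an extension.json "module" string into a class,
# importing code only on demand. Field names mirror the JSON example above.
import importlib
import json


def load_contribution(entry: dict) -> type:
    """Split 'pkg.module.ClassName' and import the class lazily."""
    module_path, class_name = entry["module"].rsplit(".", 1)
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


manifest = json.loads("""
{
  "backend": {
    "contributions": {
      "semanticLayers": [
        {"id": "demo", "module": "collections.OrderedDict"}
      ]
    }
  }
}
""")

# Metadata (id, name, description) is available without any import;
# the class is only imported when actually requested.
entry = manifest["backend"]["contributions"]["semanticLayers"][0]
cls = load_contribution(entry)
```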
### 3. Replace Registry with Extension Manager
**Current:** Standalone `registry.py` with `register_semantic_layer()` /
`get_semantic_layer()`.
**Proposed:** Use extension manager as the registry:
```python
# Instead of:
from superset.semantic_layers.registry import get_semantic_layer

layer_cls = get_semantic_layer("snowflake")

# Use:
from superset.extensions import extension_manager

layer_cls = extension_manager.get_contribution("semanticLayers", "snowflake")
```
This mirrors the frontend pattern:
```typescript
// Frontend (TypeScript)
import { extensionManager } from "@superset-ui/core";
const editor = extensionManager.getContribution("editors", "monaco_sql");
```
**Benefits:**
- Consistency between frontend and backend
- Single source of truth for all contributions
- Lazy loading - defer until contribution is requested
- Metadata-first - query available contributions without loading code
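A minimal sketch of how the extension manager could double as the registry, registering metadata up front and importing the backing class only on first access. The `ExtensionManager` internals here are assumptions; only the `get_contribution` call shape comes from the snippet above:

```python
# Illustrative extension manager acting as the contribution registry.
# Registration stores metadata only; code is imported lazily and cached.
import importlib


class ExtensionManager:
    def __init__(self) -> None:
        # {contribution_type: {contribution_id: metadata from extension.json}}
        self._contributions: dict[str, dict[str, dict]] = {}
        self._cache: dict[tuple[str, str], type] = {}

    def register(self, contribution_type: str, entry: dict) -> None:
        """Record metadata only; nothing is imported at registration time."""
        self._contributions.setdefault(contribution_type, {})[entry["id"]] = entry

    def list_contributions(self, contribution_type: str) -> list[dict]:
        """Metadata-first: inspect what is available without loading code."""
        return list(self._contributions.get(contribution_type, {}).values())

    def get_contribution(self, contribution_type: str, contribution_id: str) -> type:
        """Import the backing class on first request, then serve from cache."""
        key = (contribution_type, contribution_id)
        if key not in self._cache:
            entry = self._contributions[contribution_type][contribution_id]
            module_path, class_name = entry["module"].rsplit(".", 1)
            module = importlib.import_module(module_path)
            self._cache[key] = getattr(module, class_name)
        return self._cache[key]
```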
### 4. Mapper Location
The mapper stays in superset (not superset-core) since it depends on:
- Superset's `QueryObject`
- `BaseDatasource` and other internal types
- Superset-specific query handling (time comparisons, series limits)
## Open Questions
### 1. Interface Pattern
**Option A: Protocol pattern** (current branch approach)
Uses Python's `Protocol` from the `typing` module with the `@runtime_checkable`
decorator.
✅ **Option B: Stub replacement pattern** (consistent with superset-core)
Uses a concrete base class whose methods raise `NotImplementedError`.
**We decided on Option B.**
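For concreteness, a small side-by-side sketch of the two options; all class names here are hypothetical:

```python
# Option A vs Option B, side by side. Option B (chosen) keeps a concrete
# base class whose stub methods raise until replaced by an implementation.
from typing import Protocol, runtime_checkable


@runtime_checkable
class SemanticLayerProtocol(Protocol):  # Option A: structural typing
    def get_metrics(self) -> list: ...


class SemanticLayerBase:  # Option B: stub replacement (decided)
    def get_metrics(self) -> list:
        raise NotImplementedError("replaced by a concrete implementation")


class SnowflakeLayer(SemanticLayerBase):  # hypothetical implementation
    def get_metrics(self) -> list:
        return ["total_sales"]
```

With Option B, implementations inherit from the base class explicitly, matching how other superset-core APIs are structured.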
## Gap Analysis: Dimension/Metric Compatibility
### Bidirectional Compatibility Filtering
Some semantic layers have constraints where not all metric/dimension
combinations are valid. The available metrics may depend on which dimensions
are selected, and vice versa. This requires the ability to filter dimensions
and metrics based on each other.
**Use cases:**
- Metrics tied to specific dimension sets - selecting a metric limits
available dimensions
- Dimensions tied to specific data sources - selecting dimensions limits
available metrics
- The UI should dynamically filter available options as users make selections
### Current API Gap
The proposed `SemanticViewImplementation` assumes dimensions and metrics are
globally available:
```python
# All dimensions (static)
get_dimensions()
# All metrics (static)
get_metrics()
```
This doesn't support semantic layers where compatibility depends on what's
already selected.
### Proposed API Extension
Add optional methods for compatibility filtering:
```python
# Returns metrics compatible with the selected dimensions
get_compatible_metrics(selected_dimensions)
# Returns dimensions compatible with the selected metrics
get_compatible_dimensions(selected_metrics, selected_dimensions)
```
These would be optional - semantic layers without this constraint would
return all metrics/dimensions. The frontend would call these methods as users
make selections to filter the available options.
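A hedged sketch of how these optional methods could behave, using plain strings for metric/dimension ids and an explicit compatibility map; real layers would derive compatibility from their own models, and layers without constraints would simply return everything:

```python
# Illustrative semantic view with bidirectional compatibility filtering.
# The constructor's metric -> compatible-dimensions map is an assumption
# used to keep the example self-contained.
class SemanticView:
    def __init__(self, metrics: dict[str, set[str]]) -> None:
        # metric id -> set of dimension ids it is compatible with
        self._metrics = metrics
        self._dimensions = set().union(*metrics.values())

    def get_metrics(self) -> list[str]:
        return sorted(self._metrics)

    def get_dimensions(self) -> list[str]:
        return sorted(self._dimensions)

    def get_compatible_metrics(self, selected_dimensions: list[str]) -> list[str]:
        """Only metrics whose dimension set covers every selected dimension."""
        return sorted(
            m for m, dims in self._metrics.items()
            if set(selected_dimensions) <= dims
        )

    def get_compatible_dimensions(
        self, selected_metrics: list[str], selected_dimensions: list[str]
    ) -> list[str]:
        """Dimensions valid for all selected metrics, beyond those chosen."""
        valid = set(self._dimensions)
        for m in selected_metrics:
            valid &= self._metrics[m]
        return sorted(valid - set(selected_dimensions))
```

The frontend would call the two `get_compatible_*` methods after each selection change to narrow the remaining options.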
### Semantic Layers With This Requirement
| Semantic Layer | Compatibility Handling |
|----------------|------------------------|
| **dbt Semantic Layer (MetricFlow)** | The GraphQL API provides `dimensionsPaginated(metrics: [MetricInput!]!)`, which returns only the dimensions compatible with the selected metrics. This exists because metrics can span multiple semantic models. |
| **Minerva (Airbnb)** | Has validation endpoints (`/minerva/valid_metrics`, `/minerva/valid_columns`) that implement this filtering. Metrics are tied to event sources and dimension sets, so selecting certain metrics excludes certain dimensions and vice versa. |
| **Cube.js** | Handles compatibility structurally: all dimensions within a cube are compatible with all measures in that cube. No explicit filtering API is needed. |
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]