michael-s-molina commented on issue #35003:
URL: https://github.com/apache/superset/issues/35003#issuecomment-3855663159
Thanks for the great discussion in the Extensions meeting @betodealmeida!
I'm pasting here what we aligned on to make semantic layers extensible:
# Semantic Layers: Extension System Integration
## Goal
Transition from Python entry points to Superset's extension system with
`extension.json` metadata, enabling:
- Declarative contribution registration
- Lazy loading / activation events
- Marketplace discovery without code execution
## Architecture Changes
### 1. Move Interfaces to superset-core
Protocols and types move to superset-core (the shared pip package):
```
superset-core/
└── semantic_layers/
    ├── types.py           # Dimension, Metric, Filter, SemanticResult, etc.
    ├── semantic_view.py
    └── semantic_layer.py
```
Implementations remain in superset or extensions.
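As a minimal sketch, the stub-style module in superset-core might look like the following; `Dimension`, `Metric`, and the method names are illustrative, not the final API:

```python
# Hypothetical sketch of superset-core/semantic_layers: shared types plus a
# stub base class. Field and method names are assumptions for illustration.
from dataclasses import dataclass


@dataclass(frozen=True)
class Dimension:
    """A queryable attribute exposed by a semantic layer."""
    id: str
    name: str


@dataclass(frozen=True)
class Metric:
    """An aggregated measure exposed by a semantic layer."""
    id: str
    name: str


class SemanticLayer:
    """Stub base class: methods raise until an implementation replaces them."""

    def get_dimensions(self) -> list[Dimension]:
        raise NotImplementedError

    def get_metrics(self) -> list[Metric]:
        raise NotImplementedError
```

Concrete subclasses in superset or in extensions would override both methods.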
### 2. Add Backend Contribution Types
**Current state:** Frontend has structured contributions (views, commands,
menus, editors). The backend only has entry points.
**Proposed:** Add contribution support to the backend. This would also formalize
existing superset-core APIs (REST APIs, MCP tools) that are currently registered
via Python calls.
```json
// extension.json
{
"backend": {
"contributions": {
"semanticLayers": [
{
"id": "snowflake",
"name": "Snowflake Semantic Layer",
"description": "Connect to Snowflake's semantic layer",
"module": "my_extension.snowflake.SnowflakeSemanticLayer"
}
],
"restApis": [
{
"id": "my_api",
"name": "My Extension API",
"module": "my_extension.api.MyExtensionAPI"
}
],
"mcpTools": [
{
"id": "query_database",
"name": "Query Database",
"description": "Execute a SQL query against a database",
"module": "my_extension.mcp.QueryDatabaseTool"
}
],
"mcpPrompts": [
{
"id": "analyze_data",
"name": "Analyze Data",
"description": "Generate analysis for a dataset",
"module": "my_extension.mcp.AnalyzeDataPrompt"
}
]
}
}
}
```
**Benefits of declarative metadata:**
- **Activation events:** As in VS Code, extensions load lazily when their
contribution type is first needed
- **Marketplace/Discovery:** Display available semantic layers without
importing code
- **Dependency resolution:** Understand what extensions provide before
loading
- **Security review:** Admins can review contributions before enabling
This pattern could also be applied to other existing Superset features like
database engines, auth providers, cache backends, and alert handlers - allowing
them to be provided as extensions with the same benefits.
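To illustrate the lazy-loading benefit, the `module` field in the manifest above can be resolved with `importlib` only when a contribution is first requested. The `load_contribution` helper below is an assumption for illustration (it uses `collections.OrderedDict` as a stand-in for a real extension class), not Superset's actual loader:

```python
# Hedged sketch: resolving an extension.json "module" string into a class,
# importing code only on demand. Field names mirror the JSON example above.
import importlib
import json


def load_contribution(entry: dict) -> type:
    """Split 'pkg.module.ClassName' and import the class lazily."""
    module_path, class_name = entry["module"].rsplit(".", 1)
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


manifest = json.loads("""
{
  "backend": {
    "contributions": {
      "semanticLayers": [
        {"id": "demo", "module": "collections.OrderedDict"}
      ]
    }
  }
}
""")

# Metadata (id, name, description) is available without any import;
# the class is only imported when actually requested.
entry = manifest["backend"]["contributions"]["semanticLayers"][0]
cls = load_contribution(entry)
```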
### 3. Replace Registry with Extension Manager
**Current:** Standalone `registry.py` with `register_semantic_layer()` /
`get_semantic_layer()`.
**Proposed:** Use extension manager as the registry:
```python
# Instead of:
from superset.semantic_layers.registry import get_semantic_layer

layer_cls = get_semantic_layer("snowflake")

# Use:
from superset.extensions import extension_manager

layer_cls = extension_manager.get_contribution("semanticLayers", "snowflake")
```
This mirrors the frontend pattern:
```typescript
// Frontend (TypeScript)
import { extensionManager } from "@superset-ui/core";
const editor = extensionManager.getContribution("editors", "monaco_sql");
```
**Benefits:**
- Consistency between frontend and backend
- Single source of truth for all contributions
- Lazy loading - defer until contribution is requested
- Metadata-first - query available contributions without loading code
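A minimal sketch of how the extension manager could double as the registry, registering metadata up front and importing the backing class only on first access. The `ExtensionManager` internals here are assumptions; only the `get_contribution` call shape comes from the snippet above:

```python
# Illustrative extension manager acting as the contribution registry.
# Registration stores metadata only; code is imported lazily and cached.
import importlib


class ExtensionManager:
    def __init__(self) -> None:
        # {contribution_type: {contribution_id: metadata from extension.json}}
        self._contributions: dict[str, dict[str, dict]] = {}
        self._cache: dict[tuple[str, str], type] = {}

    def register(self, contribution_type: str, entry: dict) -> None:
        """Record metadata only; nothing is imported at registration time."""
        self._contributions.setdefault(contribution_type, {})[entry["id"]] = entry

    def list_contributions(self, contribution_type: str) -> list[dict]:
        """Metadata-first: inspect what is available without loading code."""
        return list(self._contributions.get(contribution_type, {}).values())

    def get_contribution(self, contribution_type: str, contribution_id: str) -> type:
        """Import the backing class on first request, then serve from cache."""
        key = (contribution_type, contribution_id)
        if key not in self._cache:
            entry = self._contributions[contribution_type][contribution_id]
            module_path, class_name = entry["module"].rsplit(".", 1)
            module = importlib.import_module(module_path)
            self._cache[key] = getattr(module, class_name)
        return self._cache[key]
```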
### 4. Mapper Location
The mapper stays in superset (not superset-core) since it depends on:
- Superset's `QueryObject`
- `BaseDatasource` and other internal types
- Superset-specific query handling (time comparisons, series limits)
## Open Questions
### 1. Interface Pattern
**Option A: Protocol pattern** (current branch approach)
Uses Python's `Protocol` from the `typing` module with the `@runtime_checkable`
decorator.
✅ **Option B: Stub replacement pattern** (consistent with superset-core)
Uses a concrete base class whose methods raise `NotImplementedError`.
**We decided on Option B.**
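For concreteness, a small side-by-side sketch of the two options; all class names here are hypothetical:

```python
# Option A vs Option B, side by side. Option B (chosen) keeps a concrete
# base class whose stub methods raise until replaced by an implementation.
from typing import Protocol, runtime_checkable


@runtime_checkable
class SemanticLayerProtocol(Protocol):  # Option A: structural typing
    def get_metrics(self) -> list: ...


class SemanticLayerBase:  # Option B: stub replacement (decided)
    def get_metrics(self) -> list:
        raise NotImplementedError("replaced by a concrete implementation")


class SnowflakeLayer(SemanticLayerBase):  # hypothetical implementation
    def get_metrics(self) -> list:
        return ["total_sales"]
```

With Option B, implementations inherit from the base class explicitly, matching how other superset-core APIs are structured.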
## Gap Analysis: Dimension/Metric Compatibility
### Bidirectional Compatibility Filtering
Some semantic layers have constraints where not all metric/dimension
combinations are valid. The available metrics may depend on which dimensions
are selected, and vice versa. This requires the ability to filter dimensions
and metrics based on each other.
**Use cases:**
- Metrics tied to specific dimension sets - selecting a metric limits
available dimensions
- Dimensions tied to specific data sources - selecting dimensions limits
available metrics
- The UI should dynamically filter available options as users make selections
### Current API Gap
The proposed `SemanticViewImplementation` assumes dimensions and metrics are
globally available:
```python
# All dimensions (static)
get_dimensions()
# All metrics (static)
get_metrics()
```
This doesn't support semantic layers where compatibility depends on what's
already selected.
### Proposed API Extension
Add optional methods for compatibility filtering:
```python
# Returns metrics compatible with the selected dimensions
get_compatible_metrics(selected_dimensions)
# Returns dimensions compatible with the selected metrics
get_compatible_dimensions(selected_metrics, selected_dimensions)
```
These would be optional - semantic layers without this constraint would
return all metrics/dimensions. The frontend would call these methods as users
make selections to filter the available options.
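A hedged sketch of how these optional methods could behave, using plain strings for metric/dimension ids and an explicit compatibility map; real layers would derive compatibility from their own models, and layers without constraints would simply return everything:

```python
# Illustrative semantic view with bidirectional compatibility filtering.
# The constructor's metric -> compatible-dimensions map is an assumption
# used to keep the example self-contained.
class SemanticView:
    def __init__(self, metrics: dict[str, set[str]]) -> None:
        # metric id -> set of dimension ids it is compatible with
        self._metrics = metrics
        self._dimensions = set().union(*metrics.values())

    def get_metrics(self) -> list[str]:
        return sorted(self._metrics)

    def get_dimensions(self) -> list[str]:
        return sorted(self._dimensions)

    def get_compatible_metrics(self, selected_dimensions: list[str]) -> list[str]:
        """Only metrics whose dimension set covers every selected dimension."""
        return sorted(
            m for m, dims in self._metrics.items()
            if set(selected_dimensions) <= dims
        )

    def get_compatible_dimensions(
        self, selected_metrics: list[str], selected_dimensions: list[str]
    ) -> list[str]:
        """Dimensions valid for all selected metrics, beyond those chosen."""
        valid = set(self._dimensions)
        for m in selected_metrics:
            valid &= self._metrics[m]
        return sorted(valid - set(selected_dimensions))
```

The frontend would call the two `get_compatible_*` methods after each selection change to narrow the remaining options.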
### Semantic Layers With This Requirement
| Semantic Layer | Compatibility Handling |
|----------------|------------------------|
| **dbt Semantic Layer (MetricFlow)** | The GraphQL API provides `dimensionsPaginated(metrics: [MetricInput!]!)`, which returns only the dimensions compatible with the selected metrics. This exists because metrics can span multiple semantic models. |
| **Minerva (Airbnb)** | Has validation endpoints (`/minerva/valid_metrics`, `/minerva/valid_columns`) that implement this filtering. Metrics are tied to event sources and dimension sets, so selecting certain metrics excludes certain dimensions and vice versa. |
| **Cube.js** | Handles compatibility structurally: all dimensions within a cube are compatible with all measures in that cube. No explicit filtering API is needed. |
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]