aminghadersohi opened a new issue, #35498:
URL: https://github.com/apache/superset/issues/35498

   ## [SIP] Proposal for Model Context Protocol (MCP) Service - Revised 
Architecture
   
   ### Motivation
   
   The prior [SIP-171](https://github.com/apache/superset/issues/33870) 
proposal outlined integrating Model Context Protocol (MCP) as an 
ASGI-compatible service within the existing Flask application architecture. 
However, initial proof-of-concept development has revealed significant 
advantages to a different architectural approach that better aligns with modern 
AI tooling ecosystems and provides cleaner separation of concerns.
   
   This revised proposal presents a refined architecture based on learnings 
from the proof-of-concept implementation that:
   
   1. **Addresses Flask-SQLAlchemy dependency challenges**: The original 
approach faced complications with Flask-AppBuilder's tight coupling to the web 
request lifecycle
   2. **Improves AI agent compatibility**: FastMCP provides superior schema 
definitions and tooling optimized for LLM interactions
   3. **Enables better code reusability**: A library-first approach allows the 
MCP service to leverage Superset's core functionality without web framework 
dependencies
   4. **Supports enterprise deployment patterns**: FastMCP middleware enables 
multi-tenancy and security integrations required for production environments
   
   The Model Context Protocol (MCP) is an open-source standard developed by 
Anthropic that provides AI agents with a unified interface to interact with 
various tools and services. MCP enables secure, controlled access to external 
resources through a standardized JSON-RPC 2.0 protocol.
   
   **References:**
   
   - [MCP Specification](https://spec.modelcontextprotocol.io/) - Official 
protocol specification
   - [MCP Documentation](https://modelcontextprotocol.io/) - Implementation 
guides and examples
   - [FastMCP Library](https://github.com/jlowin/fastmcp) - Python implementation framework
   
   By implementing an MCP service for Superset, we enable LLMs to naturally 
create charts, manage dashboards, query data, and perform analytics workflows 
through standardized JSON-RPC 2.0 protocols.
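
   For illustration, a tool invocation over MCP is an ordinary JSON-RPC 2.0 request. A minimal sketch of the envelope (the tool name and arguments here are hypothetical):

   ```python
   import json

   # Hypothetical MCP tool call expressed as a JSON-RPC 2.0 request envelope.
   # "tools/call" is the MCP method for invoking a named tool.
   request = {
       "jsonrpc": "2.0",
       "id": 1,
       "method": "tools/call",
       "params": {
           "name": "list_dashboards",  # a tool the Superset MCP service would expose
           "arguments": {"pagination": {"page": 0, "page_size": 10}},
       },
   }

   wire_payload = json.dumps(request)  # what actually travels over stdio or HTTP
   decoded = json.loads(wire_payload)
   ```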
   
   ### Proposed Change
   
   #### Architecture Overview
   
   Instead of the originally proposed ASGI-Flask integration, this revised 
approach implements a **standalone FastMCP service that uses Superset as a 
library**. This architecture provides:
   
   - Clean separation between the web application and AI service layers
   - Better compatibility with FastAPI/Pydantic ecosystems
   - Simplified deployment and scaling options
   - Enhanced testability and maintainability
   
   #### Context Diagram: MCP Service Integration with Superset
   
   ```mermaid
   flowchart TD
       subgraph "External Environment"
           LLM[LLM Clients<br/>Claude, GPT, etc.]
           CLI[Claude Code CLI<br/>VS Code Extensions]
       end
   
       subgraph "Superset Ecosystem"
           subgraph "MCP Service Layer"
               MCP[FastMCP Service<br/>Port 5008]
               Auth[JWT Authentication<br/>& RBAC Middleware]
               Tools[MCP Tools<br/>Charts, Dashboards, SQL]
           end
   
           subgraph "Core Superset Library"
               DAO[DAOs<br/>Charts, Dashboards, Datasets]
               CMD[Commands<br/>Create, Update, Delete]
               Models[SQLAlchemy Models<br/>Database Layer]
           end
   
           subgraph "Data Layer"
               DB[(Superset Metadata<br/>Database)]
               Sources[(Data Sources<br/>External DBs)]
           end
       end
   
       LLM -.->|stdio transport<br/>JSON-RPC 2.0 via MCP| MCP
       CLI -.->|streamable-http transport<br/>JSON-RPC 2.0 over HTTP| MCP
   
       MCP --> Auth
       Auth --> Tools
       Tools -->|Reads| DAO
       Tools -->|Business Logic| CMD
       CMD -->|Mutates Data| DAO
       DAO --> Models
       Models --> DB
       Models --> Sources
   
       classDef mcpLayer fill:#e1f5fe
       classDef coreLayer fill:#f3e5f5
       classDef dataLayer fill:#e8f5e8
   
       class MCP,Auth,Tools mcpLayer
       class DAO,CMD,Models coreLayer
       class DB,Sources dataLayer
   ```
   
   **Key Integration Points:**
   
   - **RBAC Enforcement**: MCP tools leverage existing DAO/Command patterns 
that automatically handle role-based access control
   - **Security Reuse**: Authentication and authorization logic is inherited 
from Superset core, not reimplemented
   - **Library Pattern**: MCP service imports and uses Superset as a library, 
maintaining access to all core functionality
   
   ```text
   superset/mcp_service/
   ├── app.py              # FastMCP application instance
   ├── server.py           # Service entry point and configuration
   ├── middleware.py       # Authentication, rate limiting, logging
   ├── chart/             # Chart creation and management tools
   ├── dashboard/         # Dashboard management tools
   ├── dataset/           # Dataset listing and exploration
   ├── sql_lab/           # SQL execution and query tools
   └── system/            # Instance metadata and health tools
   ```
   
   ## Architecture Principles
   
   The MCP service implementation follows key architectural principles that 
ensure maintainability, security, and extensibility:
   
   ### 1. Security Logic Reuse
   
   **Principle**: Reuse security logic from core Superset (commands and DAOs) 
and do not re-implement them.
   
   **Implementation**:
   
   - MCP tools leverage existing `BaseDAO` classes with enhanced `list()` and 
`count()` methods
   - Commands layer provides business logic with built-in RBAC enforcement
   - Authentication flows through existing Flask-AppBuilder security 
infrastructure
   - Row-level security and column-level permissions automatically apply
   
   **Benefits**:
   
   - Consistent security model across web UI and MCP service
   - Reduced attack surface by avoiding security reimplementation
   - Automatic inheritance of security patches and improvements
   
   ### 2. Extensibility for Add-ons
   
   **Principle**: It should be possible to expose new services added by 
extensions as MCP tools.
   
   **Implementation**:
   
   - Plugin-based tool registration system allows extensions to add MCP tools
   - Standardized base classes for tool development with validation patterns
   - Extension hooks for middleware integration (authentication, audit logging)
   - Discovery mechanism for dynamically loaded tools
   
   **Example Extension Pattern**:
   
   ```python
   # Extension can register new MCP tools
   from superset_core.mcp import register_tool  # the .mcp package is a placeholder
   
   @register_tool(domain="custom_analytics")
   class CustomAnalyticsTool(BaseMCPTool):
       def execute(self, request: CustomRequest) -> CustomResponse:
           # Extension-specific logic using Superset DAOs/Commands
           pass
   
   ```
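
   How such a registry could work is sketched below with a stdlib-only decorator. This is illustrative: since the `.mcp` package is a placeholder, `register_tool` and `TOOL_REGISTRY` here are hypothetical names, not an existing API.

   ```python
   from typing import Callable, Dict, Type

   TOOL_REGISTRY: Dict[str, Type] = {}  # domain -> tool class, discovered at startup

   def register_tool(domain: str) -> Callable[[Type], Type]:
       """Class decorator: record the tool under its domain for later discovery."""
       def decorator(cls: Type) -> Type:
           TOOL_REGISTRY[domain] = cls
           return cls
       return decorator

   @register_tool(domain="custom_analytics")
   class CustomAnalyticsTool:
       def execute(self, request: dict) -> dict:
           # A real tool would call Superset DAOs/Commands here
           return {"handled_by": type(self).__name__, **request}

   result = TOOL_REGISTRY["custom_analytics"]().execute({"metric": "sales"})
   ```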
   
   ### 3. Library-First Architecture
   
   **Principle**: MCP service uses Superset as a library, not as a web 
framework dependency.
   
   **Implementation**:
   
   - Imports Superset models, DAOs, and commands without Flask context
   - Standalone FastMCP service with independent lifecycle
   - Database session management decoupled from web request cycle
   - Configuration sharing through environment variables and shared config files
   
   **Benefits**:
   
   - Independent scaling and deployment
   - Cleaner testing with reduced dependencies
   - Better compatibility with async/await patterns
   - Simplified error handling and debugging
   
   ### 4. Validation-First Development
   
   **Principle**: Comprehensive validation prevents errors and improves user 
experience.
   
   **Implementation**:
   
   - 5-layer validation pipeline: Schema → Business Logic → Dataset → Superset 
Compatibility → Runtime
   - Pydantic schemas with discriminated unions for type safety
   - Fuzzy matching for column suggestions on validation errors
   - Security validation prevents XSS and injection attacks
   
   **Validation Layers**:
   
   1. **Schema Validation**: Type checking and required fields
   2. **Business Logic**: Cache parameters, preview formats
   3. **Dataset Validation**: Column existence and compatibility
   4. **Superset Compatibility**: Label conflicts and naming restrictions
   5. **Runtime Validation**: Form data generation and empty chart detection
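
   As a rough sketch (stdlib only; the layer checks are illustrative placeholders, not Superset code), the pipeline can be modeled as an ordered chain of validators that short-circuits on the first failure:

   ```python
   from typing import Callable, Dict, List, Tuple

   # Each layer returns (ok, message); two of the five layers are sketched here.
   Validator = Callable[[Dict], Tuple[bool, str]]

   def schema_layer(req: Dict) -> Tuple[bool, str]:
       # Layer 1: type checking and required fields
       ok = isinstance(req.get("dataset_id"), int) and bool(req.get("metrics"))
       return ok, ("" if ok else "dataset_id and metrics are required")

   def dataset_layer(req: Dict) -> Tuple[bool, str]:
       # Layer 3: column existence (would come from dataset metadata)
       known_columns = {"region", "sales"}
       missing = [c for c in req.get("groupby", []) if c not in known_columns]
       return (not missing), (f"unknown columns: {missing}" if missing else "")

   PIPELINE: List[Validator] = [schema_layer, dataset_layer]  # + business, compat, runtime

   def validate(req: Dict) -> Tuple[bool, str]:
       for layer in PIPELINE:
           ok, msg = layer(req)
           if not ok:
               return False, msg  # short-circuit with an actionable message
       return True, ""
   ```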
   
   ### 5. Preview-First User Experience
   
   **Principle**: Enable exploration before persistence to improve user 
workflows.
   
   **Implementation**:
   
   - `generate_chart()` creates explore URLs before saving
   - Cached form data allows iteration without database persistence
   - Preview images enable visual validation in LLM conversations
   - Explicit save operations give users control over persistence
   
   **Workflow**:
   
   1. Generate preview with explore URL
   2. Iterate on configuration through MCP tools
   3. User validates via interactive explore interface
   4. Explicit save when satisfied with result
   
   LLM-driven interactions are inherently conversational and exploratory, requiring different patterns than traditional transactional APIs.
   
   **LLM Workflows vs Traditional API Workflows:**
   
   | Aspect | Traditional API Calls | LLM Conversations |
   | --- | --- | --- |
   | Intent | Transactional - specific outcome known upfront | Exploratory - 
prototyping and refinement |
   | Persistence | Immediate save on creation | Preview first, save when 
satisfied |
   | Iterations | Minimal - precise requirements | Multiple - trying variations 
and alternatives |
   | Cleanup | Developer manages resources | Automatic - previews don't clutter 
database |
   
   **Chart Exploration & Persistence Tools:**
   
   The MCP service provides separate tools for exploration vs persistence:
   
   1. **Chart Exploration** (no database persistence):
       - `generate_chart(dataset_id, config, save_chart=False)` → Returns 
`form_data_key` + explore URL
       - `update_chart_preview(form_data_key, new_config)` → Iterate on 
configuration
       - User can try multiple variations without creating database records
   2. **Chart Persistence** (save to database):
       - `generate_chart(dataset_id, config, save_chart=True)` → Creates 
permanent chart
       - `update_chart(chart_id, new_config)` → Modifies saved charts
   3. **Dashboard Composition** (assumes finalized charts):
       - `generate_dashboard(chart_ids, title)` → Creates dashboard from saved 
charts
       - `add_chart_to_existing_dashboard(dashboard_id, chart_id)` → Adds chart 
to dashboard
   
   **Implementation Details**:
   
   - Cached form data allows iteration without database persistence
   - Preview images enable visual validation in LLM conversations
   - Explicit save operations give users control over when to persist
   - Dashboard tools assume charts are already finalized (no "preview 
dashboard" concept)
   
   **Example Exploratory Workflow**:
   
   1. User: "Show me sales by region as a bar chart"
   2. MCP: `generate_chart(..., save_chart=False)` → preview URL
   3. User: "Try it as a pie chart instead"
   4. MCP: `update_chart_preview(form_data_key, ...)` → new preview
   5. User: "Perfect, save this one"
   6. MCP: `generate_chart(..., save_chart=True)` → permanent chart
   7. User: "Add it to my Q1 dashboard"
   8. MCP: `add_chart_to_existing_dashboard(...)` → chart added
   
   **Rationale**: LLM conversations naturally involve exploration and 
refinement. Without preview-first design, users would end up with dozens of 
half-finished charts cluttering their workspace. This pattern respects the 
conversational nature of AI interactions while maintaining database integrity.
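
   The exploration-vs-persistence split can be sketched with a simple in-memory form-data cache (stdlib only; the function bodies are hypothetical stand-ins for the actual tools):

   ```python
   import uuid

   form_data_cache = {}  # stands in for Superset's explore form-data cache
   saved_charts = {}     # stands in for the metadata database

   def generate_chart(dataset_id, config, save_chart=False):
       if not save_chart:
           key = uuid.uuid4().hex  # preview: cached entry only, no DB record
           form_data_cache[key] = {"dataset_id": dataset_id, **config}
           return {"form_data_key": key,
                   "explore_url": f"/explore/?form_data_key={key}"}
       chart_id = len(saved_charts) + 1  # explicit save creates a permanent record
       saved_charts[chart_id] = {"dataset_id": dataset_id, **config}
       return {"chart_id": chart_id}

   def update_chart_preview(form_data_key, new_config):
       form_data_cache[form_data_key].update(new_config)  # iterate without persisting
       return {"form_data_key": form_data_key}

   preview = generate_chart(1, {"viz_type": "bar"})                     # step 2
   update_chart_preview(preview["form_data_key"], {"viz_type": "pie"})  # step 4
   saved = generate_chart(1, {"viz_type": "pie"}, save_chart=True)      # step 6
   ```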
   
   ### 6. Performance and Caching
   
   **Principle**: Multi-level caching optimizes performance for AI workloads.
   
   **Implementation**:
   
   - Screenshot caching for chart previews
   - Form data caching for iterative exploration (each iteration creates new 
cached entry)
   - Query result caching inherited from Superset
   - Metadata caching with refresh control
   
   **Cache Housekeeping Consideration:**
   
   Form data cache entries for chart previews inherit Superset's default 7-day 
TTL (`EXPLORE_FORM_DATA_CACHE_CONFIG`). For exploratory LLM workflows that may 
generate many preview iterations without saving, consider configuring a shorter 
TTL for MCP-created entries to avoid cache bloat:
   
   ```python
   # Example MCP-specific configuration (optional)
   MCP_EXPLORE_FORM_DATA_TTL = int(timedelta(hours=1).total_seconds())
   ```
   
   This is not currently implemented but may be beneficial for deployments with 
high exploratory workload volumes.
   
   ### 7. Observability and Audit
   
   **Principle**: Comprehensive logging and audit trails for enterprise 
deployment.
   
   **Implementation**:
   
   - Structured logging with request correlation IDs
   - Security event audit logging
   - Performance metrics and monitoring hooks
   - Error sanitization for security and user experience
   
   These principles ensure the MCP service provides a secure, extensible, and maintainable foundation for AI-driven analytics workflows.
   
   ## Implementation Details
   
   ### Data Access Layer
   
   The MCP service uses Superset's DAO and Command implementations *out of the box*. To achieve this, a couple of missing functions were previously added (https://github.com/apache/superset/pull/35018) to remove the dependency on Flask-AppBuilder, which provides this functionality for the CRUD APIs:
   
   ```python
   # superset/daos/base.py - Enhanced with new functions
   class BaseDAO(Generic[T]):
       @classmethod
       def list(
           cls,
           column_operators: Optional[List[ColumnOperator]] = None,
           order_column: str = "changed_on",
           order_direction: str = "desc",
           page: int = 0,
           page_size: int = 100,
           search: Optional[str] = None,
           search_columns: Optional[List[str]] = None,
           custom_filters: Optional[Dict[str, Any]] = None,
           columns: Optional[List[str]] = None,
       ) -> Tuple[List[Any], int]:
           """Generic list method for filtered, sorted, and paginated results"""
   
       @classmethod
       def count(cls) -> int:
           """Count total records - required for instance statistics"""
   ```
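
   For illustration, the search and pagination contract of `list()` can be mimicked over an in-memory collection (a hedged sketch of the semantics, not the actual DAO implementation):

   ```python
   from typing import Any, Dict, List, Optional, Tuple

   ROWS = [  # stand-in for rows the DAO would query
       {"table_name": "sales_2024", "schema": "public"},
       {"table_name": "customers", "schema": "public"},
       {"table_name": "sales_2023", "schema": "archive"},
   ]

   def list_rows(
       search: Optional[str] = None,
       search_columns: Optional[List[str]] = None,
       page: int = 0,
       page_size: int = 100,
   ) -> Tuple[List[Dict[str, Any]], int]:
       """Filtered and paginated results, mirroring list()'s (rows, total) shape."""
       rows = ROWS
       if search and search_columns:
           rows = [r for r in rows
                   if any(search in str(r[c]) for c in search_columns)]
       total = len(rows)  # total is counted before pagination is applied
       start = page * page_size
       return rows[start:start + page_size], total

   matches, total = list_rows(search="sales", search_columns=["table_name"],
                              page_size=1)
   ```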
   
   ### Pydantic Schema Framework
   
   All MCP tool schemas use Pydantic for FastMCP compatibility and future 
FastAPI integration:
   
   ```python
   # Chart creation with iterative, discriminated union approach
   class ChartCreationRequest(BaseModel):
       dataset_id: int
       chart_type: ChartTypeEnum
       metrics: List[str]
       groupby: Optional[List[str]] = None
       filters: Optional[List[FilterConfig]] = None
       # Simplified schema - complex configurations added iteratively
   ```
   
   **Why Pydantic over Marshmallow:**
   
   - **FastMCP requirement**: FastMCP is Pydantic-first with Rust-based 
performance optimizations
   - **Type-hint based**: Declarative style using native Python type hints
   - **Automatic schema generation**: Pydantic models generate JSON schemas for MCP tool discovery (currently this works reliably only in Claude Code; without it, the LLM is left guessing the input types)
   
   **Note**: This Pydantic adoption is isolated to the MCP service and does not 
require Superset-wide migration from Marshmallow.
   
   ### Chart Workflow Design
   
   The MCP service implements a **preview-first workflow** optimized for LLM conversations rather than traditional API patterns. This design is detailed in the Preview-First User Experience architecture principle above.
   
   **Key Design Difference:**
   
   - **Traditional APIs**: Immediate persistence with precise requirements
   - **MCP Service**: Exploration mode by default, explicit persistence when 
satisfied
   
   This workflow prevents database clutter from exploratory LLM conversations 
while maintaining flexibility for users to iterate on chart designs before 
committing them.
   
   ### MCP Service Internal Architecture
   
   ```mermaid
   flowchart TD
       subgraph "FastMCP Service"
           Server[FastMCP Server<br/>JSON-RPC 2.0 Protocol]
           Middleware[Extensible Middleware<br/>Auth, Rate Limiting, Logging]
       end
   
       subgraph "MCP Tool Domains"
           ChartTools[Chart Tools<br/>generate_chart, update_chart]
           DashTools[Dashboard Tools<br/>generate_dashboard]
           DataTools[Dataset Tools<br/>list_datasets, get_dataset_info]
           SQLTools[SQL Lab Tools<br/>execute_sql]
           SysTools[System Tools<br/>get_instance_info]
       end
   
       subgraph "Integration with Superset Core"
           Commands[Commands<br/>Business Logic & Validation]
           DAOs[DAOs<br/>Data Access Layer]
           Models[SQLAlchemy Models<br/>& RBAC Security]
       end
   
       Server --> Middleware
       Middleware --> ChartTools
       Middleware --> DashTools
       Middleware --> DataTools
       Middleware --> SQLTools
       Middleware --> SysTools
   
       ChartTools --> Commands
       DashTools --> Commands
       DataTools --> DAOs
       SQLTools --> DAOs
       SysTools --> DAOs
   
       Commands --> DAOs
       DAOs --> Models
   
       classDef service fill:#e1f5fe
       classDef tools fill:#f3e5f5
       classDef core fill:#e8f5e8
   
       class Server,Middleware service
       class ChartTools,DashTools,DataTools,SQLTools,SysTools tools
       class Commands,DAOs,Models core
   ```
   
   **Architecture Layers:**
   
   1. **FastMCP Framework**: Handles MCP protocol, request routing, and 
JSON-RPC 2.0 communication
   2. **Middleware Stack**: Provides cross-cutting concerns (auth, rate 
limiting, caching, audit)
   3. **Tool Domains**: Organized by functional area with type-safe Pydantic 
schemas
   4. **Integration Layer**: Enhanced DAOs, Commands, and comprehensive 
validation pipeline
   5. **Superset Core**: Reuses existing models, security, and database 
infrastructure
   
   ### MCP Core Abstraction Pattern
   
   The MCP service implements a generic core abstraction layer that eliminates 
code duplication and provides consistent patterns for common operations:
   
   ```mermaid
   flowchart TD
       subgraph "Generic Core Classes"
           ListCore[ModelListCore<br/>Pagination, filtering, search]
           InfoCore[ModelGetInfoCore<br/>Retrieve by ID/UUID]
           ChartCore[ChartToolCore<br/>Chart creation & updates<br/>To be 
implemented]
           DashboardCore[DashboardToolCore<br/>Dashboard management<br/>To be 
implemented]
       end
   
       subgraph "Tool Implementations"
           ListTools[List Tools<br/>list_datasets, list_charts, etc.]
           InfoTools[Info Tools<br/>get_dataset_info, get_chart_info, etc.]
           CustomTools[Custom Tools<br/>generate_chart, update_chart, 
etc.<br/>Currently using custom implementation]
       end
   
       subgraph "Superset Components"
           Components[DAOs, Commands, Schemas]
       end
   
       ListCore -.->|Instantiate with config| ListTools
       InfoCore -.->|Instantiate with config| InfoTools
       ChartCore -.-x|Future pattern| CustomTools
       DashboardCore -.-x|Future pattern| CustomTools
   
       ListTools --> Components
       InfoTools --> Components
       CustomTools --> Components
   
       classDef coreClass fill:#e8f5e8
       classDef toolClass fill:#e1f5fe
       classDef componentClass fill:#f3e5f5
       classDef plannedClass fill:#e8f5e8,stroke-dasharray: 5 5
   
       class ListCore,InfoCore coreClass
       class ChartCore,DashboardCore plannedClass
       class ListTools,InfoTools,CustomTools toolClass
       class Components componentClass
   ```
   
   **Core Abstraction Benefits:**
   
   - **Code Reuse**: `ModelListCore` handles pagination, filtering, search for 
all list operations
   - **Consistency**: Identical behavior patterns across dataset, chart, and 
dashboard listing
   - **Type Safety**: Generic type parameters ensure compile-time correctness
   - **Extensibility**: New models can leverage existing cores with minimal 
configuration
   - **Maintainability**: Bug fixes and improvements benefit all tools using 
the core
   
   **Example Core Usage Pattern:**
   
   ```python
   # list_datasets.py - Minimal tool implementation
   tool = ModelListCore(
       dao_class=DatasetDAO,
       output_schema=DatasetInfo,
       item_serializer=serialize_dataset_object,
       filter_type=DatasetFilter,
       default_columns=DEFAULT_DATASET_COLUMNS,
       search_columns=["schema", "sql", "table_name"],
       list_field_name="datasets",
       output_list_schema=DatasetList,
   )
   return tool.run_tool(**params)
   ```
   
   **Future Refactoring Opportunities:**
   
   - Complex tools (chart/dashboard generation) could benefit from similar 
abstraction patterns
   - Validation pipeline could be extracted into reusable core classes
   - Preview generation logic could become a shared core component
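
   The config-driven core pattern can be sketched in plain Python (the names echo `ModelListCore` and its configuration, but the implementation below is illustrative, not the actual class):

   ```python
   from dataclasses import dataclass, field
   from typing import Any, Callable, Dict, List

   @dataclass
   class ListCoreSketch:
       """Generic list tool: configured once per model, reused everywhere."""
       rows: List[Dict[str, Any]]  # stands in for a dao_class
       item_serializer: Callable[[Dict[str, Any]], Dict[str, Any]]
       search_columns: List[str] = field(default_factory=list)
       list_field_name: str = "items"

       def run_tool(self, search: str = "", page_size: int = 100) -> Dict[str, Any]:
           rows = self.rows
           if search:
               rows = [r for r in rows
                       if any(search in str(r.get(c, "")) for c in self.search_columns)]
           return {
               self.list_field_name: [self.item_serializer(r) for r in rows[:page_size]],
               "count": len(rows),
           }

   # Instantiated with configuration, as list_datasets.py does with ModelListCore
   datasets_tool = ListCoreSketch(
       rows=[{"table_name": "sales"}, {"table_name": "users"}],
       item_serializer=lambda r: {"name": r["table_name"]},
       search_columns=["table_name"],
       list_field_name="datasets",
   )
   out = datasets_tool.run_tool(search="sales")
   ```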
   
   ## Security Model
   
   The MCP service reuses Superset's existing security infrastructure:
   
   - **Authentication**: JWT tokens, session-based auth (development)
   - **Authorization**: Flask-AppBuilder RBAC automatically applied through DAOs
   - **Input Validation**: 5-layer pipeline with Pydantic schemas and security 
sanitization
   - **Data Access**: Row-level and column-level security inherited from 
Superset core
   
   ### Permission Model
   
   **No Additional Permission Grants Required**: The MCP service does not 
request additional permissions from users. All permissions are directly derived 
from the user's existing Superset RBAC roles and permissions.
   
   **How it Works:**
   
   1. User authenticates to MCP service (JWT token or session)
   2. User's identity is established in Flask's `g.user` context
   3. All MCP tool operations enforce the user's existing Superset permissions
   4. If a user lacks permission in the web UI, they lack it in MCP tools
   
   **Example:**
   
   - User has "Viewer" role in Superset → Can list/view charts via MCP, cannot 
create/modify
   - User has "Editor" role → Can create/modify charts via MCP
   - User has row-level security filters → Same filters apply to MCP query 
results
   
   This design ensures users cannot bypass Superset's permission model by using 
MCP tools instead of the web UI.
   
   ### Permission Enforcement
   
   When a user attempts an operation without proper permissions, the MCP 
service follows Superset's standard error handling:
   
   1. **Command Layer Validation**: Commands (e.g., `CreateChartCommand`) call 
`security_manager.is_owner()` and `security_manager.can_access_datasource()`
   2. **Exception Raised**: Permission failures raise specific exceptions:
       - `ChartAccessDeniedError`: "You don't have access to this chart"
       - `ChartForbiddenError`: "Changing this chart is forbidden"
       - `DashboardsForbiddenError`: "Changing one or more of these dashboards 
is forbidden"
   3. **MCP Tool Response**: The tool catches these exceptions and returns 
structured error responses to the LLM
   
   **Example Permission Flow:**
   
   ```python
   # User tries to create chart without dataset access
   try:
       CreateChartCommand(data).run()
   except ChartAccessDeniedError as ex:
       return {
           "success": False,
           "error": {
               "type": "permission_denied",
               "message": "You don't have access to this dataset"
           }
       }
   ```
   
   This ensures:
   
   - **Consistent security**: Same permission checks as the web UI
   - **Clear error messages**: LLMs receive actionable feedback
   - **No privilege escalation**: Users cannot bypass permissions through MCP 
tools
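
   The catch-and-translate step generalizes naturally across tools; a stdlib sketch (the exception class below is a local stand-in for Superset's, and the helper name is hypothetical):

   ```python
   class ChartAccessDeniedError(Exception):
       """Local stand-in for Superset's ChartAccessDeniedError."""

   ERROR_TYPES = {ChartAccessDeniedError: "permission_denied"}

   def run_tool_safely(command):
       """Run a command callable, translating known exceptions into structured errors."""
       try:
           return {"success": True, "result": command()}
       except tuple(ERROR_TYPES) as ex:
           return {
               "success": False,
               "error": {"type": ERROR_TYPES[type(ex)], "message": str(ex)},
           }

   def forbidden_command():
       raise ChartAccessDeniedError("You don't have access to this dataset")

   response = run_tool_safely(forbidden_command)
   ```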
   
   ## Deployment Architecture
   
   ### Updated Deployment Topology
   
   Building on the original SIP-171 modular architecture concept, the revised 
implementation follows this deployment pattern:
   
   ```mermaid
   flowchart LR
       subgraph "Superset Ecosystem"
           Core[superset-core<br/>Models, DAOs, Commands]
   
           subgraph "Service Deployments"
               WebApp[superset-web<br/>Flask Application<br/>Port 8088]
               MCP[superset-mcp<br/>FastMCP Service<br/>Port 5008]
           end
   
           subgraph "Extensions"
               Ext[superset-extensions<br/>Custom Tools & Middleware]
           end
       end
   
       subgraph "External Clients"
           Browser[Web Browser<br/>Dashboard UI]
           LLM[LLM Clients<br/>Claude, GPT]
           API[REST API Clients]
       end
   
       subgraph "Data Layer"
           MetaDB[(Metadata Database)]
           DataSources[(External Databases)]
       end
   
       Core --> WebApp
       Core --> MCP
       Core --> Ext
   
       Ext -.->|Plugin Tools| MCP
   
       Browser --> WebApp
       LLM --> MCP
       API --> WebApp
   
       WebApp --> MetaDB
       MCP --> MetaDB
       WebApp --> DataSources
       MCP --> DataSources
   
       classDef core fill:#f3e5f5
       classDef services fill:#e1f5fe
       classDef extensions fill:#e8f5e8
       classDef clients fill:#fff3e0
       classDef data fill:#fce4ec
   
       class Core core
       class WebApp,MCP services
       class Ext extensions
       class Browser,LLM,API clients
       class MetaDB,DataSources data
   
   ```
   
   **Key Deployment Characteristics:**
   
   - **Modular Services**: Web application and MCP service can be deployed 
independently
   - **Shared Core**: Both services use the same superset-core library for 
consistency
   - **Independent Scaling**: MCP service can be scaled separately based on AI 
workload demands
   - **Extension Support**: Plugins can add tools to both web and MCP services
   - **Unified Data Access**: Both services access the same metadata and data 
sources
   
   ### Development Instructions
   
   ### Option 1: Standalone Service (Dev)
   
   `superset mcp run --port 5008 --debug`
   
   ### Option 2: Docker Compose Integration
   
   ```yaml
   services:
     superset-web:
       build: .
       command: ["superset", "run", "-h", "0.0.0.0", "-p", "8088", 
"--with-threads", "--reload", "--debugger"]
       ports:
         - "8088:8088"
       environment:
         - SUPERSET_CONFIG_PATH=/app/superset_config.py
   
     superset-mcp:
       build: .
       command: ["superset", "mcp", "run", "--port", "5008"]
       ports:
         - "5008:5008"
       environment:
         - SUPERSET_CONFIG_PATH=/app/superset_config.py
       depends_on:
         - superset-web
   
   ```
   
   ### Option 3: Production Deployment with Load Balancing
   
   ```yaml
   services:
     superset-web:
       image: superset:latest
       deploy:
         replicas: 3
       ports:
         - "8088:8088"
   
     superset-mcp:
       image: superset:latest
       command: ["superset", "mcp", "run", "--workers", "4"]
       deploy:
         replicas: 2
       ports:
         - "5008:5008"
       environment:
         - MCP_ENABLE_AUTH=true
         - MCP_RATE_LIMIT=1000
   
   ```
   
   ### New or Changed Public Interfaces
   
   ### MCP Tools Interface
   
   The initial MCP service exposes an example set of tools; the list is expected to grow as the project matures:
   
   **Discovery Tools:**
   
   - `list_dashboards(filters?, pagination?) → {dashboards: Dashboard[], 
pagination: PaginationInfo}`
   - `list_charts(filters?, pagination?) → {charts: Chart[], pagination: 
PaginationInfo}`
   - `list_datasets(filters?, pagination?) → {datasets: Dataset[], pagination: 
PaginationInfo}`
   
   **Chart Management Tools:**
   
   - `generate_chart(dataset_id, chart_type, config) → {explore_url: string, 
preview_url?: string}`
   - `update_chart(chart_id, updates) → {chart: Chart, explore_url: string}`
   - `get_chart_preview(chart_id) → {preview_url: string, data_preview: object}`
   
   **Dashboard Management Tools:**
   
   - `generate_dashboard(title, charts_layout) → {dashboard: Dashboard, 
edit_url: string}`
   - `add_chart_to_existing_dashboard(dashboard_id, chart_id, position) → 
{success: boolean}`
   
   **SQL and Analysis Tools:**
   
   - `execute_sql(database_id, sql_query) → {results: QueryResult, 
execution_info: object}`
   - `generate_explore_link(dataset_id, config) → {explore_url: string}`
   
   ### Enhanced DAO Interface
   
   ```python
   def list(...) -> Tuple[List[Any], int]  # Replaces Flask-AppBuilder list 
functionality
   def count() -> int                      # Required for instance statistics
   def get_filterable_columns_and_operators() -> Dict[str, Any]  # Filter 
discovery
   ```
   
   ### New Dependencies
   
   #### Python Dependencies
   
   - **fastmcp** (v2.10.0+): Core MCP protocol implementation
       - License: MIT
       - Maintenance: Active, community-maintained
       - Purpose: MCP server implementation and tooling
       - Compatible with: Pydantic v2, FastAPI ecosystem
   - **pydantic** (v2.0+): Schema validation and serialization - already added 
as part of "dao updates PR"
       - License: MIT
       - Maintenance: Very active, widely adopted
       - Purpose: Type-safe API schemas, FastAPI compatibility
       - Note: Superset already uses Pydantic in some areas
   
   #### Optional Dependencies
   
   - **playwright** (for chart screenshots): Already planned for SIP-178
   - **Pillow** (for image processing): Chart preview functionality
   
   #### Breaking Changes
   
   **Note**: There are currently compatibility issues between werkzeug 3.1.3 and fastmcp 2.12.3. For now the solution is to pin fastmcp 2.10.0 until the incompatibility is resolved.
   
   ### Migration Plan and Compatibility
   
   #### Implementation Phases
   
   The POC implementation is largely complete. The plan is to merge it in 
phases through multiple pull requests:
   
   **Phase 0: Update Base DAO** [DONE]
   
   - Enhanced `BaseDAO` with `list()` and `count()` methods
   
   **Phase 1: Base Scaffold for MCP Service**
   
   - Ability to run the service with `superset mcp run`
   - Simple health tool
   - Dev support locally and in docker
   - Support `fastmcp` `http` and `stdio`
   
   **Phase 2: Core Listing and Discovery Tools**
   
   - `list_dashboards`, `list_charts`, `list_datasets`
   - `get_dashboard_info`, `get_chart_info`, `get_dataset_info`
   - `get_superset_instance_info`
   
   **Phase 3: Chart Creation and Preview**
   
   - `generate_chart` - Preview-first creation with explore links
   - `update_chart` - Granular chart modifications
   - `get_chart_preview` - Visual chart previews
   - `execute_sql` - SQL Lab integration
   - `generate_explore_link` - Pre-configured exploration URLs
   
   **Phase 4: Dashboard Management**
   
   - `generate_dashboard` - Row/column grid layout approach
   - `add_chart_to_existing_dashboard` - Chart placement tools
   
   **Phase 5: Advanced Features** (Future)
   
   - Prompt testing framework to ensure tools are used correctly by LLMs
   
   #### Database Schema
   
   **No database migrations required.** The MCP service uses existing Superset 
tables and models through the enhanced DAO layer.
   
   #### Configuration Migration
   
   New installations include MCP service configuration in default config 
templates. Existing installations remain unaffected unless explicitly enabled.
   
   #### Backward Compatibility
   
   - Existing Superset APIs and functionality unchanged
   - MCP service operates independently of web application
   - Feature flag ensures zero impact when disabled
   - Enhanced DAO methods are additive, not breaking
   
   #### URL Structure
   
   MCP service URLs are completely separate from existing Superset URLs:
   
   - Superset: http://localhost:8088/*
   - MCP Service: http://localhost:5008/* (or via MCP client connections)
   
   ### Rejected Alternatives
   
   #### 1. ASGI Integration with Flask (Original SIP-171)
   
   **Rejected because:**
   
   - Flask-AppBuilder's request lifecycle dependencies create complexity
   - WSGI-to-ASGI wrapping introduces performance overhead
   - Tight coupling between web and MCP service layers
   - Difficult to scale and deploy independently
   
   #### 2. REST API Extensions
   
   **Rejected because:**
   
   - AI agents benefit from higher-level, workflow-oriented tools
   - JSON-RPC 2.0 provides better function-call semantics for LLMs
   - REST APIs require too many granular calls for complex workflows
   - MCP provides standardized protocol for AI tool integration
   
   #### 3. External Bridge Service
   
   **Rejected because:**
   
   - Adds unnecessary architectural complexity
   - Duplicates authentication and authorization logic
   - Requires maintaining separate codebase
   - Loses access to Superset's internal optimizations
   
   #### 4. Direct Flask-SQLAlchemy Integration
   
   **Rejected because:**
   
   - Flask-AppBuilder abstractions don't align with MCP tool patterns
   - Request context requirements complicate standalone service deployment
   - Dependency injection patterns conflict with FastMCP architecture
   - Testing becomes more complex with web framework dependencies
   
   ## Implementation Status and Next Steps
   
   This SIP represents the formalization of an architectural approach validated 
through proof-of-concept development. The current implementation demonstrates:
   
   1. **Successful FastMCP integration** with Superset models
   2. **Enhanced BaseDAO methods** for pagination and filtering
   3. **Preview-first chart creation** workflows
   4. **Pydantic schema compatibility** with FastMCP tooling
   5. **Independent service deployment** options
   
   ### Immediate Next Steps
   
   1. Community review and feedback on architectural decisions
   2. Finalization of Pydantic schema designs for chart creation
   3. Implementation of dashboard management tools
   4. Production-ready middleware for authentication and rate limiting
   5. [Comprehensive testing framework for MCP 
tools](https://www.notion.so/MCP-Testing-Validation-Framework-2778718b6465807c863dcf1f8ce292c7?pvs=21)
   
   ### Success Metrics
   
   - LLM agents can successfully create and modify charts through natural 
language
   - Chart creation workflows require fewer iterations than existing UI 
approaches
   - Service can handle concurrent AI agent sessions without performance 
degradation
   - Integration provides measurable productivity improvements for data 
analysis workflows
   
   This revised SIP-171 proposal represents an evolution of the original 
concept, informed by practical implementation experience and optimized for the 
realities of AI-driven analytics workflows.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

