[I] [SIP-???] Proposal for Model Context Protocol (MCP) Service [superset]

via GitHub Mon, 23 Jun 2025 11:01:00 -0700


mistercrunch opened a new issue, #33870:
URL: https://github.com/apache/superset/issues/33870


   ## Motivation
   
   MCP is an open JSON-RPC 2.0 spec that gives any LLM a **universal “USB-C 
port”** for tools, data, and actions — one schema, no custom glue. Anthropic 
open-sourced it in Nov 2024, and the rest of the stack stampeded in: OpenAI 
baked MCP into ChatGPT & the Agents SDK; Google DeepMind is wiring it into 
Gemini; Microsoft highlighted it during Build; dev-tool players like Replit, 
Zed, Sourcegraph, and Block already ship MCP servers.
   
   ### Why REST doesn’t cut it for agents
   
   - **Auth done wrong** – Users don’t want to, and shouldn’t, share their API 
key with an LLM.
   - **Too many calls** – REST is designed for apps, so it’s extremely verbose 
and “atomic”.
   - **IDs vs. context** – Agents don’t want `owner_id=42`; they want `{id:42, 
username:"jdoe", email:"…"}` right away.
   
   ### Superset + MCP → headless-BI 2.0
   
   Superset is already headless; MCP lets any agent pick up the steering wheel.
   
   | Agent ask | Superset does |
   | --- | --- |
   | “**Search** all dashboards on churn.” | `list_dashboards query="churn"` |
   | “**Create** a line chart of MRR vs time.” | `generate_chart metric="MRR" 
dim="date"` |
   | “**Open** an Explore on top 50 customers.” | `generate_explore_link 
dataset="customers" limit=50` |
   | “Why did revenue dip? **Drill** into root cause.” | 
`run_root_cause_analysis dashboard_id=123` |
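
   On the wire, a row like `list_dashboards query="churn"` would travel as a JSON-RPC 2.0 request. The sketch below shows one plausible shape, using the `tools/call` method from the MCP spec. The helper name `build_tool_call` and the request-id counter are illustrative assumptions, not part of this proposal.

```python
import json
from itertools import count

# Hypothetical per-process counter; JSON-RPC only requires ids to be unique per session.
_request_ids = count(1)


def build_tool_call(name: str, arguments: dict) -> dict:
    """Wrap a tool invocation in a JSON-RPC 2.0 request envelope."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


request = build_tool_call("list_dashboards", {"query": "churn"})
print(json.dumps(request, indent=2))
```

   The agent never builds this by hand; its MCP client does, based on the tool schema the server advertises.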
   
   End-to-end, the LLM can query data, build viz, and kick the user straight 
into SQL Lab or Explore, then hand control back. That’s AI-augmented, headless 
BI.
   
   ### How MCP Relates to the RAG-Focused SIPs
   
   **Different directions, different problems — both useful.**
   
   | Dimension | RAG-centric SIPs | MCP SIP (this doc) |
   | --- | --- | --- |
   | **Call direction** | **Superset ➜ LLM**<br>Superset queries an external 
model for extra context or explanations. | **LLM ➜ Superset**<br>External 
agents call Superset to fetch assets or trigger actions. |
   | **Primary benefit** | Enriches the *user’s* experience *inside* Superset 
(semantic search, chart “explainers,” etc.). | Lets agents *outside* Superset 
automate everything users can do through the UI. |
   | **Auth model** | Superset authenticates **to** the model. | Agent 
authenticates **to** Superset, fully RBAC-aware. |
   | **Granularity** | Model returns unstructured answers. | Superset returns 
deterministic, typed objects and links. |
   | **Dependency** | Needs vector stores / LangChain wrappers inside Superset. 
| Needs an MCP blueprint exposed by Superset. |
   
   These tracks don’t depend on each other, and neither blocks the other:
   
   - **Ship RAG features** for smarter querying and insight *within* Superset.
   - **Ship MCP** so any LLM agent can treat Superset as just another tool in a 
multi-app workflow.
   
   Separate SIPs, separate code paths, complementary value. Feel free to pursue 
and ship either (or both) in any order.
   
   ---
   
   ## Proposed Change
   
   | Aspect | Detail |
   | --- | --- |
   | **Blueprint** | Opt-in Flask blueprint surfaced by helper pkg 
**`fastmcp`** (MIT, ≤300 LOC). |
   | **Toggle** | `ENABLE_MCP_SERVICE = False` (default). |
   | **CLI** | `superset mcp run --port 5008`. |
   | **Namespace** | `/api/mcp/v1/*` (tag kept but less rigid than REST; see 
*Versioning*). |
   | **Runtime** | WSGI-Flask by default; ASGI wrapping possible via 
`asgiref.wsgi.WsgiToAsgi`. |
   | **Hooks** | `auth_hook`, `impersonate`, `audit_log`, `rate_limit` — no-ops 
in OSS, pluggable in Preset & enterprise. |
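
   One way to read the **Hooks** row is a small container whose fields default to no-ops and can be swapped out by a deployment. The sketch below assumes that reading. The hook names come from the table, but every signature, the `McpHooks` class, and the `call_tool` dispatcher are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class McpHooks:
    """Hypothetical hook container; each default is a no-op, as in the OSS build."""

    # Returns the authenticated username, or None when auth is not enforced.
    auth_hook: Callable[[dict], Optional[str]] = lambda request: None
    # Maps the authenticated user to the user the tool should run as.
    impersonate: Callable[[Optional[str]], Optional[str]] = lambda user: user
    # Records what was called; the default discards it.
    audit_log: Callable[[str, dict], None] = lambda tool, payload: None
    # Returns True when the call is allowed to proceed.
    rate_limit: Callable[[Optional[str]], bool] = lambda user: True


def call_tool(hooks: McpHooks, tools: dict, name: str, request: dict):
    """Run one tool, passing through each hook in order."""
    user = hooks.impersonate(hooks.auth_hook(request))
    if not hooks.rate_limit(user):
        raise PermissionError("rate limit exceeded")
    arguments = request.get("arguments", {})
    result = tools[name](**arguments)
    hooks.audit_log(name, {"user": user, "arguments": arguments})
    return result


# A deployment overrides only the hooks it cares about.
audit_trail = []
hooks = McpHooks(audit_log=lambda tool, payload: audit_trail.append((tool, payload)))
tools = {"list_dashboards": lambda query="": [f"dashboard matching {query!r}"]}
result = call_tool(hooks, tools, "list_dashboards", {"arguments": {"query": "churn"}})
```

   Keeping the defaults as plain callables means the OSS path pays almost nothing for hooks it does not use.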
   
   ### Code Reuse / DRY Strategy
   
   - **Single source of truth**: Commands + DAOs encapsulate business rules.
   - MCP and REST endpoints **compose** the same commands + DAOs; no logic 
duplication.
   - Shared Marshmallow schemas reused directly or shallow-wrapped to add 
denormalised fields.
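
   The composition idea above can be sketched with a toy command and DAO. All class and function names here (`DashboardDAO`, `SearchDashboardsCommand`, the two adapters) are invented for illustration and do not mirror Superset's actual modules. The point is only that the validation rule lives in one place.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dashboard:
    id: int
    title: str


class DashboardDAO:
    """Hypothetical in-memory stand-in for a real data-access object."""

    _rows = (Dashboard(1, "Churn overview"), Dashboard(2, "Revenue by region"))

    @classmethod
    def search(cls, text: str) -> list:
        needle = text.lower()
        return [row for row in cls._rows if needle in row.title.lower()]


class SearchDashboardsCommand:
    """The business rule lives here once: reject blank searches, then query the DAO."""

    def __init__(self, text: str) -> None:
        self.text = text

    def run(self) -> list:
        if not self.text.strip():
            raise ValueError("search text must not be empty")
        return DashboardDAO.search(self.text)


def rest_list_dashboards(q: str) -> dict:
    """REST-style adapter: wraps the command output in a count/result envelope."""
    rows = SearchDashboardsCommand(q).run()
    return {
        "count": len(rows),
        "result": [{"id": r.id, "dashboard_title": r.title} for r in rows],
    }


def mcp_list_dashboards(query: str) -> list:
    """MCP-style adapter: same command, flatter payload for the agent."""
    rows = SearchDashboardsCommand(query).run()
    return [{"id": r.id, "title": r.title} for r in rows]
```

   Both adapters differ only in serialisation, so a change to the search rule is made once in the command.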
   
   ### High-Level vs. Atomic
   
   MCP tools are **chunkier**: one call, one meaningful action, denormalised 
payloads (e.g. `owners:[{id, username, email}]` as opposed to 
`owner_ids=[1,2,3]`)  to spare agents extra look-ups.
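
   As a rough illustration of the denormalised payload described above, the sketch below swaps `owner_ids` for an `owners` list in a single pass. The `USERS` lookup table, the sample record, and the function name `denormalise_owners` are made up for the example.

```python
# Hypothetical user lookup; in practice this would come from the metadata database.
USERS = {
    1: {"id": 1, "username": "jdoe", "email": "jdoe@example.com"},
    2: {"id": 2, "username": "asmith", "email": "asmith@example.com"},
    3: {"id": 3, "username": "mlee", "email": "mlee@example.com"},
}


def denormalise_owners(record: dict, users: dict) -> dict:
    """Return a copy of `record` with `owner_ids` replaced by full owner objects."""
    enriched = {key: value for key, value in record.items() if key != "owner_ids"}
    enriched["owners"] = [users[user_id] for user_id in record.get("owner_ids", [])]
    return enriched


atomic = {"id": 10, "title": "Churn overview", "owner_ids": [1, 3]}
chunky = denormalise_owners(atomic, USERS)
```

   The agent receives usernames and emails in the same response, so it needs no follow-up call per owner.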
   
   As a general rule of thumb, we'll try to design tools around "agent 
stories", the agent counterpart of "user stories". CRUD interfaces will use 
simpler, more intuitive schemas, following some of the principles highlighted 
in https://www.jlowin.dev/blog/as-an-agent-the-new-user-story
   
   ### Versioning Philosophy
   
   LLMs parse tool schemas *in-session*, so changes an agent picks up on its 
next schema read (e.g. renaming `owner_ids`→`owners`) don’t require heavy semver 
ceremony. We bump `/v{n}` only for removals or semantic flips.
   
   ### Initial Action Set (Phase 1)
   
   *Discovery* → `list_*` • *Navigation* → `generate_explore_link`, 
`open_sql_lab_with_context` • *Mutations* → `generate_chart`, 
`generate_dashboard`, `add_chart_to_existing_dashboard`
   
   ### Deliverables
   
   - Blueprint + flag + CLI
   - 3-5 tools with unit, integration, and perf smoke tests
   - Minimal OpenAPI spec + auto-generated TS/Python client
   - Error envelope `{ "error": { "code": "...", "message": "..." } }`
   - Demo notebook/script
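
   The error envelope in the list above could be produced by a thin wrapper around each tool. In this sketch only the envelope shape comes from the proposal. The error code strings, the `result` key for successes, and the sample `get_dashboard` tool are assumptions.

```python
def error_envelope(code: str, message: str) -> dict:
    """Build the proposed error shape: {"error": {"code": ..., "message": ...}}."""
    return {"error": {"code": code, "message": message}}


def run_tool(tool, **arguments) -> dict:
    """Call a tool and translate common exceptions into error envelopes."""
    try:
        return {"result": tool(**arguments)}
    except KeyError as exc:
        return error_envelope("NOT_FOUND", f"unknown key: {exc.args[0]}")
    except ValueError as exc:
        return error_envelope("INVALID_ARGUMENT", str(exc))


def get_dashboard(dashboard_id: int) -> dict:
    """Hypothetical tool backed by a hard-coded table."""
    dashboards = {123: {"id": 123, "title": "Revenue"}}
    if dashboard_id <= 0:
        raise ValueError("dashboard_id must be positive")
    return dashboards[dashboard_id]


ok = run_tool(get_dashboard, dashboard_id=123)
missing = run_tool(get_dashboard, dashboard_id=999)
invalid = run_tool(get_dashboard, dashboard_id=-1)
```

   A stable `code` gives the agent something to branch on, while `message` stays free-form for the model to read.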
   
   ## New / Changed Public Interfaces
   
   | Interface | Addition |
   | --- | --- |
   | REST | `/api/mcp/v1/*` + `/api/simple/v1/*` |
   | Config | `ENABLE_MCP_SERVICE` |
   | CLI | `superset mcp run` |
   | Python | Optional `import fastmcp` |
   
   
   ## Phasing / Roll-out Plan
   
   | Phase | Goal | Outcome |
   | --- | --- | --- |
   | **1 – Proof of Concept** | Skeleton + 3-5 tools | Live agent demo: list → 
chart → SQL Lab |
   | **2 – Coverage Expansion** | Broader tool library | > 80 % of daily 
actions scriptable |
   | **3 – Production Hardening** | Extract **`superset-core`**; add robust 
auth/impersonation/logging | GA under OIDC / Okta / Preset Cloud |
   
   
   ## Longer-Term Package Topology
   
   ```mermaid
   flowchart LR
       core[superset-core]
   
       superset-app --> core
       superset-rest --> core
       superset-ext --> core
       superset-mcp --> core
   
   ```
   
   ## Industry Context: Auth & Impersonation
   
   Token exchange vs. signed-JWT vs. OAuth device-flow is still shaking out. 
Phase 1 ships **hooks + tests**; adapters drop when a clear winner emerges.
   
   ---
   
   ## New Dependencies
   
   - **fastmcp** – internal helper, MIT, no external deps
   - **asgiref** – optional (Apache-2) for ASGI wrapping
   
   ## Migration Plan & Compatibility
   
   - Disabled by default → zero impact
   - No DB migrations
   - Future breaking changes gated behind `/v{n}` and announced on `dev@`
   
   ---
   
   ## Rejected Alternatives
   
   | Alternative | Why Not |
   | --- | --- |
   | **External REST bridge** (`superset-mcp` PoC) | Extra hop, latency, 
duplicated RBAC/validation, schema drift |
   | **Immediate full `superset-core` extraction** | Multi-month refactor; 
slows PoC. Scheduled for Phase 3 |
   
   Embedded MCP provides **speed now** and **maintainability later**, 
complementing RAG efforts and keeping Superset at the center of AI-driven 
analytics.
   
   ## Why Model Context Protocol (MCP) and Why Now?
   
   The **Model Context Protocol (MCP)** is an open standard that lets 
large-language-model agents call *tools*—high-level, domain-specific 
actions—over a simple, schema-declared interface. Think of it as **USB-C for 
AI**: one plug that works across copilots (Claude, GitHub Copilot, Cursor, 
etc.) and related services (Postgres, GitHub, Slack, MotherDuck, Superset). The 
spec was open-sourced by Anthropic in late 2024 and has since been adopted or 
trialed by Microsoft, Hex, MotherDuck, Zed, Replit, Sourcegraph, Block, and 
others.
   
   ### Why REST Isn’t Enough
   
   REST was built for machine-to-machine plumbing, not for autonomous agents:
   
   - **API keys & secrets**: Models shouldn’t see them. MCP sessions carry 
*scoped* credentials or use local sockets, with no key leakage.
   - **Over-atomic verbs**: Listing a dashboard, grabbing its metadata, then 
building a chart is 3-5 calls in REST but **one tool call** in MCP.
   - **Hyper-verbose schemas**: REST spreads context across many endpoints; 
LLMs lose the thread. MCP bundles denormalised payloads that match the agent’s 
mental model.
   
   ### Superset + MCP ⇒ Headless, AI-Ready BI
   
   Superset already exposes a rich REST API, but agents must screen-scrape or 
choreograph dozens of endpoints. Baking MCP into Superset means any copilot can 
treat Superset as a *first-class tool* in AI-driven workflows—perfect fit for 
our **headless BI** push.
   
   What an LLM can do once MCP is live:
   
   | Action | Example prompt the agent can satisfy |
   | --- | --- |
   | **Search assets** | “List dashboards tagged *revenue* created in the last 
30 days.” |
   | **Spin up viz** | “Create a bar chart showing ARR by region and add it to 
‘Q3 Exec Dashboard’.” |
   | **Jump to context** | “Open SQL Lab on the `raw_events` dataset with a 
sample query.” |
   | **Chat-in-context** | “Why did MRR drop in EMEA last month? Show a quick 
breakdown.” |
   | **Multi-app chains** | Combine Superset → dbt → MotherDuck in one agent 
plan for root-cause analysis. |
   
   In short, MCP lets **any LLM do (almost) everything a human can in the UI, 
then hand the wheel back**—unlocking AI-augmented analytics inside and around 
Superset without brittle glue code.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

