[https://issues.apache.org/jira/browse/SPARK-52806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
Vara Bonthu updated SPARK-52806:
--------------------------------
Description:
This SPIP proposes adding AI-native observability capabilities to Apache Spark
through a Model Context Protocol (MCP) server that enables natural language
querying and analysis of Spark History Server data.
h2. Summary
We propose creating a bridge between AI assistants (Claude, GPT, Amazon Q) and
Apache Spark History Server data, enabling users to ask questions like "Why is
my Spark job slow?" and receive AI-powered analysis with actionable
recommendations.
h2. Key Features
* Natural language interface for Spark diagnostics
* 17+ pre-built diagnostic tools for common performance scenarios
* AI-powered root cause analysis and optimization recommendations
* Zero modifications required to existing Spark installations
* Compatible with multiple AI assistants via Model Context Protocol
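As an illustration of what one of the pre-built diagnostic tools might compute, the sketch below ranks completed stages by total executor run time. It assumes its input is the JSON list returned by the Spark History Server endpoint {{GET /api/v1/applications/<app-id>/stages}}; the field names ({{stageId}}, {{status}}, {{executorRunTime}}, {{diskBytesSpilled}}) follow the Spark monitoring REST API. The function name {{slowest_stages}} and the spill flag are hypothetical and are not taken from the project itself.

```python
from typing import Any


def slowest_stages(stages: list[dict[str, Any]], top_n: int = 3) -> list[dict[str, Any]]:
    """Hypothetical diagnostic: rank completed stages by executor run time.

    `stages` is assumed to be the JSON list from the Spark History Server
    endpoint GET /api/v1/applications/<app-id>/stages.
    """
    # Only stages that finished successfully have meaningful totals.
    completed = [s for s in stages if s.get("status") == "COMPLETE"]
    # executorRunTime is the total time (ms) executors spent running the stage's tasks.
    ranked = sorted(completed, key=lambda s: s.get("executorRunTime", 0), reverse=True)
    return [
        {
            "stageId": s["stageId"],
            "name": s.get("name", ""),
            "executorRunTimeMs": s.get("executorRunTime", 0),
            # Spilling to disk often hints at memory pressure or too few partitions.
            "spilled": s.get("diskBytesSpilled", 0) > 0,
        }
        for s in ranked[:top_n]
    ]
```

In this design the assistant receives a small, pre-ranked summary instead of the raw stage list, which keeps the prompt compact and gives the model concrete stage IDs to reason about.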
h2. Community Value
* 10x faster troubleshooting workflows
* Lower barrier to entry for Spark performance optimization
* Positions Apache Spark as AI-ready for next-generation observability
* Addresses growing demand for AI-powered developer tools
h2. Implementation Approach
* Standalone MCP server consuming existing Spark History Server REST APIs
* No changes to Spark core required
* Kubernetes-native deployment with Helm charts or on any virtual machine
* Built on the emerging MCP standard for AI-tool integration
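The implementation approach can be sketched as a small, standard-library-only Python handler that answers an MCP {{tools/call}} request (MCP messages are JSON-RPC 2.0) by querying the existing Spark History Server REST API at {{/api/v1/applications}}. The tool name {{list_applications}}, the base URL, and the overall structure are assumptions for illustration, not the project's actual code; a complete MCP server would also implement the {{initialize}} and {{tools/list}} exchanges (or use an MCP SDK), which are omitted here.

```python
import json
import urllib.request
from typing import Any, Callable

# Assumed location of the History Server; 18080 is Spark's default
# spark.history.ui.port.
HISTORY_SERVER = "http://localhost:18080"


def http_get_json(path: str) -> Any:
    """GET a Spark History Server REST endpoint and decode its JSON body."""
    with urllib.request.urlopen(HISTORY_SERVER + path, timeout=10) as resp:
        return json.load(resp)


def list_applications(fetch: Callable[[str], Any], limit: int = 5) -> str:
    """Hypothetical MCP tool: one line per application known to the server."""
    apps = fetch(f"/api/v1/applications?limit={int(limit)}")
    lines = [f"{app['id']}: {app['name']}" for app in apps]
    return "\n".join(lines) or "No applications found."


# Registry of tool name -> implementation (names are illustrative only).
TOOLS: dict[str, Callable[..., str]] = {"list_applications": list_applications}


def handle_request(raw: str, fetch: Callable[[str], Any] = http_get_json) -> str:
    """Answer a single MCP `tools/call` request (JSON-RPC 2.0) with text content."""
    req = json.loads(raw)

    def error(code: int, message: str) -> str:
        # Standard JSON-RPC 2.0 error object.
        return json.dumps({"jsonrpc": "2.0", "id": req.get("id"),
                           "error": {"code": code, "message": message}})

    if req.get("method") != "tools/call":
        return error(-32601, "Method not found")
    params = req.get("params", {})
    tool = TOOLS.get(params.get("name"))
    if tool is None:
        return error(-32602, f"Unknown tool: {params.get('name')}")
    try:
        text = tool(fetch, **params.get("arguments", {}))
        result = {"content": [{"type": "text", "text": text}], "isError": False}
    except Exception as exc:
        # Tool failures are reported inside the result so the model can see them.
        result = {"content": [{"type": "text", "text": str(exc)}], "isError": True}
    return json.dumps({"jsonrpc": "2.0", "id": req.get("id"), "result": result})
```

Because the handler takes the HTTP fetcher as a parameter, each tool can be tested against canned JSON without a running History Server, and the only dependency on Spark is its read-only REST API.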
h2. Related Work
* To our knowledge, no existing project addresses this problem.
* The project is currently hosted under a neutral GitHub organization: [https://github.com/DeepDiagnostix-AI/spark-history-server-mcp]
h2. Maintainers
* Currently, Vara Bonthu (AWS, Open Source Specialist SA) and Manabu McCloskey (AWS, Open Source Engineer), along with the [AWS Data Processing|https://aws.amazon.com/sagemaker/data-processing/] team.
We have also submitted a proposal to Kubeflow: [https://github.com/kubeflow/community/issues/872]. We would like to hear from the Apache Spark community about this step forward for AI observability, and we are willing to support the project.
Full SPIP document with detailed technical design, timeline, and success
metrics will be attached as a comment.
This proposal aligns with Apache Spark's mission to make big data processing
accessible while positioning the project at the forefront of AI-native tooling.
*NOTE: We would be happy to demo this solution to the community if given the opportunity to present.*
was:
This SPIP proposes adding AI-native observability capabilities to Apache Spark
through a Model Context Protocol (MCP) server that enables natural language
querying and analysis of Spark History Server data.
h2. Summary
We propose creating a bridge between AI assistants (Claude, GPT, Amazon Q) and
Apache Spark History Server data, enabling users to ask questions like "Why is
my Spark job slow?" and receive AI-powered analysis with actionable
recommendations.
h2. Key Features
* Natural language interface for Spark diagnostics
* 17+ pre-built diagnostic tools for common performance scenarios
* AI-powered root cause analysis and optimization recommendations
* Zero modifications required to existing Spark installations
* Compatible with multiple AI assistants via Model Context Protocol
h2. Community Value
* 10x faster troubleshooting workflows
* Lower barrier to entry for Spark performance optimization
* Positions Apache Spark as AI-ready for next-generation observability
* Addresses growing demand for AI-powered developer tools
h2. Implementation Approach
* Standalone MCP server consuming existing Spark History Server REST APIs
* No changes to Spark core required
* Kubernetes-native deployment with Helm charts or on any virtual machine
* Built on the emerging MCP standard for AI-tool integration
h2. Related Work
* To our knowledge, no existing project addresses this problem.
* The project is currently hosted under a neutral GitHub organization: [https://github.com/DeepDiagnostix-AI/spark-history-server-mcp]
h2. Maintainers
* Currently, Vara Bonthu (AWS, Open Source Specialist SA) and Manabu McCloskey (AWS, Open Source Engineer), along with the Amazon EMR service teams until we build the community.
We have also submitted a proposal to Kubeflow: [https://github.com/kubeflow/community/issues/872]. We would like to hear from the Apache Spark community about this step forward for AI observability, and we are willing to support the project.
Full SPIP document with detailed technical design, timeline, and success
metrics will be attached as a comment.
This proposal aligns with Apache Spark's mission to make big data processing
accessible while positioning the project at the forefront of AI-native tooling.
*NOTE: We would be happy to demo this solution to the community if given the opportunity to present.*
> SPIP: AI-Native Observability for Apache Spark History Server via Model
> Context Protocol
> ----------------------------------------------------------------------------------------
>
> Key: SPARK-52806
> URL: https://issues.apache.org/jira/browse/SPARK-52806
> Project: Spark
> Issue Type: New Feature
> Components: Documentation, Web UI
> Affects Versions: 3.5.6, 4.0.0
> Environment: This solution works with any Spark History Server
> deployment, irrespective of the cloud provider.
> Reporter: Vara Bonthu
> Priority: Major
> Labels: AI, MCP, SPIP, historyserver, observability
> Original Estimate: 24h
> Remaining Estimate: 24h
--
This message was sent by Atlassian Jira
(v8.20.10#820010)