Thanks for all the community support for this SPIP proposal. The questions and discussion appear to have settled down (unless I've missed any major ones), so if there are no further questions or concerns, I'll act as shepherd for this SPIP proposal and call for a vote tomorrow.
Thank you all!

On Mon, Nov 13, 2023 at 6:43 PM Zhou Jiang <zhou.c.ji...@gmail.com> wrote:
>
> Hi Holden,
>
> Thanks a lot for your feedback!
> Yes, this proposal attempts to integrate ideas from existing solutions, especially from the CRD perspective. The proposed schema stays similar to current designs while reducing duplication and maintaining a single source of truth derived from conf properties. It also aims to stay close to native k8s integration in order to minimize schema changes for new features.
> For dependencies, packing everything is the easiest way to get started. It would be straightforward to add --packages and --repositories support for Maven dependencies. It is technically possible to pull dependencies from cloud storage in init containers (if defined by the user), but it could be tricky to design a general solution that supports different cloud providers at the operator layer. An enhancement I can think of is adding support for profile scripts that enable additional user-defined actions in application containers.
> The operator does not have to build everything itself for k8s version compatibility. Similar to Spark, the operator can be built on the Fabric8 client (https://github.com/fabric8io/kubernetes-client) for support across versions, given that it makes API calls for resource management similar to Spark's. For tests, in addition to the Fabric8 mock server, we may also borrow the idea from the Flink operator of starting a minikube cluster for integration tests.
> This operator is not starting from scratch: it is derived from an internal project that has been running at production scale for a few years. It aims to include a few new features and enhancements, plus some re-architecture, mostly to incorporate lessons learned in designing the CRD / API.
> Benchmarking operator performance in isolation can be nuanced, as it is often tied to the underlying cluster. There is a testing strategy that Aaruna and I discussed at a previous Data + AI Summit: scheduling wide (massive numbers of lightweight applications) and deep (a single application requesting many executors with heavy IO) cases, which reveals the typical bottlenecks at the k8s API server and in scheduler performance. Similar tests can be performed here as well.
>
> On Sun, Nov 12, 2023 at 4:32 PM Holden Karau <hol...@pigscanfly.ca> wrote:
>>
>> To be clear: I am generally supportive of the idea (+1) but have some follow-up questions:
>>
>> Have we taken the time to learn from the other operators? Do we have a compatible CRD/API or not (and if so, why)?
>> The API seems to assume that everything is packaged in the container in advance, but I imagine that might not be the case for many folks who have Java or Python packages published to cloud storage that they want to use?
>> What's our plan for testing the potential version explosion (not tying ourselves to operator version -> Spark version makes a lot of sense, but how do we reasonably assure ourselves that the cross product of operator version, Kube version, and Spark version all function)? Do we have CI resources for this?
>> Is there a current (non-open-source) operator that folks from Apple are using and planning to open source, or is this a fresh "from the ground up" operator proposal?
>> One of the key reasons for this is listed as "An out-of-the-box automation solution that scales effectively", but I don't see any discussion of the target scale or plans to achieve it?
>>
>>
>> On Thu, Nov 9, 2023 at 9:02 PM Zhou Jiang <zhou.c.ji...@gmail.com> wrote:
>>>
>>> Hi Spark community,
>>>
>>> I'm reaching out to initiate a conversation about the possibility of developing a Java-based Kubernetes operator for Apache Spark. Following the operator pattern (https://kubernetes.io/docs/concepts/extend-kubernetes/operator/), Spark users could manage applications and related components seamlessly using native tools like kubectl. The primary goal is to simplify the Spark user experience on Kubernetes, minimizing the learning curve and operational complexity, and thereby enabling users to focus on Spark application development.
>>>
>>> Although there are several open-source Spark on Kubernetes operators available, none of them are officially integrated into the Apache Spark project. As a result, these operators may lack active support and development of new features. With this proposal, our aim is to introduce a Java-based Spark operator as an integral component of the Apache Spark project. This solution has been employed internally at Apple for multiple years, operating millions of executors in real production environments. The use of Java in this solution is intended to accommodate a wider user and contributor audience, especially those who are not familiar with Scala.
>>>
>>> Ideally, this operator should have its own dedicated repository, similar to Spark Connect Golang or Spark Docker, allowing it to maintain a loose coupling with the Spark release cycle. This model is also followed by the Apache Flink Kubernetes operator.
>>>
>>> We believe that this project has the potential to evolve into a thriving community project over the long run. A comparison can be drawn with the Flink Kubernetes operator: Apple open-sourced its internal Flink Kubernetes operator, making it part of the Apache Flink project (https://github.com/apache/flink-kubernetes-operator). This move has gained wide industry adoption and contributions from the community. In a single year, the Flink operator has garnered more than 600 stars and attracted contributions from over 80 contributors, showcasing the level of community interest and collaborative momentum that can be achieved in similar scenarios.
>>>
>>> More details can be found in the SPIP doc: Spark Kubernetes Operator
>>> https://docs.google.com/document/d/1f5mm9VpSKeWC72Y9IiKN2jbBn32rHxjWKUfLRaGEcLE
>>>
>>> Thanks,
>>>
>>> --
>>> Zhou JIANG
>>>
>>
>>
>> --
>> Twitter: https://twitter.com/holdenkarau
>> Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>
>
> --
> Zhou JIANG
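
As a concrete illustration of the Fabric8-based approach mentioned in the thread, below is a minimal, hypothetical sketch (not taken from the proposal or the SPIP doc) of an operator-side component listing Spark driver pods through the Fabric8 client. The class name and the "spark-apps" namespace are assumptions made for the example; the "spark-role=driver" label is the one Spark itself applies to driver pods.

import io.fabric8.kubernetes.api.model.Pod;
import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClientBuilder;

// Hypothetical sketch: list Spark driver pods through the Fabric8 client.
// The client handles negotiation with the API server, so the same code
// works across the Kubernetes versions the client supports.
public class SparkDriverPodLister {
    public static void main(String[] args) {
        try (KubernetesClient client = new KubernetesClientBuilder().build()) {
            // "spark-role=driver" is the label Spark applies to driver pods;
            // the "spark-apps" namespace is an assumption for this sketch.
            for (Pod pod : client.pods()
                                 .inNamespace("spark-apps")
                                 .withLabel("spark-role", "driver")
                                 .list()
                                 .getItems()) {
                System.out.printf("%s -> %s%n",
                        pod.getMetadata().getName(),
                        pod.getStatus().getPhase());
            }
        }
    }
}

In an actual operator, the reconcilers would use the same client to watch and mutate the proposed custom resources, which is what keeps the operator decoupled from any single Kubernetes version, as described in the thread.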