Thanks for all the community support for this SPIP proposal. The questions and discussion appear to have settled down (unless I've missed any major ones), so if there are no further questions or concerns, I'll act as shepherd for this SPIP proposal and call for a vote tomorrow.
Thank you all!

On Mon, Nov 13, 2023 at 6:43 PM Zhou Jiang <zhou.c.ji...@gmail.com> wrote:
>
> Hi Holden,
>
> Thanks a lot for your feedback!
> Yes, this proposal attempts to integrate ideas from existing solutions, especially from the CRD perspective. The proposed schema stays similar to current designs while reducing duplication and maintaining a single source of truth derived from conf properties. It also aims to stay close to native k8s integration in order to minimize schema changes for new features.
> For dependencies, packing everything is the easiest way to get started. It would be straightforward to add --packages and --repositories support for Maven dependencies. It is technically possible to pull dependencies from cloud storage in init containers (if defined by the user), but it could be tricky to design a general solution that supports different cloud providers at the operator layer. An enhancement I can think of is adding support for profile scripts that enable additional user-defined actions in application containers.
> The operator does not have to build everything itself for k8s version compatibility. Similar to Spark, the operator can be built on the Fabric8 client (https://github.com/fabric8io/kubernetes-client) for support across versions, given that it makes API calls for resource management similar to Spark's. For tests, in addition to the Fabric8 mock server, we may also borrow the idea from the Flink operator of starting a minikube cluster for integration tests.
> This operator is not starting from scratch: it is derived from an internal project that has been running at production scale for a few years. It aims to include a few new features and enhancements, plus some re-architecture, mostly to incorporate lessons learned in designing the CRD / API.
> Benchmarking operator performance in isolation can be nuanced, as it is often tied to the underlying cluster. There is a testing strategy that Aaruna and I discussed at a previous Data + AI Summit: scheduling wide (massive numbers of lightweight applications) and deep (a single application requesting many executors with heavy IO) cases, which reveals the typical bottlenecks at the k8s API server and in scheduler performance. Similar tests can be performed here as well.
>
> On Sun, Nov 12, 2023 at 4:32 PM Holden Karau <hol...@pigscanfly.ca> wrote:
>>
>> To be clear: I am generally supportive of the idea (+1) but have some follow-up questions:
>>
>> Have we taken the time to learn from the other operators? Do we have a compatible CRD/API or not (and if so, why)?
>> The API seems to assume that everything is packaged in the container in advance, but I imagine that might not be the case for many folks who have Java or Python packages published to cloud storage that they want to use?
>> What's our plan for testing the potential version explosion (not tying ourselves to operator version -> Spark version makes a lot of sense, but how do we reasonably assure ourselves that the cross product of operator version, Kube version, and Spark version all function)? Do we have CI resources for this?
>> Is there a current (non-open-source) operator that folks from Apple are using and planning to open source, or is this a fresh "from the ground up" operator proposal?
>> One of the key reasons for this is listed as "An out-of-the-box automation solution that scales effectively", but I don't see any discussion of the target scale or plans to achieve it?
>>
>>
>> On Thu, Nov 9, 2023 at 9:02 PM Zhou Jiang <zhou.c.ji...@gmail.com> wrote:
>>>
>>> Hi Spark community,
>>>
>>> I'm reaching out to initiate a conversation about the possibility of developing a Java-based Kubernetes operator for Apache Spark. Following the operator pattern (https://kubernetes.io/docs/concepts/extend-kubernetes/operator/), Spark users could manage applications and related components seamlessly using native tools like kubectl. The primary goal is to simplify the Spark user experience on Kubernetes, minimizing the learning curve and operational complexity, and thereby enabling users to focus on Spark application development.
>>>
>>> Although there are several open-source Spark on Kubernetes operators available, none of them are officially integrated into the Apache Spark project. As a result, these operators may lack active support and development of new features. With this proposal, our aim is to introduce a Java-based Spark operator as an integral component of the Apache Spark project. This solution has been employed internally at Apple for multiple years, operating millions of executors in real production environments. The use of Java in this solution is intended to accommodate a wider user and contributor audience, especially those who are not familiar with Scala.
>>>
>>> Ideally, this operator should have its own dedicated repository, similar to Spark Connect Golang or Spark Docker, allowing it to maintain a loose coupling with the Spark release cycle. This model is also followed by the Apache Flink Kubernetes operator.
>>>
>>> We believe that this project has the potential to evolve into a thriving community project over the long run. A comparison can be drawn with the Flink Kubernetes operator: Apple open-sourced its internal Flink Kubernetes operator, making it part of the Apache Flink project (https://github.com/apache/flink-kubernetes-operator). This move has gained wide industry adoption and contributions from the community. In a single year, the Flink operator has garnered more than 600 stars and attracted contributions from over 80 contributors, showcasing the level of community interest and collaborative momentum that can be achieved in similar scenarios.
>>>
>>> More details can be found in the SPIP doc: Spark Kubernetes Operator
>>> https://docs.google.com/document/d/1f5mm9VpSKeWC72Y9IiKN2jbBn32rHxjWKUfLRaGEcLE
>>>
>>> Thanks,
>>>
>>> --
>>> Zhou JIANG
>>>
>>
>>
>> --
>> Twitter: https://twitter.com/holdenkarau
>> Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>
>
> --
> Zhou JIANG
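
As a concrete illustration of the Fabric8-based approach mentioned in the thread, below is a minimal, hypothetical sketch (not taken from the proposal or the SPIP doc) of an operator-side component listing Spark driver pods through the Fabric8 client. The class name and the "spark-apps" namespace are assumptions made for the example; the "spark-role=driver" label is the one Spark itself applies to driver pods.

import io.fabric8.kubernetes.api.model.Pod;
import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClientBuilder;

// Hypothetical sketch: list Spark driver pods through the Fabric8 client.
// The client handles negotiation with the API server, so the same code
// works across the Kubernetes versions the client supports.
public class SparkDriverPodLister {
    public static void main(String[] args) {
        try (KubernetesClient client = new KubernetesClientBuilder().build()) {
            // "spark-role=driver" is the label Spark applies to driver pods;
            // the "spark-apps" namespace is an assumption for this sketch.
            for (Pod pod : client.pods()
                                 .inNamespace("spark-apps")
                                 .withLabel("spark-role", "driver")
                                 .list()
                                 .getItems()) {
                System.out.printf("%s -> %s%n",
                        pod.getMetadata().getName(),
                        pod.getStatus().getPhase());
            }
        }
    }
}

In an actual operator, the reconcilers would use the same client to watch and mutate the proposed custom resources, which is what keeps the operator decoupled from any single Kubernetes version, as described in the thread.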