A few reasons to prefer init-containers come to mind:
Firstly, if we used spark-submit from within the driver container, the executors wouldn't receive the jars on their class loader until after the executor starts, because the executor has to launch first before localizing resources. It is certainly possible to make the class loader work with the user's jars here, as is the case with all the client mode implementations, but it seems cleaner to have the classpath include the user's jars at executor launch time instead of needing to reason about the classloading order.

We can also consider what is idiomatic from the perspective of Kubernetes. Yinan touched on this already, but init-containers are traditionally meant to prepare the environment for the application that is to be run, which is exactly what we do here. This also allows the localization process to be completely decoupled from the execution of the application itself. We can then, for example, detect errors at the resource localization layer, say when an HDFS cluster is down, before the application itself launches. A failure at the init-container stage is explicitly noted via the Kubernetes pod status API.

Finally, running spark-submit from the container would make the SparkSubmit code inadvertently allow running client mode Kubernetes applications as well. We're not quite ready to support that. Even if we were, it's not entirely intuitive for the cluster mode code path to depend on the client mode code path. This isn't entirely without precedent, though, as Mesos has a similar dependency. Essentially the semantics seem neater and the contract is very explicit when using an init-container, even though the code does end up being more complex.

From: Yinan Li <liyinan...@gmail.com>
Date: Tuesday, January 9, 2018 at 7:16 PM
To: Nicholas Chammas <nicholas.cham...@gmail.com>
Cc: Anirudh Ramanathan <ramanath...@google.com.invalid>, Marcelo Vanzin <van...@cloudera.com>, Matt Cheah <mch...@palantir.com>, Kimoon Kim <kim...@pepperdata.com>, dev <dev@spark.apache.org>
Subject: Re: Kubernetes: why use init containers?

The init-container is required for use with the resource staging server (https://github.com/apache-spark-on-k8s/userdocs/blob/master/src/jekyll/running-on-kubernetes.md#resource-staging-server). The resource staging server (RSS) is a spark-on-k8s component running in a Kubernetes cluster for staging submission-client-local dependencies to Spark pods. The init-container is responsible for downloading the dependencies from the RSS. We haven't upstreamed the RSS code yet, but it is a value-add component for Spark on K8s: it gives users a way to use submission-local dependencies without resorting to other mechanisms that are not immediately available on most Kubernetes clusters, e.g., HDFS. We do plan to upstream it in the 2.4 timeframe.

Additionally, the init-container is a Kubernetes-native way of making sure that the dependencies are localized before the main driver/executor containers are started. IMO, this guarantee is a positive to have, and it helps achieve separation of concerns. So I think the init-container is a valuable component and should be kept.
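For concreteness, the pattern Matt and Yinan describe boils down to declaring an init-container in the driver (and executor) pod spec so that the dependencies land on a shared volume before the main container ever starts. Below is a minimal sketch using the fabric8 kubernetes-client builder that the K8s backend already depends on; the pod name, images, volume name, and mount path are illustrative placeholders, not the actual Spark-on-K8s values:

    import io.fabric8.kubernetes.api.model.{Pod, PodBuilder}

    // Sketch only: an init-container downloads the application jars into a
    // shared emptyDir volume; the kubelet will not start the "driver"
    // container until "spark-init" exits successfully.
    val driverPod: Pod = new PodBuilder()
      .withNewMetadata()
        .withName("spark-driver")                   // hypothetical pod name
      .endMetadata()
      .withNewSpec()
        .addNewVolume()
          .withName("spark-jars")
          .withNewEmptyDir().endEmptyDir()
        .endVolume()
        .addNewInitContainer()
          .withName("spark-init")
          .withImage("example/spark-init:latest")   // illustrative image
          .addNewVolumeMount()
            .withName("spark-jars")
            .withMountPath("/var/spark-data/jars")
          .endVolumeMount()
        .endInitContainer()
        .addNewContainer()
          .withName("driver")
          .withImage("example/spark-driver:latest") // illustrative image
          .addNewVolumeMount()
            .withName("spark-jars")
            .withMountPath("/var/spark-data/jars")
          .endVolumeMount()
        .endContainer()
      .endSpec()
      .build()

Because the jars end up on a volume that the main container also mounts, the driver or executor classpath can include them at launch time, which is the classloading point made above, and a failed download keeps the main container from ever starting.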
On Tue, Jan 9, 2018 at 6:25 PM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:

> I'd like to point out the output of "git show --stat" for that diff:
> 29 files changed, 130 insertions(+), 1560 deletions(-)

+1 for that and generally for the idea of leveraging spark-submit.

> You can argue that executors downloading from external servers would be faster than downloading from the driver, but I'm not sure I'd agree - it can go both ways.

On a tangentially related note, one of the main reasons spark-ec2 is so slow to launch clusters is that it distributes files like the Spark binaries to all the workers via the master. Because of that, the launch time scaled with the number of workers requested. When I wrote Flintrock, I got a large improvement in launch time over spark-ec2 simply by having all the workers download the installation files in parallel from an external host (typically S3 or an Apache mirror), and launch time became largely independent of the cluster size.

That may or may not say anything about the driver distributing application files vs. having init containers do it in parallel, but I'd be curious to hear more.

Nick

On Tue, Jan 9, 2018 at 9:08 PM Anirudh Ramanathan <ramanath...@google.com.invalid> wrote:

We were running a change in our fork which was similar to this at one point early on. My biggest concerns off the top of my head with this change would be localization performance with large numbers of executors, and what we lose in terms of separation of concerns. Init containers are a standard construct in k8s for resource localization. It would also be interesting to see how this approach affects the HDFS work.

+matt +kimoon

Still thinking about the potential trade-offs here. Adding Matt and Kimoon, who would remember more about our reasoning at the time.

On Jan 9, 2018 5:22 PM, "Marcelo Vanzin" <van...@cloudera.com> wrote:

Hello,

Me again. I was playing some more with the kubernetes backend, and the whole init container thing seemed unnecessary to me.

Currently it's used to download remote jars and files, mount the volume into the driver / executor, and place those jars in the classpath / move the files to the working directory. This is all stuff that spark-submit already does without needing extra help.

So I spent some time hacking stuff, removing the init container code, and launching the driver inside kubernetes using spark-submit (similar to how standalone and Mesos cluster modes work):

https://github.com/vanzin/spark/commit/k8s-no-init

I'd like to point out the output of "git show --stat" for that diff:

29 files changed, 130 insertions(+), 1560 deletions(-)

You get massive code reuse by simply using spark-submit. The remote dependencies are downloaded in the driver, and the driver does the job of serving them to the executors.

So I guess my question is: is there any advantage in using an init container?

The current init container code can download stuff in parallel, but that's an easy improvement to make in spark-submit, and one that would benefit everybody.

You can argue that executors downloading from external servers would be faster than downloading from the driver, but I'm not sure I'd agree - it can go both ways.

Also, the same idea could probably be applied to starting executors; Mesos starts executors using "spark-class" already, so doing that would both improve code sharing and potentially simplify some code in the k8s backend.

--
Marcelo
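One concrete consequence of the init-container approach debated above is that a localization failure (say, an unreachable HDFS cluster) is reported through the Kubernetes pod status API before the driver runs. Here is a rough sketch of reading that status, again with the fabric8 client and a hypothetical pod name and namespace:

    import io.fabric8.kubernetes.client.DefaultKubernetesClient
    import scala.collection.JavaConverters._

    // Sketch only: inspect the init-container statuses of a (hypothetical)
    // driver pod; a localization failure shows up here before the main
    // containers are ever started.
    val client = new DefaultKubernetesClient()   // uses the usual kubeconfig
    try {
      val pod = client.pods().inNamespace("default").withName("spark-driver").get()
      pod.getStatus.getInitContainerStatuses.asScala.foreach { s =>
        val waiting    = Option(s.getState.getWaiting).map(_.getReason)
        val terminated = Option(s.getState.getTerminated).map(_.getExitCode)
        println(s"init container ${s.getName}: waiting=$waiting, terminated=$terminated")
      }
    } finally {
      client.close()
    }

kubectl surfaces the same information through the pod's status field, which is the "explicitly noted via the Kubernetes pod status API" behavior mentioned at the top of the thread.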