What licensing issues come into play?

On Thu, Dec 14, 2017 at 4:00 PM, Erik Erlandson <eerla...@redhat.com> wrote:
We've been discussing the topic of container images a bit more. The kubernetes back-end operates by executing some specific CMD and ENTRYPOINT logic, which is different from mesos, and which is probably not practical to unify at this level.

However, these CMD and ENTRYPOINT configurations are essentially just a thin skin on top of an image which is just an install of a spark distro. We feel that a single "spark-base" image should be publishable that is consumable by kube-spark images, mesos-spark images, and likely any other community image whose primary purpose is running spark components. The kube-specific dockerfiles would be written "FROM spark-base" and just add the small command and entrypoint layers. Likewise, the mesos images could add any specialization layers that are necessary on top of the "spark-base" image.

Does this factorization sound reasonable to others?
Cheers,
Erik

On Wed, Nov 29, 2017 at 10:04 AM, Mridul Muralidharan <mri...@gmail.com> wrote:

We do support running on Apache Mesos via docker images - so this would not be restricted to k8s. But unlike mesos support, which has other modes of running, I believe k8s support more heavily depends on the availability of docker images.

Regards,
Mridul

On Wed, Nov 29, 2017 at 8:56 AM, Sean Owen <so...@cloudera.com> wrote:

Would it be logical to provide Docker-based distributions of other pieces of Spark? Or is this specific to K8S? The problem is that we generally wouldn't also provide such a distribution of Spark, for the reasons you give: because if we did that, then why not RPMs and so on.

On Wed, Nov 29, 2017 at 10:41 AM, Anirudh Ramanathan <ramanath...@google.com> wrote:

In this context, I think the docker images are similar to the binaries rather than an extension. It's packaging the compiled distribution to save people the effort of building one themselves, akin to binaries or the python package.

For reference, this is the base dockerfile for the main image that we intend to publish. It's not particularly complicated. The driver and executor images are based on said base image and only customize the CMD (any file/directory inclusions are extraneous and will be removed).

Is there only one way to build it? That's a bit harder to reason about. The base image, I'd argue, is likely always going to be built that way. For the driver and executor images, there may be cases where people want to customize them (like putting all dependencies into them, for example). In those cases, as long as our images are bare bones, they can use the spark-driver/spark-executor images we publish as the base and build their customization as a layer on top.

I think the composability of docker images makes this a bit different from, say, debian packages. We can publish canonical images that serve both as a complete image for most Spark applications and as a stable substrate to build customization upon.
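(For illustration only: a rough sketch of the layering described above. The base image, directory names, and entrypoint script are hypothetical stand-ins, not the actual Dockerfiles under discussion.)

    # Dockerfile for a hypothetical "spark-base" image: just an install of a
    # Spark distro, with no scheduler-specific CMD/ENTRYPOINT logic.
    FROM openjdk:8-jre
    COPY spark-dist/ /opt/spark
    ENV SPARK_HOME=/opt/spark
    WORKDIR /opt/spark

    # Dockerfile for a hypothetical kube-specific driver image: a thin layer on
    # top of spark-base that only adds the k8s command and entrypoint logic.
    FROM spark-base
    COPY entrypoint.sh /opt/entrypoint.sh
    ENTRYPOINT ["/opt/entrypoint.sh"]
    CMD ["driver"]

A mesos-specific image could similarly add whatever specialization layers it needs on top of the same spark-base.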
On Wed, Nov 29, 2017 at 7:38 AM, Mark Hamstra <m...@clearstorydata.com> wrote:

It's probably also worth considering whether there is only one, well-defined, correct way to create such an image, or whether this is a reasonable avenue for customization. Part of why we don't do something like maintain and publish canonical Debian packages for Spark is that different organizations doing packaging and distribution of infrastructures or operating systems can reasonably want to do this in a custom (or non-customary) way. If there is really only one reasonable way to do a docker image, then my bias starts to tend more toward the Spark PMC taking on the responsibility to maintain and publish that image. If there is more than one way to do it and publishing a particular image is more just a convenience, then my bias tends more away from maintaining and publishing it.

On Wed, Nov 29, 2017 at 5:14 AM, Sean Owen <so...@cloudera.com> wrote:

Source code is the primary release; compiled binary releases are conveniences that are also released. A docker image sounds fairly different, though. To the extent it's the standard delivery mechanism for some artifact (think: pyspark on PyPI as well), that makes sense, but is that the situation? If it's more of an extension or alternate presentation of Spark components, that typically wouldn't be part of a Spark release. The ones the PMC takes responsibility for maintaining ought to be the core, critical means of distribution alone.

On Wed, Nov 29, 2017 at 2:52 AM, Anirudh Ramanathan <ramanath...@google.com.invalid> wrote:

Hi all,

We're all working towards the Kubernetes scheduler backend (full steam ahead!) that's targeted at Spark 2.3. One of the questions that comes up often is docker images.

While we're making dockerfiles available to allow people to create their own docker images from source, ideally we'd want to publish official docker images as part of the release process.

I understand that the ASF has procedures around this, and we would want to get that started to help us get these artifacts published by 2.3. I'd love to get a discussion started around this and hear the community's thoughts.

--
Thanks,
Anirudh Ramanathan
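(Again purely illustrative, to make the customization point above concrete; the image name and tag below are hypothetical, not actual published artifacts.) A downstream user who wants dependencies baked in could layer them on top of a published image rather than rebuilding it:

    # Hypothetical user-side Dockerfile layering an application and its
    # dependencies on top of a published driver image; the published image's
    # ENTRYPOINT/CMD are inherited unchanged.
    FROM apache/spark-driver:2.3.0
    COPY target/my-app.jar /opt/spark/jars/
    COPY extra-deps/ /opt/spark/jars/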