Unfortunately you'll need to chase down the licenses of all the bits that are distributed directly by the project. This was a big job back in the day for the Maven artifacts, and it's some work to maintain. Most of the work is one-time, at least.
On Tue, Dec 19, 2017 at 12:53 PM Erik Erlandson <eerla...@redhat.com> wrote:

> Agreed that the GPL family would be "toxic."
>
> The current images have been at least informally confirmed to use licenses that are ASF compatible. Is there an officially sanctioned method of license auditing that can be applied here?
>
> On Tue, Dec 19, 2017 at 11:45 AM, Sean Owen <so...@cloudera.com> wrote:
>
>> I think that's all correct, though the license of third-party dependencies is actually a difficult and sticky part. The ASF couldn't make a software release including any GPL software, for example, and it's not just a matter of adding a disclaimer. Any actual bits distributed by the PMC would have to follow all the license rules.
>>
>> On Tue, Dec 19, 2017 at 12:34 PM Erik Erlandson <eerla...@redhat.com> wrote:
>>
>>> I've been looking a bit more into the ASF's legal posture on licensing and container images. What I have found indicates that the ASF considers container images to be just another variety of distribution channel. As such, it is acceptable to publish official releases; for example, an image such as spark:v2.3.0 built from the v2.3.0 source is fine. It is not acceptable to do something like regularly publish spark:latest built from the head of master.
>>>
>>> More detail here:
>>> https://issues.apache.org/jira/browse/LEGAL-270
>>>
>>> So as I understand it, making a release-tagged public image as part of each official release does not pose any problems.
>>>
>>> With respect to the licenses of other ancillary dependencies that are also installed on such container images, I noticed this clause in the legal boilerplate for the Flink images <https://hub.docker.com/r/library/flink/>:
>>>
>>>> As with all Docker images, these likely also contain other software which may be under other licenses (such as Bash, etc. from the base distribution, along with any direct or indirect dependencies of the primary software being contained).
>>>
>>> So it may be sufficient to resolve this via disclaimer.
>>>
>>> -Erik
>>>
>>> On Thu, Dec 14, 2017 at 7:55 PM, Erik Erlandson <eerla...@redhat.com> wrote:
>>>
>>>> Currently the containers are based off alpine, which pulls in BSD2 and MIT licensing:
>>>> https://github.com/apache/spark/pull/19717#discussion_r154502824
>>>>
>>>> To the best of my understanding, neither of those poses a problem. If we based the image off of centos, I'd also expect the licensing of any image deps to be compatible.
>>>>
>>>> On Thu, Dec 14, 2017 at 7:19 PM, Mark Hamstra <m...@clearstorydata.com> wrote:
>>>>
>>>>> What licensing issues come into play?
>>>>>
>>>>> On Thu, Dec 14, 2017 at 4:00 PM, Erik Erlandson <eerla...@redhat.com> wrote:
>>>>>
>>>>>> We've been discussing the topic of container images a bit more. The kubernetes back-end operates by executing some specific CMD and ENTRYPOINT logic, which is different from mesos, and which is probably not practical to unify at this level.
>>>>>>
>>>>>> However: these CMD and ENTRYPOINT configurations are essentially just a thin skin on top of an image that is just an install of a spark distro. We feel that a single "spark-base" image should be publishable: one that is consumable by kube-spark images, mesos-spark images, and likely any other community image whose primary purpose is running spark components. The kube-specific dockerfiles would be written "FROM spark-base" and just add the small command and entrypoint layers. Likewise, the mesos images could add any specialization layers that are necessary on top of the "spark-base" image.
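>>>>>>
>>>>>> As a rough sketch of the layering I have in mind (image names and paths here are placeholders, not a concrete proposal):
>>>>>>
>>>>>>     # Dockerfile for "spark-base" (illustrative names/paths only):
>>>>>>     # nothing but a Spark distro installed on a minimal OS layer
>>>>>>     FROM alpine:3.6
>>>>>>     RUN apk add --no-cache bash openjdk8-jre
>>>>>>     COPY dist /opt/spark
>>>>>>     ENV SPARK_HOME /opt/spark
>>>>>>
>>>>>>     # Dockerfile for a kube-specific image (illustrative): a thin layer
>>>>>>     # over spark-base adding only the entrypoint logic the k8s back-end runs
>>>>>>     FROM spark-base
>>>>>>     COPY kubernetes/entrypoint.sh /opt/entrypoint.sh
>>>>>>     ENTRYPOINT ["/opt/entrypoint.sh"]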
>>>>>>
>>>>>> Does this factorization sound reasonable to others?
>>>>>>
>>>>>> Cheers,
>>>>>> Erik
>>>>>>
>>>>>> On Wed, Nov 29, 2017 at 10:04 AM, Mridul Muralidharan <mri...@gmail.com> wrote:
>>>>>>
>>>>>>> We do support running on Apache Mesos via docker images - so this would not be restricted to k8s. But unlike mesos support, which has other modes of running, I believe k8s support depends more heavily on the availability of docker images.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Mridul
>>>>>>>
>>>>>>> On Wed, Nov 29, 2017 at 8:56 AM, Sean Owen <so...@cloudera.com> wrote:
>>>>>>>
>>>>>>> > Would it be logical to provide Docker-based distributions of other pieces of Spark, or is this specific to K8S? The problem is we wouldn't generally also provide a distribution of Spark for the reasons you give, because if we did that, then why not RPMs and so on?
>>>>>>> >
>>>>>>> > On Wed, Nov 29, 2017 at 10:41 AM Anirudh Ramanathan <ramanath...@google.com> wrote:
>>>>>>> >
>>>>>>> >> In this context, I think the docker images are similar to the binaries rather than an extension. It's packaging the compiled distribution to save people the effort of building one themselves, akin to the binaries or the python package.
>>>>>>> >>
>>>>>>> >> For reference, this is the base dockerfile for the main image that we intend to publish. It's not particularly complicated. The driver and executor images are based on said base image and only customize the CMD (any file/directory inclusions are extraneous and will be removed).
>>>>>>> >>
>>>>>>> >> Is there only one way to build it? That's a bit harder to reason about. The base image, I'd argue, is likely always going to be built that way. For the driver and executor images, there may be cases where people want to customize them (like putting all dependencies into them, for example). In those cases, as long as our images are bare bones, they can use the spark-driver/spark-executor images we publish as the base and build their customization as a layer on top.
>>>>>>> >>
>>>>>>> >> I think the composability of docker images makes this a bit different from, say, debian packages. We can publish canonical images that serve as both a complete image for most Spark applications and a stable substrate to build customization upon.
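>>>>>>> >>
>>>>>>> >> Such a customization layer might look like this (a sketch only; the tag and paths are illustrative, not the actual published names):
>>>>>>> >>
>>>>>>> >>     # User-built image (illustrative tag/paths): start from a published
>>>>>>> >>     # bare-bones driver image, bake the application jar and its
>>>>>>> >>     # dependencies into a new layer, and inherit the CMD from the base
>>>>>>> >>     FROM spark-driver:v2.3.0
>>>>>>> >>     COPY target/my-app.jar /opt/spark/jars/
>>>>>>> >>     COPY libs/ /opt/spark/jars/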
>>>>>>> >>
>>>>>>> >> On Wed, Nov 29, 2017 at 7:38 AM, Mark Hamstra <m...@clearstorydata.com> wrote:
>>>>>>> >>
>>>>>>> >>> It's probably also worth considering whether there is only one, well-defined, correct way to create such an image, or whether this is a reasonable avenue for customization. Part of why we don't do something like maintain and publish canonical Debian packages for Spark is that different organizations doing packaging and distribution of infrastructures or operating systems can reasonably want to do this in a custom (or non-customary) way. If there is really only one reasonable way to do a docker image, then my bias starts to tend more toward the Spark PMC taking on the responsibility to maintain and publish that image. If there is more than one way to do it and publishing a particular image is more just a convenience, then my bias tends more away from maintaining and publishing it.
>>>>>>> >>>
>>>>>>> >>> On Wed, Nov 29, 2017 at 5:14 AM, Sean Owen <so...@cloudera.com> wrote:
>>>>>>> >>>
>>>>>>> >>>> Source code is the primary release; compiled binary releases are conveniences that are also released. A docker image sounds fairly different, though. To the extent it's the standard delivery mechanism for some artifact (think: pyspark on PyPI as well), that makes sense, but is that the situation? If it's more of an extension or alternate presentation of Spark components, that typically wouldn't be part of a Spark release. The ones the PMC takes responsibility for maintaining ought to be the core, critical means of distribution alone.
>>>>>>> >>>>
>>>>>>> >>>> On Wed, Nov 29, 2017 at 2:52 AM Anirudh Ramanathan <ramanath...@google.com.invalid> wrote:
>>>>>>> >>>>
>>>>>>> >>>>> Hi all,
>>>>>>> >>>>>
>>>>>>> >>>>> We're all working towards the Kubernetes scheduler backend (full steam ahead!) that's targeted at Spark 2.3. One of the questions that comes up often is docker images.
>>>>>>> >>>>>
>>>>>>> >>>>> While we're making dockerfiles available so people can create their own docker images from source, ideally we'd want to publish official docker images as part of the release process.
>>>>>>> >>>>>
>>>>>>> >>>>> I understand that the ASF has procedures around this, and we would want to get that started to help us get these artifacts published by 2.3. I'd love to get a discussion started around this and hear the community's thoughts.
>>>>>>> >>>>>
>>>>>>> >>>>> --
>>>>>>> >>>>> Thanks,
>>>>>>> >>>>> Anirudh Ramanathan
>>>>>>> >>
>>>>>>> >> --
>>>>>>> >> Anirudh Ramanathan
>>>>>>>
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org