Thank you, Hyukjin.

The maintenance overhead only occurs when we add a new release.

Also, we can prevent accidental upstream changes by avoiding 'latest' tags.
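
For example, here is a sketch of how a downstream Dockerfile would pin a
release (the 'apache/spark' repository name and tag are hypothetical at
this point):

       # Pin an immutable release tag so an upstream push cannot
       # silently change downstream builds.
       FROM apache/spark:2.4.5
       # Even stricter: pin the content-addressed digest.
       # FROM apache/spark@sha256:<digest>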

The overhead will be much smaller than our existing Dockerfile maintenance
(e.g., 'spark-rm').

Also, if we have a Docker repository, we can publish the 'spark-rm' image
alongside it as a tool. This will save release managers a great deal of
time and effort.
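
As a sketch (the repository name and tag are hypothetical), a release
manager could then pull the prebuilt tool image instead of building it
locally from dev/create-release/spark-rm/Dockerfile:

       docker pull apache/spark-rm:1.0
       docker run -ti apache/spark-rm:1.0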

Bests,
Dongjoon

On Mon, Feb 10, 2020 at 00:25 Hyukjin Kwon <gurwls...@gmail.com> wrote:

> Quick question: roughly how much overhead is required to maintain the
> minimal version?
> If that doesn't look like too much, I think it's fine to give it a shot.
>
>
> On Sat, Feb 8, 2020 at 6:51 AM, Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>
>> Thank you, Sean, Jiaxin, Shane, and Tom, for the feedback.
>>
>> 1. For legal questions, please see the following three Apache-approved
>> approaches. We can follow one of them.
>>
>>        1. https://hub.docker.com/u/apache (93 repositories,
>> Airflow/NiFi/Beam/Druid/Zeppelin/Hadoop/...)
>>        2. https://hub.docker.com/_/solr (This is also official. There
>> are more instances like this.)
>>        3. https://hub.docker.com/u/apachestreampipes (Some projects
>> try this form.)
>>
>> 2. For non-Spark dev-environment images, they will definitely help both
>> our Jenkins and GitHub Actions jobs. The Apache Infra team also supports
>> Docker Hub secrets for GitHub Actions, as in the following:
>>
>>        https://issues.apache.org/jira/browse/INFRA-19565 Create a Docker
>> Hub secret for Github Actions
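>>
>> As a rough sketch (the workflow step and secret names below are
>> assumptions for illustration, not what Infra actually provisioned), a
>> GitHub Actions job could log in to Docker Hub like this:
>>
>>        # Fragment of a hypothetical .github/workflows/publish.yml
>>        - name: Login to Docker Hub
>>          run: |
>>            echo "${{ secrets.DOCKERHUB_TOKEN }}" | docker login \
>>              -u "${{ secrets.DOCKERHUB_USER }}" --password-stdin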
>>
>> 3. For Spark image content questions, we should not do the following,
>> not only because of legal issues, but also because we cannot include or
>> maintain every popular library (like NVIDIA libraries or TensorFlow) in
>> our image.
>>
>>        https://issues.apache.org/jira/browse/SPARK-26398 Support
>> building GPU docker images
>>
>> 4. The way I see this is as a minimal, legally clean image containing
>> only our artifacts from the following location. We can check the other
>> Apache repos' best practices.
>>
>>        https://www.apache.org/dist/spark/
>>
>> 5. For OS/Java/Python/R runtimes and libraries, those (except the OS)
>> can generally be overlaid by users as additional layers. I don't think
>> we need to provide every combination of (Debian/Ubuntu/CentOS/Alpine) x
>> (JDK/JRE) x (Python2/Python3/PyPy) x (R 3.5/3.6) x (many libraries).
>> Specifically, I don't think we need to install every library like `arrow`.
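>>
>> As a minimal sketch (assuming a hypothetical Debian-based
>> 'apache/spark' base image), a user who needs extra runtimes or
>> libraries could overlay them in their own Dockerfile:
>>
>>        FROM apache/spark:2.4.5
>>        # Overlay only what this user actually needs (e.g. Python 3 and
>>        # pyarrow) instead of Spark shipping every combination.
>>        RUN apt-get update && \
>>            apt-get install -y --no-install-recommends python3 python3-pip && \
>>            pip3 install pyarrow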
>>
>> 6. For the target users, this is a general Docker image. We don't need
>> to assume that it is for a K8s-only environment; it can be used in any
>> Docker environment.
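>>
>> For instance (the image name and Spark install path here are
>> assumptions), plain Docker usage without any K8s would be as simple as:
>>
>>        # Run spark-shell locally in a container, no Kubernetes involved.
>>        docker run -it apache/spark:2.4.5 /opt/spark/bin/spark-shell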
>>
>> 7. For the number of images, as suggested in this thread, we may want to
>> follow our existing K8s integration test suite's approach of splitting
>> the PySpark and R images from the Java one. But I don't have any hard
>> requirement for this.
>>
>> What I want to propose in this thread is that we start with a minimal
>> viable product and evolve it (if needed) as an open source community.
>>
>> Bests,
>> Dongjoon.
>>
>> PS. BTW, Apache Spark 2.4.5 artifacts are published to our doc website,
>> our distribution repo, Maven Central, PyPI, CRAN, and Homebrew.
>>        I'm preparing the website news and download page updates.
>>
>>
>> On Thu, Feb 6, 2020 at 11:19 AM Tom Graves <tgraves...@yahoo.com> wrote:
>>
>>> When discussions of Docker have occurred in the past - mostly related
>>> to K8s - there has been a lot of discussion about what the right image
>>> to publish is, as well as making sure Apache is OK with it. The official
>>> Apache release is the source code, so we may need a disclaimer, and we
>>> need to make sure the image doesn't contain anything licensed in a way
>>> it shouldn't be. What happens when one of the Docker images we publish
>>> needs a security update? We would need to make sure all the legal bases
>>> are covered first.
>>>
>>> Then the discussion comes down to what is in the Docker images and how
>>> useful they are. People run different OSes, different Python versions,
>>> etc. And, as Sean mentioned, how useful is it really beyond a few
>>> examples? There is some discussion on
>>> https://issues.apache.org/jira/browse/SPARK-24655
>>>
>>> Tom
>>>
>>>
>>>
>>> On Wednesday, February 5, 2020, 02:16:37 PM CST, Dongjoon Hyun <
>>> dongjoon.h...@gmail.com> wrote:
>>>
>>>
>>> Hi, All.
>>>
>>> Starting in 2020, shall we have an official Docker image repository as
>>> an additional distribution channel?
>>>
>>> I'm considering the following images.
>>>
>>>     - Public binary release (no snapshot images)
>>>     - Public non-Spark base image (OS + R + Python)
>>>       (This can be used in GitHub Actions jobs and Jenkins K8s
>>> integration tests to speed up jobs and to have more stable environments.)
>>>
>>> Bests,
>>> Dongjoon.
>>>
>>
