Thank you, Hyukjin. The maintenance overhead only occurs when we add a new release.
And, we can prevent accidental upstream changes by avoiding 'latest' tags.

The overhead will be much smaller than our existing Dockerfile maintenance (e.g. 'spark-rm').

Also, if we have a docker repository, we can publish the 'spark-rm' image together as a tool. This will save release managers a lot of time and effort.

Bests,
Dongjoon

On Mon, Feb 10, 2020 at 00:25 Hyukjin Kwon <gurwls...@gmail.com> wrote:

> Quick question. Roughly how much overhead is required to maintain a
> minimal version?
> If that does not look like too much, I think it's fine to give it a shot.
>
>
> On Sat, Feb 8, 2020 at 6:51 AM, Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>
>> Thank you, Sean, Jiaxin, Shane, and Tom, for the feedback.
>>
>> 1. For legal questions, please see the following three Apache-approved
>> approaches. We can follow one of them.
>>
>>     1. https://hub.docker.com/u/apache (93 repositories,
>>     Airflow/NiFi/Beam/Druid/Zeppelin/Hadoop/...)
>>     2. https://hub.docker.com/_/solr (This is also official. There
>>     are more instances like this.)
>>     3. https://hub.docker.com/u/apachestreampipes (Some projects
>>     try this form.)
>>
>> 2. For non-Spark dev-environment images, definitely it will help both our
>> Jenkins and GitHub Action jobs. The Apache Infra team also supports GitHub
>> Action secrets like the following.
>>
>>     https://issues.apache.org/jira/browse/INFRA-19565 Create a Docker
>>     Hub secret for Github Actions
>>
>> 3. For Spark image content questions, we should not do the following.
>> This is not only because of legal issues, but also because we cannot
>> contain or maintain all popular libraries like the Nvidia
>> library/TensorFlow in our image.
>>
>>     https://issues.apache.org/jira/browse/SPARK-26398 Support
>>     building GPU docker images
>>
>> 4. The way I see this is a minimal legal image containing only our
>> artifacts from the following. We can check the other Apache repos' best
>> practices.
>>
>>     https://www.apache.org/dist/spark/
>>
>> 5. For OS/Java/Python/R runtimes and libraries, those (except OS) can
>> be overlaid as additional layers by the users in general. I don't think
>> we need to provide every combination (Debian/Ubuntu/CentOS/Alpine) x
>> (JDK/JRE) x (Python2/Python3/PyPy) x (R 3.6/3.6) x (many libraries).
>> Specifically, I don't think we need to install all libraries like `arrow`.
>>
>> 6. For the target users, this is a general docker image. We don't need to
>> assume that this is for a K8s-only environment. This can be used in any
>> Docker environment.
>>
>> 7. For the number of images, as suggested in this thread, we may want to
>> follow our existing K8s integration test suite way by splitting PySpark and
>> R images from Java. But, I don't have any requirement for this.
>>
>> What I want to propose in this thread is that we can start with a minimal
>> viable product and evolve it (if needed) as an open source community.
>>
>> Bests,
>> Dongjoon.
>>
>> PS. BTW, Apache Spark 2.4.5 artifacts are published into our doc website,
>> our distribution repo, Maven Central, PyPI, CRAN, Homebrew.
>> I'm preparing the website news and download page update.
>>
>>
>> On Thu, Feb 6, 2020 at 11:19 AM Tom Graves <tgraves...@yahoo.com> wrote:
>>
>>> When discussions of docker have occurred in the past - mostly related to
>>> k8s - there is a lot of discussion about what is the right image to
>>> publish, as well as making sure Apache is ok with it. The Apache official
>>> release is the source code, so we may need to make sure to have a disclaimer
>>> and we need to make sure it doesn't contain anything licensed that it
>>> shouldn't. What happens when one of the docker images we publish has a
>>> security update? We would need to make sure all the legal bases are covered
>>> first.
>>>
>>> Then the discussion comes into what is in the docker images and how
>>> useful it is. People run different os's, different python versions, etc.
>>> And like Sean mentioned, how useful really is it other than a few examples?
>>> Some discussions on https://issues.apache.org/jira/browse/SPARK-24655
>>>
>>> Tom
>>>
>>>
>>>
>>> On Wednesday, February 5, 2020, 02:16:37 PM CST, Dongjoon Hyun <
>>> dongjoon.h...@gmail.com> wrote:
>>>
>>>
>>> Hi, All.
>>>
>>> From 2020, shall we have an official Docker image repository as an
>>> additional distribution channel?
>>>
>>> I'm considering the following images.
>>>
>>>     - Public binary release (no snapshot image)
>>>     - Public non-Spark base image (OS + R + Python)
>>>       (This can be used in GitHub Action Jobs and Jenkins K8s
>>>       Integration Tests to speed up jobs and to have more stable environments)
>>>
>>> Bests,
>>> Dongjoon.
>>>
>>