Quick question: roughly how much overhead would it take to maintain a minimal version? If that doesn't look like too much, I think it's fine to give it a shot.
On Sat, Feb 8, 2020 at 6:51 AM, Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:

> Thank you, Sean, Jiaxin, Shane, and Tom, for the feedback.
>
> 1. For the legal questions, please see the following three Apache-approved
> approaches. We can follow one of them.
>
>   1. https://hub.docker.com/u/apache (93 repositories,
>      Airflow/NiFi/Beam/Druid/Zeppelin/Hadoop/...)
>   2. https://hub.docker.com/_/solr (This is also official. There are
>      more instances like this.)
>   3. https://hub.docker.com/u/apachestreampipes (Some projects try
>      this form.)
>
> 2. For non-Spark dev-environment images, this will definitely help both our
> Jenkins and GitHub Actions jobs. The Apache Infra team also supports GitHub
> Actions secrets, as in the following.
>
>   https://issues.apache.org/jira/browse/INFRA-19565 Create a Docker
>   Hub secret for GitHub Actions
>
> 3. For the Spark image content questions, we should not do the following,
> not only because of the legal issues, but also because we cannot contain or
> maintain all popular libraries like the Nvidia libraries or TensorFlow in
> our image.
>
>   https://issues.apache.org/jira/browse/SPARK-26398 Support building
>   GPU docker images
>
> 4. The way I see this is a minimal legal image containing only our
> artifacts from the following. We can check the other Apache repos' best
> practices.
>
>   https://www.apache.org/dist/spark/
>
> 5. For the OS/Java/Python/R runtimes and libraries, those (except the OS)
> can generally be overlaid as additional layers by the users. I don't think
> we need to provide every combination of (Debian/Ubuntu/CentOS/Alpine) x
> (JDK/JRE) x (Python2/Python3/PyPy) x (R 3.6/3.6) x (many libraries).
> Specifically, I don't think we need to install all libraries like `arrow`.
>
> 6. For the target users, this is a general Docker image. We don't need to
> assume that it is for a K8s-only environment. It can be used in any
> Docker environment.
>
> 7.
> For the number of images, as suggested in this thread, we may want to
> follow our existing K8s integration test suite's approach of splitting the
> PySpark and R images from the Java one. But I don't have any requirement
> for this.
>
> What I want to propose in this thread is that we start with a minimal
> viable product and evolve it (if needed) as an open source community.
>
> Bests,
> Dongjoon.
>
> PS. BTW, the Apache Spark 2.4.5 artifacts are published to our doc website,
> our distribution repo, Maven Central, PyPI, CRAN, and Homebrew.
> I'm preparing the website news and download page update.
>
> On Thu, Feb 6, 2020 at 11:19 AM Tom Graves <tgraves...@yahoo.com> wrote:
>
>> When discussions of Docker have occurred in the past - mostly related to
>> K8s - there has been a lot of discussion about what the right image to
>> publish is, as well as making sure Apache is OK with it. The official
>> Apache release is the source code, so we may need to make sure to have a
>> disclaimer, and we need to make sure it doesn't contain anything licensed
>> that it shouldn't. What happens when one of the Docker images we publish
>> needs a security update? We would need to make sure all the legal bases
>> are covered first.
>>
>> Then the discussion comes to what is in the Docker images and how useful
>> they are. People run different OSes, different Python versions, etc.
>> And, like Sean mentioned, how useful is it really other than for a few
>> examples? Some discussion is on
>> https://issues.apache.org/jira/browse/SPARK-24655
>>
>> Tom
>>
>> On Wednesday, February 5, 2020, 02:16:37 PM CST, Dongjoon Hyun <
>> dongjoon.h...@gmail.com> wrote:
>>
>> Hi, All.
>>
>> From 2020, shall we have an official Docker image repository as an
>> additional distribution channel?
>>
>> I'm considering the following images.
>>
>> - Public binary release (no snapshot image)
>> - Public non-Spark base image (OS + R + Python)
>>   (This can be used in GitHub Actions jobs and Jenkins K8s integration
>>   tests to speed up jobs and to have more stable environments)
>>
>> Bests,
>> Dongjoon.
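For what it's worth, the user-side layering described in point 5 could look roughly like the sketch below: a minimal official image provides only the Spark artifacts, and users overlay the runtimes and libraries they actually need. Note this is only an illustration; the image name `apache/spark` and the `2.4.5` tag are assumptions, not something decided in this thread.

```dockerfile
# Hypothetical example: the base image name/tag below are placeholders,
# not an agreed-upon repository or tag.
FROM apache/spark:2.4.5

# Overlay a Python runtime as an additional layer, rather than shipping
# every (OS) x (JDK/JRE) x (Python) combination in the official image.
USER root
RUN apt-get update && \
    apt-get install -y --no-install-recommends python3 python3-pip && \
    rm -rf /var/lib/apt/lists/*

# Users add only the libraries they need (e.g. Arrow), instead of the
# official image pre-installing all popular libraries.
RUN pip3 install pyarrow
```

This keeps the officially published image small and legally simple, while still letting any combination of OS, JDK, Python, R, and libraries be built on top with a few lines of Dockerfile.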