To Yikun,

It seems that your reply (quoted below) didn't reach the mailing list correctly.

> Just FYI, we also had a discussion about tag policy (latest/3.4.0) and also rough size estimation [1] in "SPIP: Support Docker Official Image for Spark".
> https://docs.google.com/document/d/1nN-pKuvt-amUcrkTvYAQ-bJBgtsWb9nAkNoVNRM2S2o/edit?disco=AAAAf2TyFr0

Let me add my opinion. IIUC, the whole content of the SPIP (Support Docker Official Image for Spark) aims to add (1) as something new, not to corrupt or destroy the existing (2).

(1) https://hub.docker.com/_/spark
(2) https://hub.docker.com/r/apache/spark/tags

The reference model repos were also documented as follows.

https://hub.docker.com/_/flink
https://hub.docker.com/_/storm
https://hub.docker.com/_/solr
https://hub.docker.com/_/zookeeper

In short, according to the SPIP's `Docker Official Image` definition, new images should go to (1) only in order to achieve `Support Docker Official Image for Spark`, shouldn't they?

Dongjoon.

On Mon, May 8, 2023 at 6:22 PM Yikun Jiang <yikunk...@gmail.com> wrote:

> > 1. The size regression: `apache/spark:3.4.0` tag which is claimed to be a replacement of the existing `apache/spark:v3.4.0`. However, 3.4.0 is 500MB while the original v3.4.0 is 405MB. 25% is huge in terms of the size.
> >
> > 2. Accidental overwrite: `apache/spark:latest` was accidentally overwritten by `apache/spark:python3` image which has a bigger size due to the additional python binary. This is a breaking change to enforce the downstream users to change to something like `apache/spark:scala`.
>
> Just FYI, we also had a discussion about tag policy (latest/3.4.0) and also rough size estimation [1] in "SPIP: Support Docker Official Image for Spark".
>
> [1] https://docs.google.com/document/d/1nN-pKuvt-amUcrkTvYAQ-bJBgtsWb9nAkNoVNRM2S2o/edit?disco=AAAAf2TyFr0
>
> Regards,
> Yikun
>
> On Tue, May 9, 2023 at 5:03 AM Dongjoon Hyun <dongj...@apache.org> wrote:
>
>> Thank you for initiating the discussion in the community. Yes, we need to give more context in the dev mailing list.
>>
>> This root cause is not about SPARK-40941 or SPARK-40513. Technically, this situation started 16 days ago due to SPARK-43148 because it made some breaking changes.
>>
>> https://github.com/apache/spark-docker/pull/33
>> SPARK-43148 Add Apache Spark 3.4.0 Dockerfiles
>>
>> 1. The size regression: `apache/spark:3.4.0` tag which is claimed to be a replacement of the existing `apache/spark:v3.4.0`. However, 3.4.0 is 500MB while the original v3.4.0 is 405MB. 25% is huge in terms of the size.
>>
>> 2. Accidental overwrite: `apache/spark:latest` was accidentally overwritten by `apache/spark:python3` image which has a bigger size due to the additional python binary. This is a breaking change to enforce the downstream users to change to something like `apache/spark:scala`.
>>
>> I believe (1) and (2) were our mistakes. We had better recover them ASAP.
>> For Java questions, I prefer to be consistent with Apache Spark repo's default.
>>
>> Dongjoon.
>>
>> On 2023/05/08 08:56:26 Yikun Jiang wrote:
>> > This is a call for discussion on how we can unify the Apache Spark Docker image tags smoothly.
>> >
>> > As you might know, there is an apache/spark-docker <https://github.com/apache/spark-docker> repo to store the dockerfiles and help to publish the docker images, also intended to replace the original manual publishing workflow.
>> >
>> > The scope of new images is to cover previous image cases (K8s / docker run) and also cover base image, standalone, Docker Official Image.
>> >
>> > - (Previous) apache/spark:v3.4.0, apache/spark-py:v3.4.0, apache/spark-r:v3.4.0
>> >
>> >   * The image is built from the apache/spark Spark on K8s dockerfiles <https://github.com/apache/spark/tree/branch-3.4/resource-managers/kubernetes/docker/src/main/dockerfiles/spark>
>> >
>> >   * Java version: Java 17 (it was Java 11 before v3.4.0, such as v3.3.0/v3.3.1/v3.3.2), set to Java 17 by default in SPARK-40941 <https://github.com/apache/spark/pull/38417>.
>> >
>> >   * Support: K8s / docker run
>> >
>> >   * See also: Time to start publishing Spark Docker Images <https://lists.apache.org/thread/h729bxrf1o803l4wz7g8bngkjd56y6x8>
>> >
>> >   * Link: https://hub.docker.com/r/apache/spark-py, https://hub.docker.com/r/apache/spark-r, https://hub.docker.com/r/apache/spark
>> >
>> > - (New) apache/spark:3.4.0-python3(3.4.0/latest), apache/spark:3.4.0-r, apache/spark:3.4.0-scala, and also an all-in-one image: apache/spark:3.4.0-scala2.12-java11-python3-r-ubuntu
>> >
>> >   * The image is built from the apache/spark-docker dockerfiles <https://github.com/apache/spark-docker/tree/master/3.4.0>
>> >
>> >   * Java version: Java 11; Java 17 is supported by SPARK-40513 <https://github.com/apache/spark-docker/pull/35> (under review)
>> >
>> >   * Support: K8s / docker run / base image / standalone / Docker Official Image
>> >
>> >   * See detail in: Support Docker Official Image for Spark <https://issues.apache.org/jira/browse/SPARK-40513>
>> >
>> >   * About dropping prefix `v`: https://github.com/docker-library/official-images/issues/14506
>> >
>> >   * Link: https://hub.docker.com/r/apache/spark
>> >
>> > We had some initial discussion on spark-website#458 <https://github.com/apache/spark-website/pull/458#issuecomment-1522426236>; the main discussion is around the version tag and default Java version behavior changes, so we’d like to hear your ideas here about the questions below:
>> >
>> > *#1. Which Java version should be used by default (latest tag)? Java 8 or Java 11 or Java 17 or Any*
>> >
>> > *#2. Which tag should be used in apache/spark? v3.4.0 (with prefix v) or 3.4.0 (dropping prefix v) or Both or Any*
>> >
>> > Starting with my preferences:
>> >
>> > 1. Java 8 or Java 17 are also OK with me (mainly considering the Java maintenance cycle). BTW, other Apache projects: flink (8/11, 11 as default <https://github.com/docker-library/official-images/blob/93270eb07fb448fe7756b28af5495428242dcd6b/library/flink#L10>), solr (11 as default <https://github.com/apache/solr-docker/blob/989825ee6dce2f6bf7b31051f1ba053b6c4426f2/8.11/Dockerfile#L4> for 8.x, 17 as default <https://github.com/apache/solr-docker/blob/989825ee6dce2f6bf7b31051f1ba053b6c4426f2/9.2/Dockerfile#L17> since solr9), zookeeper (11 as default <https://github.com/31z4/zookeeper-docker/blob/181e5862c85b517e4599d79eb5c2c7339e60a4aa/3.8.1/Dockerfile#L1>)
>> >
>> > 2. Only 3.4.0 (dropping prefix v). It will help us transition to the new tags with less confusion and also takes the DOI suggestions <https://github.com/docker-library/official-images/issues/14506> into account.
>> >
>> > Please feel free to share your ideas.
>> >
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>