FrankChen021 opened a new pull request, #17731: URL: https://github.com/apache/druid/pull/17731
There are several problems in the Dockerfile.

### 1. Extremely slow builds on Apple Silicon chips

Previously, to allow building the Docker image on Apple Silicon chips such as the M1, the Dockerfile forced the build to run under the amd64 platform. This worked around the fact that node-sass does not support ARM, see https://github.com/apache/druid/issues/13012

```
FROM --platform=linux/amd64 maven:3.9 as builder
```

However, this drastically slows down the Docker build on these platforms: it takes more than **15** minutes to build an image on my M1 laptop. The main reason is that the whole build has to run under x86 emulation.

### 2. Unfriendly to debug

The distroless base image is currently used. It is a secure image, but it is unfriendly to debug: there is no curl, no wget, no lsof, and no net-tools. Debugging a live issue in such a container is painful.

There are some other problems, which are described in the following section.

### Changes Description

1. The entire build is split into two stages: the web-console build stage, which runs under the amd64 platform, and the distribution build stage, which runs on the local development platform. During the distribution build stage, the web-console artifact is copied in for the final distribution package.

   This speeds up the build drastically. On my laptop it now takes about 120 seconds to complete the web-console build stage and about 210 seconds to complete the backend service build stage, both of which are acceptable.

   ```
   => [web-console-builder 4/4] RUN --mount=type=cache,target=/root/.m2 if [ "true" = "true" ]; then cd /src/web-console && mvn -B -ff -DskipUTs clean package; fi 126.4s
   => [builder 4/7] WORKDIR /src 0.0s
   => [builder 5/7] COPY --from=web-console-builder /src/web-console/target/web-console*.jar /src/web-console/target/ 0.0s
   => [builder 6/7] RUN --mount=type=cache,target=/root/.m2 if [ "true" = "true" ]; then mvn -B -ff clean install -Pdist,bundle-contrib-exts -Pskip 211.5s
   ```

2. Unified the JDK used for the build and for the final runtime environment

   Previously, `maven:3.9`, which comes with JDK 17, was used for the build stage. This does NOT respect the `JDK_VERSION` argument in the Dockerfile. If we built Druid for Java 21 by specifying `JDK_VERSION`, the distribution was still built with JDK 17 but packaged to run in a JRE 21 environment.

   This PR fixes that. The build stage and the final image use the SAME JDK version.

3. Switched the base image from `gcr.io/distroless/java$JDK_VERSION-debian12` to `alpine`

   This also drastically simplifies the Dockerfile. Previously, we had to install busybox and download bash from elsewhere inside the Dockerfile, which made it very complicated. Since alpine ships with a shell, these steps are eliminated.

   The change does NOT bloat the image. On my machine the alpine-based image is 746MB, slightly smaller than the distroless image.

   ```
   druid latest 6eb4ec6dc77f 34 minutes ago 746MB
   druid distroless 1daa75c32b0c 7 hours ago 761MB
   ```

   Some commonly used tools such as curl, lsof and net-tools are also packaged in the final Docker image.

4. Removed the evaluation of VERSION

   Previously we used the following command to evaluate the version, but this step takes a VERY LONG time on my laptop.

   ```
   RUN --mount=type=cache,target=/root/.m2 VERSION=$(mvn -B -q org.apache.maven.plugins:maven-help-plugin:3.2.0:evaluate \
         -Dexpression=project.version -DforceStdout=true \
       ) \
    ...
   ```

   We can see that after 254 seconds the command is still running.

   ```
   => [builder 7/8] RUN VERSION=$(mvn -B -q org.apache.maven.plugins:maven-help-plugin:3.2.0:evaluate -Dstyle.color=never -Dexpression=project.version -DforceStdout= 254.3s
   ```

   This step is eliminated: because `clean` is applied in the Maven command, there is only one tar file under the distribution directory, so we can use a wildcard match to find the file and decompress it.

5. Test-related modules are excluded from the distribution stage.

6. `druid.sh` is also updated to ensure `druid.host` has a value before starting the Java process. This helps expose problems earlier.

#### Release note

The default base image is switched from `gcr.io/distroless/java17-debian12` to `alpine`.

This PR has:

- [X] been self-reviewed.
- [X] a release note entry in the PR description.
- [X] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
- [X] been tested in a test Druid cluster.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
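Item 4 of the change list relies on a wildcard matching exactly one tar file once `clean` has run. As a minimal sketch of that idea (not the code in this PR), the helper below resolves the wildcard and fails loudly when the assumption does not hold. The function name `find_dist_tarball` and the `apache-druid-*-bin.tar.gz` file pattern are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper, not part of the PR: print the single distribution
# tarball found in a directory, or fail if there are zero or several matches.
find_dist_tarball() {
  dir="$1"
  # Expand the wildcard into the function's positional parameters.
  set -- "$dir"/apache-druid-*-bin.tar.gz
  # An unmatched glob stays literal, so also check that the file exists.
  if [ "$#" -ne 1 ] || [ ! -f "$1" ]; then
    echo "expected exactly one tarball in $dir" >&2
    return 1
  fi
  printf '%s\n' "$1"
}
```

A build step could then run something like `tarball=$(find_dist_tarball distribution/target) && tar -xzf "$tarball" -C /opt`, where both paths are illustrative. The explicit count check costs a few lines but turns a stale second tarball into an immediate build failure rather than a silently wrong image.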
