FrankChen021 opened a new pull request, #17731:
URL: https://github.com/apache/druid/pull/17731

   The current Dockerfile has several problems.
   
   ### 1. Extremely slow builds on Apple Silicon chips
   
   Previously, to allow building the Docker image on Apple Silicon chips like the M1, the Dockerfile forced the build to run under the amd64 platform. This worked around the problem that node-sass does not support ARM, see https://github.com/apache/druid/issues/13012
   
   ```
   FROM --platform=linux/amd64 maven:3.9 as builder
   ```
   
   However, this drastically slows down the Docker build on these platforms: it takes more than **15** minutes to build an image on my M1 laptop.

   The main reason is that on Apple Silicon the amd64 build has to run under an x86 emulator.
   
   ### 2. Hard to debug

   The distroless base image is currently used. It is a secure image, but it is hard to debug: there is no curl, no wget, no lsof, and no net-tools. This is painful when we have to debug live issues.

   There are some other problems, which are described in the following section.
   
   ### Changes Description
   
   1. The build is split into two stages: the web-console build stage, which runs under the amd64 platform, and the distribution build stage, which uses the local development platform. During the distribution build stage, the web-console artifact is copied in for the final distribution package.

      This speeds up the build drastically. On my laptop, it now takes about 120 seconds to complete the web-console build stage and about 210 seconds to complete the backend service build stage, which is acceptable.
   
   ```
    => [web-console-builder 4/4] RUN --mount=type=cache,target=/root/.m2 if [ "true" = "true" ]; then     cd /src/web-console && mvn -B -ff -DskipUTs clean package; fi       126.4s
    => [builder 4/7] WORKDIR /src       0.0s
    => [builder 5/7] COPY --from=web-console-builder /src/web-console/target/web-console*.jar /src/web-console/target/       0.0s
    => [builder 6/7] RUN --mount=type=cache,target=/root/.m2 if [ "true" = "true" ]; then       mvn -B -ff       clean install       -Pdist,bundle-contrib-exts       -Pskip  211.5s
   ```
   
   2. Unified the JDK used in the build stage and the final runtime environment

   Previously, the build stage used `maven:3.9`, which comes with JDK 17. This did NOT respect the `JDK_VERSION` argument in the Dockerfile. If we built Druid for JDK 21 by specifying `JDK_VERSION`, the distribution was still built under JDK 17 but packaged to run in a JRE 21 environment.

   This PR fixes that. The build stage and the final image use the SAME version of the JDK.
   
   3. Switched the base image from `gcr.io/distroless/java$JDK_VERSION-debian12` to `alpine`

   This also drastically simplifies the Dockerfile. Previously, we had to install busybox and download bash inside the Dockerfile, which made it very complicated.

   Since alpine comes with a shell, these steps are eliminated. The change does NOT bloat the image. Locally, the alpine-based image is 746MB, slightly smaller than the distroless image at 761MB.
   
   ```
   druid                         latest                     6eb4ec6dc77f   34 minutes ago   746MB
   druid                         distroless                 1daa75c32b0c   7 hours ago      761MB
   ```
   
   Some commonly used tools such as curl, lsof, and net-tools are also packaged in the final Docker image.
   
   
   4. Removed the evaluation of VERSION

   Previously we used the following command to evaluate the version, but this step takes a VERY LONG time on my laptop:
   
   ```
   RUN --mount=type=cache,target=/root/.m2 VERSION=$(mvn -B -q org.apache.maven.plugins:maven-help-plugin:3.2.0:evaluate \
         -Dexpression=project.version -DforceStdout=true \
       ) \
   ...
   ```
   
   We can see that after 254 seconds, the command is still running.
   
   ```
    => [builder 7/8] RUN VERSION=$(mvn -B -q org.apache.maven.plugins:maven-help-plugin:3.2.0:evaluate       -Dstyle.color=never -Dexpression=project.version -DforceStdout=  254.3s
   ```
   
   This step is eliminated. Because `clean` is applied to the Maven command, we ensure that there is only one tar file under the distribution directory, so we can use a wildcard to find the file and decompress it.
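
   The wildcard lookup could look roughly like the sketch below. It is not the code from this PR: the `distribution/target/apache-druid-*-bin.tar.gz` pattern, the temporary directory layout, and the stand-in tarball are assumptions made only so the example runs on its own. The guard shows why `clean` matters: the glob is only safe when it matches exactly one file.

   ```shell
   set -eu
   # Hypothetical scratch layout standing in for the build tree.
   workdir=$(mktemp -d)
   mkdir -p "$workdir/distribution/target" "$workdir/opt" "$workdir/stage/apache-druid-0.0.0"
   echo "demo" > "$workdir/stage/apache-druid-0.0.0/README"
   # Stand-in for the tarball the Maven build would produce (name is illustrative).
   tar -czf "$workdir/distribution/target/apache-druid-0.0.0-bin.tar.gz" \
       -C "$workdir/stage" apache-druid-0.0.0

   # Expand the glob into the positional parameters and insist on exactly one match.
   set -- "$workdir"/distribution/target/apache-druid-*-bin.tar.gz
   if [ "$#" -ne 1 ] || [ ! -f "$1" ]; then
     echo "expected exactly one distribution tarball, found $#" >&2
     exit 1
   fi

   # Decompress the single match without knowing the version in advance.
   tar -xzf "$1" -C "$workdir/opt"
   extracted=$(ls "$workdir/opt")
   echo "$extracted"
   ```

   If a stale tarball from an earlier build were still present, the glob would match more than one file and the guard would stop the build instead of silently picking one.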
   
   5. Test-related modules are excluded from the distribution stage.
   
   6. `druid.sh` is also updated to ensure `druid.host` has a value before starting the java process. This helps expose problems earlier.
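
   As an illustration of the fail-fast check in item 6, a guard of this kind could be written as below. The function name `require_druid_host` and its message are hypothetical; the actual change in `druid.sh` may resolve and validate the host differently.

   ```shell
   # Hypothetical helper, not the code from druid.sh.
   # $1: the value that would be used for druid.host
   require_druid_host() {
     if [ -z "${1:-}" ]; then
       # Fail before the JVM is launched so the misconfiguration is visible at once.
       echo "druid.host is empty; refusing to start the java process" >&2
       return 1
     fi
     echo "druid.host=$1"
   }
   ```

   A start script would call it right before launching java, for example `require_druid_host "$resolved_host" || exit 1`, where `resolved_host` is again an illustrative variable name.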
   
   
   #### Release note
   
   The default image is switched from `gcr.io/distroless/java17-debian12` to 
`alpine`
   
   
   This PR has:
   
   - [X] been self-reviewed.
   - [X] a release note entry in the PR description.
   - [X] added comments explaining the "why" and the intent of the code 
wherever would not be obvious for an unfamiliar reader.
   - [X] been tested in a test Druid cluster.
   

