andygrove commented on code in PR #675:
URL: https://github.com/apache/datafusion-comet/pull/675#discussion_r1679801559

##########
kube/Dockerfile:
##########
@@ -0,0 +1,26 @@
+FROM apache/spark:3.4.2
+
+USER root
+
+# Installing JDK11 as the image comes with JRE
+RUN apt update \
+    && apt install -y git \
+    && apt install -y curl \
+    && apt install -y openjdk-11-jdk \
+    && apt clean
+
+RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
+ENV PATH="/root/.cargo/bin:${PATH}"
+ENV RUSTFLAGS="-C debuginfo=line-tables-only -C incremental=false"
+ENV SPARK_VERSION=3.4
+ENV SCALA_VERSION=2.12
+
+# Pick the JDK instead of JRE to compile Comet
+RUN cd /opt \
+    && git clone https://github.com/apache/datafusion-comet.git \
+    && cd datafusion-comet \
+    && JAVA_HOME=$(readlink -f $(which javac) | sed "s/\/bin\/javac//") make release PROFILES="-Pspark-$SPARK_VERSION -Pscala-$SCALA_VERSION"
+
+RUN cp /opt/datafusion-comet/spark/target/comet-spark-spark${SPARK_VERSION}_$SCALA_VERSION-0.1.0-SNAPSHOT.jar $SPARK_HOME/jars
+
+USER ${spark_uid}

Review Comment:
We can reduce the size of the final image from ~8 GB to ~1 GB by using a multi-stage build.

With the Dockerfile in this PR:

```
REPOSITORY        TAG       IMAGE ID       CREATED          SIZE
andygrove/comet   latest    223fb8a90579   21 seconds ago   8.42GB
```

With the version suggested below:

```
REPOSITORY        TAG       IMAGE ID       CREATED          SIZE
andygrove/comet   latest    44695163de5d   31 seconds ago   996MB
```

```suggestion
FROM apache/spark:3.4.2 AS builder

USER root

# Installing JDK11 as the image comes with JRE
RUN apt update \
    && apt install -y git \
    && apt install -y curl \
    && apt install -y openjdk-11-jdk \
    && apt clean

RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
ENV PATH="/root/.cargo/bin:${PATH}"
ENV RUSTFLAGS="-C debuginfo=line-tables-only -C incremental=false"
ENV SPARK_VERSION=3.4
ENV SCALA_VERSION=2.12

# Pick the JDK instead of JRE to compile Comet
RUN cd /opt \
    && git clone https://github.com/apache/datafusion-comet.git \
    && cd datafusion-comet \
    && JAVA_HOME=$(readlink -f $(which javac) | sed "s/\/bin\/javac//") make release PROFILES="-Pspark-$SPARK_VERSION -Pscala-$SCALA_VERSION"

FROM apache/spark:3.4.2

ENV SPARK_VERSION=3.4
ENV SCALA_VERSION=2.12

USER root

COPY --from=builder /opt/datafusion-comet/spark/target/comet-spark-spark${SPARK_VERSION}_$SCALA_VERSION-0.1.0-SNAPSHOT.jar $SPARK_HOME/jars

USER ${spark_uid}
```

Locally I am seeing this warning:

```
1 warning found (use --debug to expand):
 - UndefinedVar: Usage of undefined variable '$spark_uid' (line 33)
```
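
One possible way to address the `UndefinedVar` warning, offered as an untested sketch and not a confirmed fix: build arguments declared with `ARG` are not persisted in a built image and are scoped to the stage that declares them, so `spark_uid` is undefined in the final stage unless it is declared there. The fragment below replaces only the final stage of the suggestion above and still relies on its `builder` stage. The default value `185` is an assumption about the UID used by the `apache/spark` base image and should be verified before use.

```dockerfile
FROM apache/spark:3.4.2

# Assumption: 185 is the UID of the spark user in the base image; verify before relying on it.
# Declaring the ARG in this stage makes ${spark_uid} defined for the USER instruction below.
ARG spark_uid=185

ENV SPARK_VERSION=3.4
ENV SCALA_VERSION=2.12

USER root

# Copy only the built jar out of the builder stage, leaving the toolchain behind.
COPY --from=builder /opt/datafusion-comet/spark/target/comet-spark-spark${SPARK_VERSION}_$SCALA_VERSION-0.1.0-SNAPSHOT.jar $SPARK_HOME/jars

# Drop root privileges again for the runtime image.
USER ${spark_uid}
```

With the `ARG` declared, the value could also be overridden at build time with `docker build --build-arg spark_uid=<uid>` if the base image uses a different UID.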