andygrove commented on code in PR #675:
URL: https://github.com/apache/datafusion-comet/pull/675#discussion_r1679801559


##########
kube/Dockerfile:
##########
@@ -0,0 +1,26 @@
+FROM apache/spark:3.4.2
+
+USER root
+
+# Installing JDK11 as the image comes with JRE
+RUN apt update \
+    && apt install -y git \
+    && apt install -y curl \
+    && apt install -y openjdk-11-jdk \
+    && apt clean
+
+RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
+ENV PATH="/root/.cargo/bin:${PATH}"
+ENV RUSTFLAGS="-C debuginfo=line-tables-only -C incremental=false"
+ENV SPARK_VERSION=3.4
+ENV SCALA_VERSION=2.12
+
+# Pick the JDK instead of JRE to compile Comet
+RUN cd /opt \
+    && git clone https://github.com/apache/datafusion-comet.git \
+    && cd datafusion-comet \
+    && JAVA_HOME=$(readlink -f $(which javac) | sed "s/\/bin\/javac//") make release PROFILES="-Pspark-$SPARK_VERSION -Pscala-$SCALA_VERSION"
+
+RUN cp /opt/datafusion-comet/spark/target/comet-spark-spark${SPARK_VERSION}_$SCALA_VERSION-0.1.0-SNAPSHOT.jar $SPARK_HOME/jars
+
+USER ${spark_uid}

Review Comment:
   We can reduce the size of the final image from ~8 GB to ~1 GB by using a multi-stage build.
   
   With the Dockerfile in this PR:
   
   ```
   REPOSITORY                         TAG       IMAGE ID       CREATED          SIZE
   andygrove/comet                    latest    223fb8a90579   21 seconds ago   8.42GB
   ```
   
   With the version suggested below:
   
   ```
   REPOSITORY                         TAG       IMAGE ID       CREATED          SIZE
   andygrove/comet                    latest    44695163de5d   31 seconds ago   996MB
   ```
   
   ```suggestion
   FROM apache/spark:3.4.2 AS builder
   
   USER root
   
   # Installing JDK11 as the image comes with JRE
   RUN apt update \
       && apt install -y git \
       && apt install -y curl \
       && apt install -y openjdk-11-jdk \
       && apt clean
   
   RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
   ENV PATH="/root/.cargo/bin:${PATH}"
   ENV RUSTFLAGS="-C debuginfo=line-tables-only -C incremental=false"
   ENV SPARK_VERSION=3.4
   ENV SCALA_VERSION=2.12
   
   # Pick the JDK instead of JRE to compile Comet
   RUN cd /opt \
       && git clone https://github.com/apache/datafusion-comet.git \
       && cd datafusion-comet \
       && JAVA_HOME=$(readlink -f $(which javac) | sed "s/\/bin\/javac//") make release PROFILES="-Pspark-$SPARK_VERSION -Pscala-$SCALA_VERSION"
   
   FROM apache/spark:3.4.2
   
   ENV SPARK_VERSION=3.4
   ENV SCALA_VERSION=2.12
   
   USER root
   
   COPY --from=builder /opt/datafusion-comet/spark/target/comet-spark-spark${SPARK_VERSION}_$SCALA_VERSION-0.1.0-SNAPSHOT.jar $SPARK_HOME/jars
   
   USER ${spark_uid}
   ```
   
   Locally I am seeing this warning:
   
   ```
    1 warning found (use --debug to expand):
    - UndefinedVar: Usage of undefined variable '$spark_uid' (line 33)
   ```
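   
   The warning most likely appears because `spark_uid` is never declared in this Dockerfile: build arguments defined in a base image's Dockerfile are not inherited by images built `FROM` it. One possible fix, as a sketch, is to declare the argument in the final stage before it is used. The default of `185` is an assumption based on the UID used by the upstream Spark Dockerfile and should be checked against the `apache/spark:3.4.2` image.
   
   ```dockerfile
   FROM apache/spark:3.4.2

   # Assumption: 185 is the UID of the non-root Spark user in the base image;
   # verify this before relying on it.
   ARG spark_uid=185

   USER root
   # ... COPY --from=builder step as in the suggestion above ...

   # spark_uid is now defined in this stage, so the UndefinedVar check should pass.
   USER ${spark_uid}
   ```
   
   Declaring the `ARG` with a default also lets the UID be overridden at build time with `--build-arg spark_uid=<uid>`.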



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

