This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 2b9c1fb83239 [SPARK-55327][K8S] Reduce Spark docker image sizes
2b9c1fb83239 is described below

commit 2b9c1fb8323961d40e0758952518e9b1054e3895
Author: Dongjoon Hyun <[email protected]>
AuthorDate: Mon Feb 2 22:17:13 2026 -0800

    [SPARK-55327][K8S] Reduce Spark docker image sizes
    
    ### What changes were proposed in this pull request?
    
    This PR aims to reduce Spark docker image sizes. Note that the SparkR binding Dockerfile is not touched in this PR because SparkR is deprecated.
    
    ### Why are the changes needed?
    
    To reduce Spark docker image disk usage by roughly **20%**, as in the following example. I used the Apache Spark 4.1.1 distribution with the built-in Dockerfiles (`BEFORE`) and this PR's Dockerfiles (`AFTER`) to compare.
    
    ```
    $ docker images apache/spark-py
    IMAGE                    ID             DISK USAGE   CONTENT SIZE
    apache/spark-py:BEFORE   d85ccb58ee9d       2.24GB          818MB
    apache/spark-py:AFTER    6ac9dd4544c2       1.83GB          712MB
    ```
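    
    As a quick sanity check on the figures above, the relative reduction can be computed directly. This is a minimal sketch that is not part of this PR; the `pct_reduction` helper name is hypothetical, and it assumes `awk` is available.
    
    ```shell
    # Hypothetical helper (not part of this PR): percentage reduction from BEFORE to AFTER.
    pct_reduction() {
      awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f\n", (before - after) / before * 100 }'
    }

    pct_reduction 2.24 1.83   # DISK USAGE in GB
    pct_reduction 818 712     # CONTENT SIZE in MB
    ```
    
    With the sizes listed above, this gives about 18.3% for disk usage and about 13.0% for content size.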
    
    ### Does this PR introduce _any_ user-facing change?
    
    No.
    
    ### How was this patch tested?
    
    Pass the CIs and check the image size after building the images.
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    Yes (`Opus 4.5` on `Claude Code v2.1.5`)
    
    Closes #54107 from dongjoon-hyun/SPARK-55327.
    
    Authored-by: Dongjoon Hyun <[email protected]>
    Signed-off-by: Dongjoon Hyun <[email protected]>
---
 .../kubernetes/docker/src/main/dockerfiles/spark/Dockerfile       | 5 +++--
 .../docker/src/main/dockerfiles/spark/bindings/python/Dockerfile  | 8 ++++----
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile b/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile
index 0f970a3d63ad..5d15d1a5d358 100644
--- a/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile
+++ b/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile
@@ -36,7 +36,7 @@ ARG spark_uid=185
 RUN set -ex && \
     apt-get update && \
     ln -s /lib /lib64 && \
-    apt install -y bash tini libc6 libpam-modules krb5-user libnss3 procps net-tools logrotate libssl-dev && \
+    apt install -y --no-install-recommends bash tini libc6 libpam-modules krb5-user libnss3 procps net-tools logrotate libssl-dev && \
     mkdir -p /opt/spark && \
     mkdir -p /opt/spark/examples && \
     mkdir -p /opt/spark/work-dir && \
@@ -45,7 +45,8 @@ RUN set -ex && \
     ln -sv /bin/bash /bin/sh && \
     echo "auth required pam_wheel.so use_uid" >> /etc/pam.d/su && \
     chgrp root /etc/passwd && chmod ug+rw /etc/passwd && \
-    rm -rf /var/cache/apt/* && rm -rf /var/lib/apt/lists/*
+    apt-get clean && \
+    rm -rf /var/cache/apt/* /var/lib/apt/lists/*
 
 COPY jars /opt/spark/jars
 # Copy RELEASE file if exists
diff --git a/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/bindings/python/Dockerfile b/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/bindings/python/Dockerfile
index 14ee735d0333..16b36f897d03 100644
--- a/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/bindings/python/Dockerfile
+++ b/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/bindings/python/Dockerfile
@@ -30,10 +30,10 @@ USER 0
 
 RUN mkdir ${SPARK_HOME}/python
 RUN apt-get update && \
-    apt install -y python3 python3-pip && \
-    pip3 install --upgrade pip setuptools && \
-    # Removed the .cache to save space
-    rm -rf /root/.cache && rm -rf /var/cache/apt/* && rm -rf /var/lib/apt/lists/*
+    apt install -y --no-install-recommends python3 python3-pip && \
+    pip3 install --no-cache-dir --upgrade pip setuptools && \
+    apt-get clean && \
+    rm -rf /var/cache/apt/* /var/lib/apt/lists/*
 
 COPY python/pyspark ${SPARK_HOME}/python/pyspark
 COPY python/lib ${SPARK_HOME}/python/lib

