This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark-docker.git


The following commit(s) were added to refs/heads/master by this push:
     new 0f76cd1  [SPARK-52542] Use `/nonexistent` instead of nonexistent `/opt/spark` (#87)
0f76cd1 is described below

commit 0f76cd1f98e924a07fb6a5551807015b634a92a2
Author: Dongjoon Hyun <dongj...@apache.org>
AuthorDate: Mon Jun 23 09:16:26 2025 -0700

    [SPARK-52542] Use `/nonexistent` instead of nonexistent `/opt/spark` (#87)
    
    ### What changes were proposed in this pull request?
    
    This PR aims to use `/nonexistent` explicitly, instead of the nonexistent `/home/spark`, as the home directory of the `spark` user. The current setting is misleading because it points at a directory that does not exist.
    
    Please note that SPARK-40528 introduced `useradd --system`, which has created the `spark` user with a nonexistent `/home/spark` home directory since the beginning of this repository, `spark-docker`.
    
    - #12
    
      https://github.com/apache/spark-docker/blob/c264d48dc510018095700ed33e700ccc34268bf2/Dockerfile.template#L21-L22
    
    **Rejected Alternatives**
    
    - We could set `HOME` to `/opt/spark`, matching the Apache Spark behavior. However, that would still differ from `WORKDIR` (`/opt/spark/work-dir`).
    - We could create `/home/spark`, but that could be more vulnerable than the as-is status. For system accounts, `/nonexistent` is a common security practice to prevent any side effects of a `HOME` directory.
    
    ```
    $ docker run -it --rm apache/spark:4.0.0 cat /etc/passwd | grep /nonexistent
    nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin
    _apt:x:100:65534::/nonexistent:/usr/sbin/nologin
    ```
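    
    The `grep` above matches `/nonexistent` anywhere in a line. A slightly stricter sketch compares only the sixth field of each `/etc/passwd` entry, which is the home directory. It assumes a POSIX `sh` and `awk`; the function name `nonexistent_home_users` is hypothetical.
    
    ```shell
    # Print the names of accounts whose home directory (6th field of a
    # passwd-format entry) is exactly /nonexistent. Reads entries on stdin.
    nonexistent_home_users() {
      awk -F: '$6 == "/nonexistent" { print $1 }'
    }
    
    # Usage: scan the local account database, if it is readable.
    if [ -r /etc/passwd ]; then
      nonexistent_home_users < /etc/passwd
    fi
    ```
    
    Matching on the field rather than the whole line avoids false positives from, for example, a GECOS comment that happens to contain the string.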
    
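    ### Why are the changes needed?
    
    Every image listed below declares `/home/spark` as the home directory of the `spark` user, but that directory does not exist. Tools that use the home directory, such as `spark-sql` for its Hive history file, warn about it.
    
    **Apache Spark 3.3.3**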
    
    ```
    $ docker run -it --rm apache/spark:3.3.3 /opt/spark/bin/spark-sql
    ...
    25/06/20 20:15:41 WARN SparkSQLCLIDriver: WARNING: Directory for Hive history file: /home/spark does not exist.   History will not be available during this session.
    ```
    
    ```
    $ docker run -it --rm -uroot apache/spark:3.3.3 tail -1 /etc/passwd
    spark:x:185:185::/home/spark:/bin/sh
    
    $ docker run -it --rm -uroot apache/spark:3.3.3 ls -al /home/spark
    ls: cannot access '/home/spark': No such file or directory
    ```
    
    **Apache Spark 3.4.4**
    
    ```
    $ docker run -it --rm -uroot apache/spark:3.4.4 tail -1 /etc/passwd
    spark:x:185:185::/home/spark:/bin/sh
    
    $ docker run -it --rm -uroot apache/spark:3.4.4 ls -al /home/spark
    ls: cannot access '/home/spark': No such file or directory
    ```
    
    **Apache Spark 3.5.6**
    
    ```
    $ docker run -it --rm -uroot apache/spark:3.5.6 tail -1 /etc/passwd
    spark:x:185:185::/home/spark:/bin/sh
    
    $ docker run -it --rm -uroot apache/spark:3.5.6 ls /home/spark
    ls: cannot access '/home/spark': No such file or directory
    ```
    
    **Apache Spark 4.0.0**
    
    ```
    $ docker run -it --rm -uroot apache/spark:4.0.0 tail -1 /etc/passwd
    spark:x:185:185::/home/spark:/bin/sh
    
    $ docker run -it --rm -uroot apache/spark:4.0.0 ls /home/spark
    ls: cannot access '/home/spark': No such file or directory
    ```
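    
    The checks above inspect each image by hand with `tail -1 /etc/passwd` and `ls`. The following sketch applies the same idea to every account: it reports entries whose home directory is missing, while skipping the deliberate `/nonexistent` value, so only misleading entries are flagged. It assumes a POSIX `sh`; the function name `misleading_home_dirs` is hypothetical.
    
    ```shell
    # Report accounts whose home directory (6th field of a passwd-format entry)
    # does not exist, ignoring the explicit /nonexistent placeholder.
    # Reads entries on stdin and prints "<user> <home>" for each finding.
    misleading_home_dirs() {
      while IFS=: read -r name _pw _uid _gid _gecos home _shell; do
        [ -n "$name" ] || continue                 # skip blank lines
        [ "$home" = "/nonexistent" ] && continue   # intentional placeholder
        if [ ! -d "$home" ]; then
          printf '%s %s\n' "$name" "$home"
        fi
      done
    }
    
    # Usage: scan the local account database, if it is readable.
    if [ -r /etc/passwd ]; then
      misleading_home_dirs < /etc/passwd
    fi
    ```
    
    Based on the `ls` results above, running this inside any of those images should include `spark /home/spark` among its findings; other base-image accounts may be reported as well.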
    
    ### Does this PR introduce _any_ user-facing change?
    
    No behavior change, because `/home/spark` never existed in the published images in the first place.
    
    ### How was this patch tested?
    
    Manual review.
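    
    Beyond manual review, a rebuilt image could be checked with a small script like the sketch below, run inside a container started from that image. It assumes the image is built from the updated `Dockerfile.template` and that the account is still named `spark`; the function name `home_dir_of` is hypothetical.
    
    ```shell
    # Print the home directory field (6th field) of the passwd entry for user $1.
    # Reads passwd-format entries on stdin and stops at the first match.
    home_dir_of() {
      awk -F: -v user="$1" '$1 == user { print $6; exit }'
    }
    
    # Usage: confirm that `spark` now points at /nonexistent.
    home=$(home_dir_of spark < /etc/passwd)
    if [ "$home" = "/nonexistent" ]; then
      echo "ok: spark home is /nonexistent"
    else
      echo "unexpected spark home: '${home}'" >&2
    fi
    ```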
---
 4.0.0/scala2.13-java17-ubuntu/Dockerfile | 2 +-
 4.0.0/scala2.13-java21-ubuntu/Dockerfile | 2 +-
 Dockerfile.template                      | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/4.0.0/scala2.13-java17-ubuntu/Dockerfile b/4.0.0/scala2.13-java17-ubuntu/Dockerfile
index 031fc3e..0c84167 100644
--- a/4.0.0/scala2.13-java17-ubuntu/Dockerfile
+++ b/4.0.0/scala2.13-java17-ubuntu/Dockerfile
@@ -19,7 +19,7 @@ FROM eclipse-temurin:17-jammy
 ARG spark_uid=185
 
 RUN groupadd --system --gid=${spark_uid} spark && \
-    useradd --system --uid=${spark_uid} --gid=spark spark
+    useradd --system --uid=${spark_uid} --gid=spark -d /nonexistent spark
 
 RUN set -ex; \
     apt-get update; \
diff --git a/4.0.0/scala2.13-java21-ubuntu/Dockerfile b/4.0.0/scala2.13-java21-ubuntu/Dockerfile
index 15bd36b..b34f6e0 100644
--- a/4.0.0/scala2.13-java21-ubuntu/Dockerfile
+++ b/4.0.0/scala2.13-java21-ubuntu/Dockerfile
@@ -19,7 +19,7 @@ FROM eclipse-temurin:21-jammy
 ARG spark_uid=185
 
 RUN groupadd --system --gid=${spark_uid} spark && \
-    useradd --system --uid=${spark_uid} --gid=spark spark
+    useradd --system --uid=${spark_uid} --gid=spark -d /nonexistent spark
 
 RUN set -ex; \
     apt-get update; \
diff --git a/Dockerfile.template b/Dockerfile.template
index a410e06..ed07c88 100644
--- a/Dockerfile.template
+++ b/Dockerfile.template
@@ -19,7 +19,7 @@ FROM {{ BASE_IMAGE }}
 ARG spark_uid=185
 
 RUN groupadd --system --gid=${spark_uid} spark && \
-    useradd --system --uid=${spark_uid} --gid=spark spark
+    useradd --system --uid=${spark_uid} --gid=spark -d /nonexistent spark
 
 RUN set -ex; \
     apt-get update; \


