This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark-docker.git

The following commit(s) were added to refs/heads/master by this push:
     new 0f76cd1 [SPARK-52542] Use `/nonexistent` instead of nonexistent `/opt/spark` (#87)
0f76cd1 is described below

commit 0f76cd1f98e924a07fb6a5551807015b634a92a2
Author: Dongjoon Hyun <dongj...@apache.org>
AuthorDate: Mon Jun 23 09:16:26 2025 -0700

[SPARK-52542] Use `/nonexistent` instead of nonexistent `/opt/spark` (#87)

### What changes were proposed in this pull request?

This PR aims to use `/nonexistent` explicitly instead of the nonexistent `/home/spark`, because the current status is misleading.

Please note that SPARK-40528 introduced `useradd --system`, which has created the `spark` user with a nonexistent `/home/spark` home directory since the beginning of this repository, `spark-docker`.

- #12

https://github.com/apache/spark-docker/blob/c264d48dc510018095700ed33e700ccc34268bf2/Dockerfile.template#L21-L22

**Rejected Alternatives**

- We can set `HOME` to `/opt/spark` like Apache Spark behavior. However, it's also different from `WORKDIR` (`/opt/spark/work-dir`).
- We can create `/home/spark`, but it could be more vulnerable than the current state.

For `system` accounts, `/nonexistent` is frequently used as a security practice to prevent any side effects of a `HOME` directory.

```
$ docker run -it --rm apache/spark:4.0.0 cat /etc/passwd | grep /nonexistent
nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin
_apt:x:100:65534::/nonexistent:/usr/sbin/nologin
```

### Why are the changes needed?

**Apache Spark 3.3.3**

```
$ docker run -it --rm apache/spark:3.3.3 /opt/spark/bin/spark-sql
...
25/06/20 20:15:41 WARN SparkSQLCLIDriver: WARNING: Directory for Hive history file: /home/spark does not exist. History will not be available during this session.
```

```
$ docker run -it --rm -uroot apache/spark:3.3.3 tail -1 /etc/passwd
spark:x:185:185::/home/spark:/bin/sh

$ docker run -it --rm -uroot apache/spark:3.3.3 ls -al /home/spark
ls: cannot access '/home/spark': No such file or directory
```

**Apache Spark 3.4.4**

```
$ docker run -it --rm -uroot apache/spark:3.4.4 tail -1 /etc/passwd
spark:x:185:185::/home/spark:/bin/sh

$ docker run -it --rm -uroot apache/spark:3.4.4 ls -al /home/spark
ls: cannot access '/home/spark': No such file or directory
```

**Apache Spark 3.5.6**

```
$ docker run -it --rm -uroot apache/spark:3.5.6 tail -1 /etc/passwd
spark:x:185:185::/home/spark:/bin/sh

$ docker run -it --rm -uroot apache/spark:3.5.6 ls /home/spark
ls: cannot access '/home/spark': No such file or directory
```

**Apache Spark 4.0.0**

```
$ docker run -it --rm -uroot apache/spark:4.0.0 tail -1 /etc/passwd
spark:x:185:185::/home/spark:/bin/sh

$ docker run -it --rm -uroot apache/spark:4.0.0 ls /home/spark
ls: cannot access '/home/spark': No such file or directory
```

### Does this PR introduce _any_ user-facing change?

No behavior change, because `/home/spark` does not exist in the current images.

### How was this patch tested?

Manual review.
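
The per-version `tail -1 /etc/passwd` checks above can be condensed into one lookup. The following is a minimal sketch and not part of this patch: `passwd_home` is a hypothetical helper name, and it assumes the passwd data has first been saved to a local file (for example by redirecting the output of `cat /etc/passwd` from a container).

```shell
#!/bin/sh
# Hypothetical helper (not part of spark-docker).
# Print the home directory recorded for user $1 in the passwd-format
# file $2. The home directory is the 6th colon-separated field.
# Prints nothing if the user is not listed.
passwd_home() {
    awk -F: -v user="$1" '$1 == user { print $6 }' "$2"
}
```

For instance, after `docker run --rm -uroot apache/spark:4.0.0 cat /etc/passwd > passwd.txt`, calling `passwd_home spark passwd.txt` against the contents shown above would print `/home/spark`, and `passwd_home nobody passwd.txt` would print `/nonexistent`.
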
---
 4.0.0/scala2.13-java17-ubuntu/Dockerfile | 2 +-
 4.0.0/scala2.13-java21-ubuntu/Dockerfile | 2 +-
 Dockerfile.template                      | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/4.0.0/scala2.13-java17-ubuntu/Dockerfile b/4.0.0/scala2.13-java17-ubuntu/Dockerfile
index 031fc3e..0c84167 100644
--- a/4.0.0/scala2.13-java17-ubuntu/Dockerfile
+++ b/4.0.0/scala2.13-java17-ubuntu/Dockerfile
@@ -19,7 +19,7 @@ FROM eclipse-temurin:17-jammy
 ARG spark_uid=185
 
 RUN groupadd --system --gid=${spark_uid} spark && \
-    useradd --system --uid=${spark_uid} --gid=spark spark
+    useradd --system --uid=${spark_uid} --gid=spark -d /nonexistent spark
 
 RUN set -ex; \
     apt-get update; \
diff --git a/4.0.0/scala2.13-java21-ubuntu/Dockerfile b/4.0.0/scala2.13-java21-ubuntu/Dockerfile
index 15bd36b..b34f6e0 100644
--- a/4.0.0/scala2.13-java21-ubuntu/Dockerfile
+++ b/4.0.0/scala2.13-java21-ubuntu/Dockerfile
@@ -19,7 +19,7 @@ FROM eclipse-temurin:21-jammy
 ARG spark_uid=185
 
 RUN groupadd --system --gid=${spark_uid} spark && \
-    useradd --system --uid=${spark_uid} --gid=spark spark
+    useradd --system --uid=${spark_uid} --gid=spark -d /nonexistent spark
 
 RUN set -ex; \
     apt-get update; \
diff --git a/Dockerfile.template b/Dockerfile.template
index a410e06..ed07c88 100644
--- a/Dockerfile.template
+++ b/Dockerfile.template
@@ -19,7 +19,7 @@ FROM {{ BASE_IMAGE }}
 ARG spark_uid=185
 
 RUN groupadd --system --gid=${spark_uid} spark && \
-    useradd --system --uid=${spark_uid} --gid=spark spark
+    useradd --system --uid=${spark_uid} --gid=spark -d /nonexistent spark
 
 RUN set -ex; \
     apt-get update; \
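
The same one-line change is applied to three files, so a quick scan can confirm that no `useradd` call is left without an explicit home directory. This is a minimal sketch and not part of the patch: `useradd_without_home` is a hypothetical helper name, and it assumes each `useradd` invocation keeps its options on the same line as the command, as in the Dockerfiles above.

```shell
#!/bin/sh
# Hypothetical check (not part of spark-docker).
# List `useradd` lines in the given files that do not pass an explicit
# home directory via -d or --home-dir. Output is file:line:content.
# /dev/null is added so grep always prints the file name prefix.
useradd_without_home() {
    grep -n 'useradd' "$@" /dev/null |
        grep -v -E -e '(^|[[:space:]])(-d|--home-dir)([[:space:]=]|$)'
}
```

A possible invocation is `useradd_without_home Dockerfile.template 4.0.0/*/Dockerfile`. Any line it prints still relies on the default home directory; when nothing is printed, the function returns a non-zero status because the final `grep` selected no lines.
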