jbampton opened a new issue, #2729: URL: https://github.com/apache/sedona/issues/2729
https://github.com/apache/sedona/blob/master/docker/sedona-docker.dockerfile

---

This Dockerfile builds a powerful geospatial engine, but from a security perspective, it's currently running with "God Mode" enabled. Because Docker containers share the host's kernel, a vulnerability in Spark or Zeppelin could allow an attacker to escape to your host machine if the container is running as **root**.

Here is how you can harden this configuration:

---

## 🛡️ 1. Implement a Non-Root User (Critical)

Currently, all processes (Spark, Zeppelin, and the shell) run as `root`. If an attacker exploits a web-facing service like Zeppelin (port 8085), they have full control over the container.

**The Fix:** Create a dedicated user and change ownership of the directories.

```dockerfile
# Create a system user
RUN groupadd -r sedona && useradd -r -g sedona -d /opt/workspace sedona
RUN chown -R sedona:sedona /opt/spark /opt/zeppelin /opt/workspace

# Switch to the user before the CMD
USER sedona
```

## 📦 2. Pin Your OS Packages

The command `apt-get install -y` pulls the latest available version at build time. This is great for features but bad for **reproducibility** and **auditing**. If a repository is compromised or a buggy version is released, your build may break or become vulnerable without warning.

**The Fix:** Specify versions for critical libraries:

`openjdk-17-jdk-headless=17.0.x-xx`

## 🧹 3. Clean Up Build Tools & Cache

The image currently includes `maven`, `curl`, and `pip3` caches. These increase the "attack surface": an attacker who gains entry already has the tools to download and compile malicious binaries inside your container.

**The Fix:** Use a multi-stage build or clean up in the same `RUN` layer:

```dockerfile
# Purge build tools first, then drop the apt package lists, all in one layer
RUN apt-get update && apt-get install -y ... \
    && apt-get purge -y --auto-remove maven curl \
    && rm -rf /var/lib/apt/lists/*
```

## 🐍 4. Secure Python Dependencies

The use of `--break-system-packages` is a shortcut that can lead to conflicts with OS-level Python scripts. Furthermore, `requirements.txt` should ideally use **hashes** to ensure the downloaded packages haven't been tampered with.

**The Fix:**

1. Use a **Virtual Environment (venv)** instead of installing globally.
2. Use `pip-compile --generate-hashes` to generate a `requirements.txt` with SHA-256 hashes.

## 🔑 5. Secret Management

You are passing `aws_sdk_version` and setting up S3 support. Ensure that you **never** hardcode AWS keys (`AWS_ACCESS_KEY_ID`) in this Dockerfile or its environment variables. Values set with `ENV` or `ARG` remain visible in the image metadata and build history to anyone who can pull the image.

---

## Improved Dockerfile Snippet (Security Focused)

Here is how the middle section of your Dockerfile could look to be significantly more secure:

```dockerfile
# ... after installations ...

# 1. Create non-privileged user
RUN groupadd -g 10001 sedonagroup && \
    useradd -u 10001 -g sedonagroup -m -s /bin/bash sedonauser

# 2. Set strict permissions
RUN chown -R sedonauser:sedonagroup /opt/spark /opt/zeppelin ${SHARED_WORKSPACE} && \
    chmod -R 755 /opt/spark /opt/zeppelin

# 3. Clean up apt cache to reduce attack surface
#    (this only shrinks the image if merged into the RUN layer that ran apt-get install)
RUN apt-get clean && rm -rf /var/lib/apt/lists/*

WORKDIR ${SHARED_WORKSPACE}
USER sedonauser

COPY ./docker/start.sh /opt/
CMD ["/bin/bash", "/opt/start.sh"]
```

### Next Step

Would you like me to show you how to set up **Trivy** or **Docker Scout** to automatically scan this image for known vulnerabilities (CVEs)?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
