jbampton opened a new issue, #2729:
URL: https://github.com/apache/sedona/issues/2729

   https://github.com/apache/sedona/blob/master/docker/sedona-docker.dockerfile
   
   ---
   
   This Dockerfile builds a powerful geospatial engine, but from a security 
perspective it currently runs everything as **root**. Because Docker 
containers share the host's kernel, a compromised Spark or Zeppelin process 
running as root has a much better chance of escaping to the host machine than 
an unprivileged process would.
   
   Here is how you can harden this configuration:
   
   ---
   
   ## πŸ›‘οΈ 1. Implement a Non-Root User (Critical)
   
   Currently, all processes (Spark, Zeppelin, and the shell) run as `root`. If 
an attacker exploits a web-facing service like Zeppelin (port 8085), they have 
full control over the container.
   
   **The Fix:** Create a dedicated user and change ownership of the directories.
   
   ```dockerfile
   # Create a system user
   RUN groupadd -r sedona && useradd -r -g sedona -d /opt/workspace sedona
   RUN chown -R sedona:sedona /opt/spark /opt/zeppelin /opt/workspace
   
   # Switch to the user before the CMD
   USER sedona
   
   ```
   
   ## 📦 2. Pin Your OS Packages
   
   The command `apt-get install -y`, with no version specified, pulls the 
latest available version of each package at build time. That keeps packages 
current but hurts **reproducibility** and **auditing**: two builds of the same 
Dockerfile can produce different images, and if a repository is compromised or 
a buggy version is released, your build can break or become vulnerable without 
warning.
   
   **The Fix:** Specify versions for critical libraries:
   `openjdk-17-jdk-headless=17.0.x-xx`
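   
   As a minimal sketch of the pinning described above, the version can be 
passed in as a build argument so it is recorded in the build command rather 
than guessed. The argument name `OPENJDK_VERSION` is hypothetical and is 
deliberately left without a default; the exact version string has to come from 
your distribution's package index (for example via `apt-cache madison 
openjdk-17-jdk-headless`).
   
   ```dockerfile
   # Hypothetical build argument, no default on purpose:
   # supply it with `docker build --build-arg OPENJDK_VERSION=<exact version> .`
   ARG OPENJDK_VERSION
   
   # Install the pinned version and clean the package lists in the same layer
   RUN apt-get update \
       && apt-get install -y --no-install-recommends \
          "openjdk-17-jdk-headless=${OPENJDK_VERSION}" \
       && rm -rf /var/lib/apt/lists/*
   ```
   
   If the argument is omitted, `apt-get` receives an empty version and the 
build fails instead of silently installing whatever is newest.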
   
   ## 🧹 3. Clean Up Build Tools & Cache
   
   The image currently includes `maven`, `curl`, and `pip3` caches. These 
increase the "attack surface": an attacker who gains entry now has the tools to 
download and compile malicious binaries inside your container.
   
   **The Fix:** Use a multi-stage build or clean up in the same `RUN` layer:
   
   ```dockerfile
   RUN apt-get update && apt-get install -y ... \
       && rm -rf /var/lib/apt/lists/* \
       && apt-get purge -y --auto-remove maven curl
   
   ```
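   
   The multi-stage alternative mentioned above could look roughly like the 
following sketch. The base image tag, the stage name `builder`, the Maven 
invocation, and the `/build` and `/opt/artifacts` paths are all assumptions for 
illustration, not taken from the Sedona Dockerfile.
   
   ```dockerfile
   # Build stage: contains maven and curl, and is discarded after the build
   FROM ubuntu:22.04 AS builder
   RUN apt-get update \
       && apt-get install -y --no-install-recommends maven curl \
       && rm -rf /var/lib/apt/lists/*
   WORKDIR /build
   COPY . .
   RUN mvn -q package -DskipTests
   
   # Runtime stage: starts from a clean base and copies only the build output,
   # so maven, curl, and their caches never reach the final image
   FROM ubuntu:22.04
   COPY --from=builder /build/target/ /opt/artifacts/
   ```
   
   Compared with purging packages in the same `RUN` layer, this keeps the 
build tools out of the final image entirely, at the cost of restructuring the 
Dockerfile.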
   
   ## 🐍 4. Secure Python Dependencies
   
   The use of `--break-system-packages` overrides pip's protection for the 
OS-managed Python environment (PEP 668), which can lead to conflicts with 
OS-level Python scripts. Furthermore, `requirements.txt` should ideally pin 
**hashes** so that pip can verify the downloaded packages haven't been 
tampered with.
   
   **The Fix:**
   
   1. Use a **Virtual Environment (venv)** instead of installing globally.
   2. Use `pip-compile` to generate a `requirements.txt` with SHA-256 hashes.
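   
   The two steps above can be sketched as follows. This assumes `python3` and 
the distribution's venv support (for example the `python3-venv` package) are 
already installed, and that a hashed `requirements.txt` was generated 
beforehand with `pip-compile --generate-hashes`. The `/opt/venv` location and 
the `requirements.txt` source path are hypothetical.
   
   ```dockerfile
   # Create an isolated environment instead of installing into the system Python
   ENV VIRTUAL_ENV=/opt/venv
   RUN python3 -m venv "${VIRTUAL_ENV}"
   
   # Put the venv first on PATH so `python` and `pip` resolve to it
   ENV PATH="${VIRTUAL_ENV}/bin:${PATH}"
   
   # --require-hashes makes pip reject any package whose hash is not listed
   COPY requirements.txt /tmp/requirements.txt
   RUN pip install --no-cache-dir --require-hashes -r /tmp/requirements.txt
   ```
   
   With the venv on `PATH`, `--break-system-packages` should no longer be 
needed, since nothing is installed into the OS-managed environment.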
   
   ## πŸ” 5. Secret Management
   
   You are passing `aws_sdk_version` and setting up S3 support. Ensure that you 
**never** hardcode AWS keys (`AWS_ACCESS_KEY_ID`) in this Dockerfile or its 
environment variables.
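   
   If credentials are ever needed during the build itself, one option is a 
BuildKit secret mount, sketched below. The secret id `aws` and the script 
`/opt/fetch-artifacts.sh` are hypothetical; nothing in the current Dockerfile 
is known to require this.
   
   ```dockerfile
   # syntax=docker/dockerfile:1
   # The credentials file is mounted only for this one RUN step and is not
   # written to any image layer. Secret id and script name are hypothetical.
   RUN --mount=type=secret,id=aws,target=/root/.aws/credentials \
       /opt/fetch-artifacts.sh
   ```
   
   The secret would be supplied at build time with something like `docker 
build --secret id=aws,src=$HOME/.aws/credentials .`. Credentials needed only 
at runtime are better injected when the container starts rather than at build 
time.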
   
   ---
   
   ## Improved Dockerfile Snippet (Security Focused)
   
   Here is how the middle section of your Dockerfile should look to be 
significantly more secure:
   
   ```dockerfile
   # ... after installations ...
   
   # 1. Create non-privileged user
   RUN groupadd -g 10001 sedonagroup && \
       useradd -u 10001 -g sedonagroup -m -s /bin/bash sedonauser
   
   # 2. Set strict permissions
   RUN chown -R sedonauser:sedonagroup /opt/spark /opt/zeppelin ${SHARED_WORKSPACE} && \
       chmod -R 755 /opt/spark /opt/zeppelin
   
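   # 3. Clean up apt cache to reduce attack surface. Note: this only shrinks the
   #    image when it runs in the same RUN instruction as the apt-get install;
   #    as a separate layer the files still exist in the earlier layer.
   RUN apt-get clean && rm -rf /var/lib/apt/lists/*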
   
   WORKDIR ${SHARED_WORKSPACE}
   USER sedonauser
   
   COPY ./docker/start.sh /opt/
   CMD ["/bin/bash", "/opt/start.sh"]
   
   ```
   
   ### Next Step
   
   As a follow-up, the image could be scanned automatically for known 
vulnerabilities (CVEs) with **Trivy** or **Docker Scout**.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
