Chao Sun created SPARK-49068:
--------------------------------

             Summary: "Externally Managed Environment" error when building PySpark Docker image
                 Key: SPARK-49068
                 URL: https://issues.apache.org/jira/browse/SPARK-49068
             Project: Spark
          Issue Type: Bug
          Components: Spark Docker
    Affects Versions: 3.5.1
            Reporter: Chao Sun


When trying to build a Docker image from the PySpark Dockerfile on Ubuntu 20.04, 
I got the following error:
{code}
#7 19.13 error: externally-managed-environment
#7 19.13
#7 19.13 × This environment is externally managed
#7 19.13 ╰─> To install Python packages system-wide, try apt install
#7 19.13     python3-xyz, where xyz is the package you are trying to
#7 19.13     install.
#7 19.13
#7 19.13     If you wish to install a non-Debian-packaged Python package,
#7 19.13     create a virtual environment using python3 -m venv path/to/venv.
#7 19.13     Then use path/to/venv/bin/python and path/to/venv/bin/pip. Make
#7 19.13     sure you have python3-full installed.
#7 19.13
#7 19.13     If you wish to install a non-Debian packaged Python application,
#7 19.13     it may be easiest to use pipx install xyz, which will manage a
#7 19.13     virtual environment for you. Make sure you have pipx installed.
#7 19.13
#7 19.13     See /usr/share/doc/python3.12/README.venv for more information.
#7 19.13
#7 19.13 note: If you believe this is a mistake, please contact your Python 
installation or OS distribution provider. You can override this, at the risk of 
breaking your Python installation or OS, by passing --break-system-packages.
#7 19.13 hint: See PEP 668 for the detailed specification.
#7 ERROR: process "/bin/sh -c apt-get update &&     apt install -y python3 
python3-pip &&     rm -rf /usr/lib/python3.11/EXTERNALLY-MANAGED &&     pip3 
install --upgrade pip setuptools &&     rm -rf /root/.cache && rm -rf 
/var/cache/apt/* && rm -rf /var/lib/apt/lists/*" did not complete successfully: 
exit code: 1
{code}

The 
[Dockerfile|https://github.com/apache/spark/blob/master/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/bindings/python/Dockerfile]
 does the following:
{code}
RUN apt-get update && \
    apt install -y python3 python3-pip && \
    pip3 install --upgrade pip setuptools && \
    # Removed the .cache to save space
    rm -rf /root/.cache && rm -rf /var/cache/apt/* && rm -rf /var/lib/apt/lists/*
{code}

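The image installs {{pip}} through the system package manager and then tries to 
overwrite it with {{pip3 install --upgrade}}. When the distribution marks its 
system Python as externally managed (PEP 668), pip refuses to do this, which 
produces the error above.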

A simple solution would be to create a virtual environment first, install the 
latest pip there, and then update {{PATH}} to use that instead. 
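
A rough sketch of what that could look like is below. It is not a tested patch: 
the {{/opt/venv}} location and the {{VIRTUAL_ENV}} variable are my own 
assumptions, and I am assuming the base image provides the {{python3-venv}} 
package.
{code}
# Assumed venv location; not a path the current Dockerfile uses
ENV VIRTUAL_ENV=/opt/venv
RUN apt-get update && \
    apt install -y python3 python3-pip python3-venv && \
    # Create the venv and upgrade pip inside it, leaving the system pip untouched
    python3 -m venv $VIRTUAL_ENV && \
    $VIRTUAL_ENV/bin/pip install --upgrade pip setuptools && \
    # Removed the .cache to save space
    rm -rf /root/.cache && rm -rf /var/cache/apt/* && rm -rf /var/lib/apt/lists/*
# Put the venv first on PATH so python3 and pip3 resolve to it
ENV PATH="$VIRTUAL_ENV/bin:$PATH"
{code}
With this approach the apt-managed Python is never modified, so neither 
{{--break-system-packages}} nor deleting the {{EXTERNALLY-MANAGED}} marker 
should be needed.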

I wonder whether anyone else has encountered the same issue, and whether it is 
worth fixing.



