Chao Sun created SPARK-49068:
--------------------------------
Summary: "Externally Managed Environment" error when building PySpark Docker image
Key: SPARK-49068
URL: https://issues.apache.org/jira/browse/SPARK-49068
Project: Spark
Issue Type: Bug
Components: Spark Docker
Affects Versions: 3.5.1
Reporter: Chao Sun
When trying to build a Docker image from the PySpark Dockerfile on Ubuntu 20.04,
I got the following error:
{code}
#7 19.13 error: externally-managed-environment
#7 19.13
#7 19.13 × This environment is externally managed
#7 19.13 ╰─> To install Python packages system-wide, try apt install
#7 19.13 python3-xyz, where xyz is the package you are trying to
#7 19.13 install.
#7 19.13
#7 19.13 If you wish to install a non-Debian-packaged Python package,
#7 19.13 create a virtual environment using python3 -m venv path/to/venv.
#7 19.13 Then use path/to/venv/bin/python and path/to/venv/bin/pip. Make
#7 19.13 sure you have python3-full installed.
#7 19.13
#7 19.13 If you wish to install a non-Debian packaged Python application,
#7 19.13 it may be easiest to use pipx install xyz, which will manage a
#7 19.13 virtual environment for you. Make sure you have pipx installed.
#7 19.13
#7 19.13 See /usr/share/doc/python3.12/README.venv for more information.
#7 19.13
#7 19.13 note: If you believe this is a mistake, please contact your Python
installation or OS distribution provider. You can override this, at the risk of
breaking your Python installation or OS, by passing --break-system-packages.
#7 19.13 hint: See PEP 668 for the detailed specification.
#7 ERROR: process "/bin/sh -c apt-get update && apt install -y python3
python3-pip && rm -rf /usr/lib/python3.11/EXTERNALLY-MANAGED && pip3
install --upgrade pip setuptools && rm -rf /root/.cache && rm -rf
/var/cache/apt/* && rm -rf /var/lib/apt/lists/*" did not complete successfully:
exit code: 1
{code}
The
[Dockerfile|https://github.com/apache/spark/blob/master/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/bindings/python/Dockerfile]
does the following:
{code}
RUN apt-get update && \
apt install -y python3 python3-pip && \
pip3 install --upgrade pip setuptools && \
# Removed the .cache to save space
rm -rf /root/.cache && rm -rf /var/cache/apt/* && rm -rf /var/lib/apt/lists/*
{code}
If {{pip}} was installed by the system package manager and we then try to
overwrite it via {{pip3 install}}, this error can occur.
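As the hint in the log says, this check comes from PEP 668: pip refuses system-wide installs when a marker file named EXTERNALLY-MANAGED exists in the interpreter's standard-library directory. The log also references python3.12 in the error text while the failing command removes the marker under /usr/lib/python3.11, so a hard-coded path may simply miss the marker; that is an observation from the log, not a confirmed root cause. A minimal sketch for checking the marker without hard-coding the Python version follows; the function names are hypothetical.

```shell
# Sketch only: function names are illustrative, not part of any Spark script.

# stdlib_dir: print the standard-library directory of the python3 on PATH.
stdlib_dir() {
    python3 -c 'import sysconfig; print(sysconfig.get_path("stdlib"))'
}

# is_externally_managed DIR: succeed if DIR contains the PEP 668 marker file.
is_externally_managed() {
    [ -f "$1/EXTERNALLY-MANAGED" ]
}
```

For example, {{is_externally_managed "$(stdlib_dir)" && echo "externally managed"}} would report whether the current interpreter is affected.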
A simple solution would be to create a virtual environment first, install the
latest pip there, and then update {{PATH}} to use that instead.
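The virtual-environment approach could be sketched in shell roughly as follows. The venv location /opt/venv and the function names are assumptions chosen for illustration, not taken from the Spark Dockerfile; on Debian/Ubuntu, {{python3 -m venv}} typically also requires the python3-venv package.

```shell
# Sketch only: VENV_DIR and the function names are illustrative assumptions.
VENV_DIR="${VENV_DIR:-/opt/venv}"

# create_venv DIR: create a virtual environment and upgrade pip inside it,
# leaving the system-managed pip untouched.
create_venv() {
    python3 -m venv "$1" &&
    "$1/bin/pip" install --upgrade pip setuptools
}

# venv_first_path DIR CURRENT_PATH: print a PATH value that resolves
# python3 and pip from the venv before the system copies.
venv_first_path() {
    printf '%s/bin:%s\n' "$1" "$2"
}
```

Calling {{create_venv "$VENV_DIR"}} and then {{export PATH="$(venv_first_path "$VENV_DIR" "$PATH")"}} would make later {{pip3 install}} calls use the venv's pip. In a Dockerfile the PATH change would need an {{ENV}} instruction rather than {{export}}, since each {{RUN}} step starts a fresh shell.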
I wonder whether anyone else has encountered the same issue and whether it is
worth fixing.