[
https://issues.apache.org/jira/browse/SPARK-49068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17884442#comment-17884442
]
Aleksandr Aleksandrov commented on SPARK-49068:
-----------------------------------------------
Hello!
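I had the same issue when trying to build the PySpark Docker image from
scratch. I decided to replace this
pip3 install --upgrade pip setuptools && \
with
pip3 install --upgrade setuptools --break-system-packages && \
It works, but it does not seem like a good solution, since
--break-system-packages overrides the protection described in the error
message below, and pip itself is no longer upgraded.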
> "Externally Managed Environment" error when building PySpark Docker image
> --------------------------------------------------------------------------
>
> Key: SPARK-49068
> URL: https://issues.apache.org/jira/browse/SPARK-49068
> Project: Spark
> Issue Type: Bug
> Components: Spark Docker
> Affects Versions: 3.5.1
> Reporter: Chao Sun
> Priority: Major
>
> When trying to build a Docker image based on the PySpark Dockerfile on Ubuntu
> 20.04, I got the following error:
> {code}
> #7 19.13 error: externally-managed-environment
> #7 19.13
> #7 19.13 × This environment is externally managed
> #7 19.13 ╰─> To install Python packages system-wide, try apt install
> #7 19.13 python3-xyz, where xyz is the package you are trying to
> #7 19.13 install.
> #7 19.13
> #7 19.13 If you wish to install a non-Debian-packaged Python package,
> #7 19.13 create a virtual environment using python3 -m venv path/to/venv.
> #7 19.13 Then use path/to/venv/bin/python and path/to/venv/bin/pip. Make
> #7 19.13 sure you have python3-full installed.
> #7 19.13
> #7 19.13 If you wish to install a non-Debian packaged Python application,
> #7 19.13 it may be easiest to use pipx install xyz, which will manage a
> #7 19.13 virtual environment for you. Make sure you have pipx installed.
> #7 19.13
> #7 19.13 See /usr/share/doc/python3.12/README.venv for more information.
> #7 19.13
> #7 19.13 note: If you believe this is a mistake, please contact your Python installation or OS distribution provider. You can override this, at the risk of breaking your Python installation or OS, by passing --break-system-packages.
> #7 19.13 hint: See PEP 668 for the detailed specification.
> #7 ERROR: process "/bin/sh -c apt-get update && apt install -y python3 python3-pip && rm -rf /usr/lib/python3.11/EXTERNALLY-MANAGED && pip3 install --upgrade pip setuptools && rm -rf /root/.cache && rm -rf /var/cache/apt/* && rm -rf /var/lib/apt/lists/*" did not complete successfully: exit code: 1
> {code}
> Looking at the
> [Dockerfile|https://github.com/apache/spark/blob/master/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/bindings/python/Dockerfile],
> it does the following:
> {code}
> RUN apt-get update && \
> apt install -y python3 python3-pip && \
> pip3 install --upgrade pip setuptools && \
> # Removed the .cache to save space
> rm -rf /root/.cache && rm -rf /var/cache/apt/* && rm -rf /var/lib/apt/lists/*
> {code}
> If {{pip}} was installed by the system package manager and we then try to
> overwrite it via {{pip3 install}}, this error can occur.
> A simple solution would be to create a virtual environment first, install the
> latest pip there, and then update {{PATH}} to use that instead.
> I wonder whether anyone else has encountered the same issue and whether it is
> worth fixing.
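The virtual-environment approach suggested in the description could look
roughly like the following. This is only a sketch under stated assumptions: the
/opt/venv location is a hypothetical choice, and python3-venv (the
Debian/Ubuntu package that provides the venv module with a bundled pip) is not
part of the current Dockerfile.

```dockerfile
# Hypothetical variant of the RUN step quoted above.
# Assumptions: /opt/venv as the venv location, python3-venv available via apt.
RUN apt-get update && \
    apt install -y python3 python3-venv && \
    # Create an isolated environment; the system Python stays untouched.
    python3 -m venv /opt/venv && \
    # Upgrade pip and setuptools inside the venv only, so PEP 668 does not apply.
    /opt/venv/bin/pip install --upgrade pip setuptools && \
    # Remove caches to save space, as in the original step.
    rm -rf /root/.cache && rm -rf /var/cache/apt/* && rm -rf /var/lib/apt/lists/*

# Put the venv first on PATH so python3 and pip resolve to it.
ENV PATH="/opt/venv/bin:${PATH}"
```

With PATH updated this way, later pip install steps in the image go into the
virtual environment, so neither --break-system-packages nor removing the
EXTERNALLY-MANAGED marker file should be needed.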
--
This message was sent by Atlassian Jira
(v8.20.10#820010)