uranusjr commented on a change in pull request #19189:
URL: https://github.com/apache/airflow/pull/19189#discussion_r736144799
##########
File path: Dockerfile
##########
@@ -21,13 +21,13 @@
#
# airflow-build-image - there all airflow dependencies can be installed (and
# built - for those dependencies that require
-# build essentials). Airflow is installed there with
-# --user switch so that all the dependencies are
-# installed to ${HOME}/.local
+# build essentials). Python deps are installed in .venv
+# venv so that all the dependencies are
+# installed to ${HOME}/.venv
Review comment:
```suggestion
# build essentials). Python dependencies are installed
# into a virtual environment at ${HOME}/.venv
```
`venv` is an unfortuantely an ambigious term so it's preferred to spell the
full name out.
##########
File path: Dockerfile
##########
@@ -183,16 +183,19 @@ ENV INSTALL_MYSQL_CLIENT=${INSTALL_MYSQL_CLIENT} \
AIRFLOW_CONSTRAINTS_REFERENCE=${AIRFLOW_CONSTRAINTS_REFERENCE} \
AIRFLOW_CONSTRAINTS_LOCATION=${AIRFLOW_CONSTRAINTS_LOCATION} \
DEFAULT_CONSTRAINTS_BRANCH=${DEFAULT_CONSTRAINTS_BRANCH} \
- PATH=${PATH}:/root/.local/bin \
+ PATH=/.venv/bin:${PATH} \
AIRFLOW_PIP_VERSION=${AIRFLOW_PIP_VERSION} \
PIP_PROGRESS_BAR=${PIP_PROGRESS_BAR} \
- # Install Airflow with "--user" flag, so that we can copy the whole .local
folder to the final image
+ # Install Airflow in venv, so that we can copy the whole .venv folder to
the final image
# from the build image and always in non-editable mode
- AIRFLOW_INSTALL_USER_FLAG="--user" \
AIRFLOW_INSTALL_EDITABLE_FLAG="" \
UPGRADE_TO_NEWER_DEPENDENCIES=${UPGRADE_TO_NEWER_DEPENDENCIES}
-COPY scripts/docker/*.sh /scripts/docker/
+# Only copy mysql/mssql installation scripts for now - so that changing the
other
+# scripts which are needed much later will not invalidate the docker layer here
+COPY scripts/docker/common.sh scripts/docker/install_mysql.sh
scripts/docker/install_mssql.sh \
+ /scripts/docker/
Review comment:
Is this just some minor optimisation or are there implications?
##########
File path: Dockerfile
##########
@@ -183,16 +183,19 @@ ENV INSTALL_MYSQL_CLIENT=${INSTALL_MYSQL_CLIENT} \
AIRFLOW_CONSTRAINTS_REFERENCE=${AIRFLOW_CONSTRAINTS_REFERENCE} \
AIRFLOW_CONSTRAINTS_LOCATION=${AIRFLOW_CONSTRAINTS_LOCATION} \
DEFAULT_CONSTRAINTS_BRANCH=${DEFAULT_CONSTRAINTS_BRANCH} \
- PATH=${PATH}:/root/.local/bin \
+ PATH=/.venv/bin:${PATH} \
AIRFLOW_PIP_VERSION=${AIRFLOW_PIP_VERSION} \
PIP_PROGRESS_BAR=${PIP_PROGRESS_BAR} \
- # Install Airflow with "--user" flag, so that we can copy the whole .local
folder to the final image
+ # Install Airflow in venv, so that we can copy the whole .venv folder to
the final image
Review comment:
```suggestion
# Install Airflow in a virtual environment, so that we can copy the whole
# .venv folder to the final image
```
##########
File path: Dockerfile
##########
@@ -382,12 +392,8 @@ ARG AIRFLOW_HOME
# Having the variable in final image allows to disable providers manager
warnings when
# production image is prepared from sources rather than from package
ARG AIRFLOW_INSTALLATION_METHOD="apache-airflow"
-ARG BUILD_ID
-ARG COMMIT_SHA
ARG AIRFLOW_IMAGE_REPOSITORY
ARG AIRFLOW_IMAGE_DATE_CREATED
-# By default PIP will install everything in ~/.local
-ARG PIP_USER="true"
Review comment:
This *would* be backward incompatible, right? We should at least show a
message somewhere saying this is no longer effective.
##########
File path: Dockerfile
##########
@@ -399,12 +405,9 @@ ENV RUNTIME_APT_DEPS=${RUNTIME_APT_DEPS} \
AIRFLOW__CORE__LOAD_EXAMPLES="false" \
AIRFLOW_USER_HOME_DIR=${AIRFLOW_USER_HOME_DIR} \
AIRFLOW_HOME=${AIRFLOW_HOME} \
- PATH="${AIRFLOW_USER_HOME_DIR}/.local/bin:${PATH}" \
+ PATH="/.venv/bin:${PATH}" \
Review comment:
Is this still needed?
##########
File path: airflow/operators/python.py
##########
@@ -338,6 +347,7 @@ def __init__(
python_version: Optional[Union[str, int, float]] = None,
use_dill: bool = False,
system_site_packages: bool = True,
+ clone_airflow_virtualenv: bool = True,
Review comment:
One problem here, if people are using `system_site_packages=True`, they
likely expect Airflow to be available when using the official Docker container
(because currently it installs Airflow globally). That won't be true after this
PR, so we need to call it out somewhere.
##########
File path: airflow/utils/python_virtualenv.py
##########
@@ -27,13 +28,22 @@
from airflow.utils.process_utils import execute_in_subprocess
-def _generate_virtualenv_cmd(tmp_dir: str, python_bin: str,
system_site_packages: bool) -> List[str]:
- cmd = [sys.executable, '-m', 'virtualenv', tmp_dir]
- if system_site_packages:
- cmd.append('--system-site-packages')
- if python_bin is not None:
- cmd.append(f'--python={python_bin}')
- return cmd
+def _generate_virtualenv_cmd(
+ tmp_dir: str, python_bin: str, system_site_packages: bool,
clone_virtualenv_packages: bool
+) -> List[str]:
+ if clone_virtualenv_packages and sys.prefix != sys.base_prefix and not
python_bin:
+ # Create virtualenv using virtualenv-clone command if we are in
virtualenv and
+ # clone_virtualenv_packages is set and we are using same version of
python
+ # as our virtualenv
+ cmd = ['virtualenv-clone', f'{sys.prefix}', tmp_dir]
+ return cmd
Review comment:
This branch should handle `system_site_packages` as well (by writing
`system_site_packages = true` to `pyvenv.cfg`)
##########
File path: scripts/ci/libraries/_verify_image.sh
##########
@@ -340,7 +340,7 @@ function verify_image::verify_prod_image_as_root() {
function verify_image::verify_production_image_has_dist_folder() {
start_end::group_start "Verify prod image has dist folder (compiled www
assets): ${DOCKER_IMAGE}"
# shellcheck disable=SC2016
- verify_image::check_command "Dist folder" '[ -f $(python -m site
--user-site)/airflow/www/static/dist/manifest.json ] || exit 1'
+ verify_image::check_command "Dist folder" '[ -f /.venv/lib/python$(python
-c "import sys;
print(str(sys.version_info[0])+\".\"+str(sys.version_info[1]))")/site-packages/airflow/www/static/dist/manifest.json
] || exit 1'
Review comment:
Probably easier to use `sysconfig`:
```pycon
>>> import sysconfig
>>> sysconfig.get_path('purelib')
'/.venv/lib/python3.6/site-packages'
```
##########
File path: docs/docker-stack/build.rst
##########
@@ -203,14 +203,22 @@ You should be aware, about a few things:
`best practices of Dockerfiles
<https://docs.docker.com/develop/develop-images/dockerfile_best-practices/>`_
to make sure your image is lean and small.
-* The PyPI dependencies in Apache Airflow are installed in the user library,
of the "airflow" user, so
- PIP packages are installed to ``~/.local`` folder as if the ``--user`` flag
was specified when running PIP.
- Note also that using ``--no-cache-dir`` is a good idea that can help to make
your image smaller.
+* Using ``--no-cache-dir`` is a good idea that can help to make your image
smaller.
+
+* The PyPI dependencies in Apache Airflow are installed in the ``/.venv``
folder. This is virtualenv where
+ airflow and all dependent packages are installed, following latest ``pip``
recommendations and upcoming
+ `PEP 668 <https://www.python.org/dev/peps/pep-0668/>`_. The ``PATH`` inside
the image is set to point first
+ to the ``bin`` folder of the virtualenv, to make sure that this virtualenv
is used, also safe PATH for sudo
+ is set to include that folder. The virtualenv is also activated at the entry
of interactive session by
+ ``.bashrc`` files placed in ``HOME`` directory of both ``airflow`` user and
``root`` user. Note that all
+ the arbitrary users created dynamically to follow OpenShift rules are
sharing the ``airflow`` user
+ ``HOME`` folder, so ``.bashrc`` file is the same for those users as well.
Review comment:
Again, spell out "virtual environment"
##########
File path: Dockerfile
##########
@@ -441,25 +443,53 @@ RUN chmod a+x /scripts/docker/install_mysql.sh && \
find "${AIRFLOW_HOME}" -executable -print0 | xargs --null chmod g+x && \
find "${AIRFLOW_HOME}" -print0 | xargs --null chmod g+rw
-COPY --chown=airflow:root --from=airflow-build-image /root/.local
"${AIRFLOW_USER_HOME_DIR}/.local"
+COPY --chown=airflow:root --from=airflow-build-image /.venv /.venv
COPY --chown=airflow:root scripts/in_container/prod/entrypoint_prod.sh
/entrypoint
COPY --chown=airflow:root scripts/in_container/prod/clean-logs.sh /clean-logs
# Make /etc/passwd root-group-writeable so that user can be dynamically added
by OpenShift
# See https://github.com/apache/airflow/issues/9248
RUN chmod a+x /entrypoint /clean-logs && \
- chmod g=u /etc/passwd && \
- bash /scripts/docker/install_pip_version.sh
+ chmod g=u /etc/passwd
+
+# Set default groups for airflow and root user
+RUN usermod -g 0 airflow -G 0
+
+# make sure that the venv is activated for all users
+# interactive, non-interactive shells and plain sudo, sudo with --interactive
flag
+RUN echo ". /.venv/bin/activate" >> "${AIRFLOW_USER_HOME_DIR}/.bashrc" && \
+ echo ". /.venv/bin/activate" >> /root/.bashrc && \
+ sed --in-place=.bak "s/secure_path=\"/secure_path=\"\/.venv\/bin:/"
/etc/sudoers
Review comment:
Actually I'm wonder if it might be better to just put `/.venv/bin` into
`PATH`. It isn't really useful to allow the user to deactive the virtual
environment when shelling into the container (that might even be a footgun for
some). (If we want the prompt change we can just set `PS1`, which is what the
activation script does.)
##########
File path: docs/docker-stack/build.rst
##########
@@ -203,14 +203,22 @@ You should be aware, about a few things:
`best practices of Dockerfiles
<https://docs.docker.com/develop/develop-images/dockerfile_best-practices/>`_
to make sure your image is lean and small.
-* The PyPI dependencies in Apache Airflow are installed in the user library,
of the "airflow" user, so
- PIP packages are installed to ``~/.local`` folder as if the ``--user`` flag
was specified when running PIP.
- Note also that using ``--no-cache-dir`` is a good idea that can help to make
your image smaller.
+* Using ``--no-cache-dir`` is a good idea that can help to make your image
smaller.
+
+* The PyPI dependencies in Apache Airflow are installed in the ``/.venv``
folder. This is virtualenv where
+ airflow and all dependent packages are installed, following latest ``pip``
recommendations and upcoming
+ `PEP 668 <https://www.python.org/dev/peps/pep-0668/>`_. The ``PATH`` inside
the image is set to point first
Review comment:
I feel mentioning PEP 668 here is only going to confuse users, most of
them probably are not familiar with this kind of things.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]