uranusjr commented on a change in pull request #19189:
URL: https://github.com/apache/airflow/pull/19189#discussion_r736144799



##########
File path: Dockerfile
##########
@@ -21,13 +21,13 @@
 #
 # airflow-build-image  - there all airflow dependencies can be installed (and
 #                        built - for those dependencies that require
-#                        build essentials). Airflow is installed there with
-#                        --user switch so that all the dependencies are
-#                        installed to ${HOME}/.local
+#                        build essentials). Python deps are installed in .venv
+#                        venv so that all the dependencies are
+#                        installed to ${HOME}/.venv

Review comment:
       ```suggestion
   #                        build essentials). Python dependencies are installed
   #                        into a virtual environment at ${HOME}/.venv
   ```
   
    `venv` is unfortunately an ambiguous term, so it's preferred to spell the full name out.

##########
File path: Dockerfile
##########
@@ -183,16 +183,19 @@ ENV INSTALL_MYSQL_CLIENT=${INSTALL_MYSQL_CLIENT} \
     AIRFLOW_CONSTRAINTS_REFERENCE=${AIRFLOW_CONSTRAINTS_REFERENCE} \
     AIRFLOW_CONSTRAINTS_LOCATION=${AIRFLOW_CONSTRAINTS_LOCATION} \
     DEFAULT_CONSTRAINTS_BRANCH=${DEFAULT_CONSTRAINTS_BRANCH} \
-    PATH=${PATH}:/root/.local/bin \
+    PATH=/.venv/bin:${PATH} \
     AIRFLOW_PIP_VERSION=${AIRFLOW_PIP_VERSION} \
     PIP_PROGRESS_BAR=${PIP_PROGRESS_BAR} \
-    # Install Airflow with "--user" flag, so that we can copy the whole .local folder to the final image
+    # Install Airflow in venv, so that we can copy the whole .venv folder to the final image
     # from the build image and always in non-editable mode
-    AIRFLOW_INSTALL_USER_FLAG="--user" \
     AIRFLOW_INSTALL_EDITABLE_FLAG="" \
     UPGRADE_TO_NEWER_DEPENDENCIES=${UPGRADE_TO_NEWER_DEPENDENCIES}
 
-COPY scripts/docker/*.sh /scripts/docker/
+# Only copy mysql/mssql installation scripts for now - so that changing the other
+# scripts which are needed much later will not invalidate the docker layer here
+COPY scripts/docker/common.sh scripts/docker/install_mysql.sh scripts/docker/install_mssql.sh \
+    /scripts/docker/

Review comment:
       Is this just some minor optimisation or are there implications?

##########
File path: Dockerfile
##########
@@ -183,16 +183,19 @@ ENV INSTALL_MYSQL_CLIENT=${INSTALL_MYSQL_CLIENT} \
     AIRFLOW_CONSTRAINTS_REFERENCE=${AIRFLOW_CONSTRAINTS_REFERENCE} \
     AIRFLOW_CONSTRAINTS_LOCATION=${AIRFLOW_CONSTRAINTS_LOCATION} \
     DEFAULT_CONSTRAINTS_BRANCH=${DEFAULT_CONSTRAINTS_BRANCH} \
-    PATH=${PATH}:/root/.local/bin \
+    PATH=/.venv/bin:${PATH} \
     AIRFLOW_PIP_VERSION=${AIRFLOW_PIP_VERSION} \
     PIP_PROGRESS_BAR=${PIP_PROGRESS_BAR} \
-    # Install Airflow with "--user" flag, so that we can copy the whole .local folder to the final image
+    # Install Airflow in venv, so that we can copy the whole .venv folder to the final image

Review comment:
       ```suggestion
       # Install Airflow in a virtual environment, so that we can copy the whole
       # .venv folder to the final image
   ```

##########
File path: Dockerfile
##########
@@ -382,12 +392,8 @@ ARG AIRFLOW_HOME
 # Having the variable in final image allows to disable providers manager warnings when
 # production image is prepared from sources rather than from package
 ARG AIRFLOW_INSTALLATION_METHOD="apache-airflow"
-ARG BUILD_ID
-ARG COMMIT_SHA
 ARG AIRFLOW_IMAGE_REPOSITORY
 ARG AIRFLOW_IMAGE_DATE_CREATED
-# By default PIP will install everything in ~/.local
-ARG PIP_USER="true"

Review comment:
      This *would* be backward incompatible, right? We should at least show a message somewhere saying this is no longer effective.

##########
File path: Dockerfile
##########
@@ -399,12 +405,9 @@ ENV RUNTIME_APT_DEPS=${RUNTIME_APT_DEPS} \
     AIRFLOW__CORE__LOAD_EXAMPLES="false" \
     AIRFLOW_USER_HOME_DIR=${AIRFLOW_USER_HOME_DIR} \
     AIRFLOW_HOME=${AIRFLOW_HOME} \
-    PATH="${AIRFLOW_USER_HOME_DIR}/.local/bin:${PATH}" \
+    PATH="/.venv/bin:${PATH}" \

Review comment:
       Is this still needed?

##########
File path: airflow/operators/python.py
##########
@@ -338,6 +347,7 @@ def __init__(
         python_version: Optional[Union[str, int, float]] = None,
         use_dill: bool = False,
         system_site_packages: bool = True,
+        clone_airflow_virtualenv: bool = True,

Review comment:
      One problem here: if people are using `system_site_packages=True`, they likely expect Airflow to be available when using the official Docker container (because currently it installs Airflow globally). That won't be true after this PR, so we need to call it out somewhere.
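
    A quick way to demonstrate the surprise (rough illustration only, assuming Airflow ends up installed solely inside `/.venv` after this PR): a fresh environment created with system site packages still won't see it, because it inherits the *base* interpreter's site-packages rather than `/.venv`:

    ```python
    # Rough illustration, not part of the PR.
    import subprocess
    import venv

    # Build a throwaway environment that pulls in the system site-packages.
    venv.EnvBuilder(system_site_packages=True, with_pip=False).create("/tmp/probe-venv")

    # Try to import airflow from inside it; with Airflow living only in /.venv
    # this fails, which is exactly the behaviour change we should call out.
    result = subprocess.run(
        ["/tmp/probe-venv/bin/python", "-c", "import airflow"],
        capture_output=True,
    )
    print("airflow importable:", result.returncode == 0)
    ```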

##########
File path: airflow/utils/python_virtualenv.py
##########
@@ -27,13 +28,22 @@
 from airflow.utils.process_utils import execute_in_subprocess
 
 
-def _generate_virtualenv_cmd(tmp_dir: str, python_bin: str, system_site_packages: bool) -> List[str]:
-    cmd = [sys.executable, '-m', 'virtualenv', tmp_dir]
-    if system_site_packages:
-        cmd.append('--system-site-packages')
-    if python_bin is not None:
-        cmd.append(f'--python={python_bin}')
-    return cmd
+def _generate_virtualenv_cmd(
+    tmp_dir: str, python_bin: str, system_site_packages: bool, clone_virtualenv_packages: bool
+) -> List[str]:
+    if clone_virtualenv_packages and sys.prefix != sys.base_prefix and not python_bin:
+        # Create virtualenv using virtualenv-clone command if we are in virtualenv and
+        # clone_virtualenv_packages is set and we are using same version of python
+        # as our virtualenv
+        cmd = ['virtualenv-clone', f'{sys.prefix}', tmp_dir]
+        return cmd

Review comment:
      This branch should handle `system_site_packages` as well (by writing `include-system-site-packages = true` to the clone's `pyvenv.cfg`).
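
    A rough sketch of how that could look (the helper name is made up for illustration; it assumes the clone already exists and has a standard `pyvenv.cfg` at its root, where `include-system-site-packages` is the key the `site`/`venv` machinery reads):

    ```python
    import os


    def _enable_system_site_packages(venv_dir: str) -> None:
        # Hypothetical helper, not part of this PR: flip the system-site-packages
        # switch of an environment that was already created, e.g. by virtualenv-clone.
        cfg_path = os.path.join(venv_dir, "pyvenv.cfg")
        key = "include-system-site-packages"
        with open(cfg_path) as f:
            lines = f.readlines()
        rewritten, found = [], False
        for line in lines:
            if line.split("=")[0].strip().lower() == key:
                rewritten.append(f"{key} = true\n")
                found = True
            else:
                rewritten.append(line)
        if not found:
            rewritten.append(f"{key} = true\n")
        with open(cfg_path, "w") as f:
            f.writelines(rewritten)
    ```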

##########
File path: scripts/ci/libraries/_verify_image.sh
##########
@@ -340,7 +340,7 @@ function verify_image::verify_prod_image_as_root() {
 function verify_image::verify_production_image_has_dist_folder() {
     start_end::group_start "Verify prod image has dist folder (compiled www assets): ${DOCKER_IMAGE}"
     # shellcheck disable=SC2016
-    verify_image::check_command "Dist folder" '[ -f $(python -m site --user-site)/airflow/www/static/dist/manifest.json ] || exit 1'
+    verify_image::check_command "Dist folder" '[ -f /.venv/lib/python$(python -c "import sys; print(str(sys.version_info[0])+\".\"+str(sys.version_info[1]))")/site-packages/airflow/www/static/dist/manifest.json ] || exit 1'

Review comment:
       Probably easier to use `sysconfig`:
   
   ```pycon
   >>> import sysconfig
   >>> sysconfig.get_path('purelib')
   '/.venv/lib/python3.6/site-packages'
   ```
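
    For example, the whole check could be pushed into a single Python call (just a sketch; the quoting inside `check_command` would still need care):

    ```python
    # Exit non-zero when the compiled www assets are missing from site-packages.
    import pathlib
    import sys
    import sysconfig

    manifest = pathlib.Path(sysconfig.get_path("purelib")) / "airflow/www/static/dist/manifest.json"
    sys.exit(0 if manifest.is_file() else 1)
    ```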

##########
File path: docs/docker-stack/build.rst
##########
@@ -203,14 +203,22 @@ You should be aware, about a few things:
  `best practices of Dockerfiles <https://docs.docker.com/develop/develop-images/dockerfile_best-practices/>`_
  to make sure your image is lean and small.
 
-* The PyPI dependencies in Apache Airflow are installed in the user library, of the "airflow" user, so
-  PIP packages are installed to ``~/.local`` folder as if the ``--user`` flag was specified when running PIP.
-  Note also that using ``--no-cache-dir`` is a good idea that can help to make your image smaller.
+* Using ``--no-cache-dir`` is a good idea that can help to make your image smaller.
+
+* The PyPI dependencies in Apache Airflow are installed in the ``/.venv`` folder. This is virtualenv where
+  airflow and all dependent packages are installed, following latest ``pip`` recommendations and upcoming
+  `PEP 668 <https://www.python.org/dev/peps/pep-0668/>`_. The ``PATH`` inside the image is set to point first
+  to the ``bin`` folder of the virtualenv, to make sure that this virtualenv is used, also safe PATH for sudo
+  is set to include that folder. The virtualenv is also activated at the entry of interactive session by
+  ``.bashrc`` files placed in ``HOME`` directory of both ``airflow`` user and ``root`` user. Note that all
+  the arbitrary users created dynamically to follow OpenShift rules are sharing the ``airflow`` user
+  ``HOME`` folder, so ``.bashrc`` file is the same for those users as well.

Review comment:
       Again, spell out "virtual environment"

##########
File path: Dockerfile
##########
@@ -441,25 +443,53 @@ RUN chmod a+x /scripts/docker/install_mysql.sh && \
     find "${AIRFLOW_HOME}" -executable -print0 | xargs --null chmod g+x && \
         find "${AIRFLOW_HOME}" -print0 | xargs --null chmod g+rw
 
-COPY --chown=airflow:root --from=airflow-build-image /root/.local "${AIRFLOW_USER_HOME_DIR}/.local"
+COPY --chown=airflow:root --from=airflow-build-image /.venv /.venv
 COPY --chown=airflow:root scripts/in_container/prod/entrypoint_prod.sh /entrypoint
 COPY --chown=airflow:root scripts/in_container/prod/clean-logs.sh /clean-logs
 
 # Make /etc/passwd root-group-writeable so that user can be dynamically added by OpenShift
 # See https://github.com/apache/airflow/issues/9248
 
 RUN chmod a+x /entrypoint /clean-logs && \
-    chmod g=u /etc/passwd && \
-    bash /scripts/docker/install_pip_version.sh
+    chmod g=u /etc/passwd
+
+# Set default groups for airflow and root user
+RUN usermod -g 0 airflow -G 0
+
+# make sure that the venv is activated for all users
+# interactive, non-interactive shells and plain sudo, sudo with --interactive flag
+RUN echo ". /.venv/bin/activate" >> "${AIRFLOW_USER_HOME_DIR}/.bashrc" && \
+    echo ". /.venv/bin/activate" >> /root/.bashrc && \
+    sed --in-place=.bak "s/secure_path=\"/secure_path=\"\/.venv\/bin:/" /etc/sudoers

Review comment:
      Actually I'm wondering if it might be better to just put `/.venv/bin` into `PATH`. It isn't really useful to allow the user to deactivate the virtual environment when shelling into the container (that might even be a footgun for some). (If we want the prompt change we can just set `PS1`, which is what the activation script does.)

##########
File path: docs/docker-stack/build.rst
##########
@@ -203,14 +203,22 @@ You should be aware, about a few things:
  `best practices of Dockerfiles <https://docs.docker.com/develop/develop-images/dockerfile_best-practices/>`_
  to make sure your image is lean and small.
 
-* The PyPI dependencies in Apache Airflow are installed in the user library, of the "airflow" user, so
-  PIP packages are installed to ``~/.local`` folder as if the ``--user`` flag was specified when running PIP.
-  Note also that using ``--no-cache-dir`` is a good idea that can help to make your image smaller.
+* Using ``--no-cache-dir`` is a good idea that can help to make your image smaller.
+
+* The PyPI dependencies in Apache Airflow are installed in the ``/.venv`` folder. This is virtualenv where
+  airflow and all dependent packages are installed, following latest ``pip`` recommendations and upcoming
+  `PEP 668 <https://www.python.org/dev/peps/pep-0668/>`_. The ``PATH`` inside the image is set to point first

Review comment:
      I feel mentioning PEP 668 here is only going to confuse users; most of them are probably not familiar with this kind of thing.



