potiuk commented on a change in pull request #13585:
URL: https://github.com/apache/airflow/pull/13585#discussion_r554424980



##########
File path: scripts/in_container/prod/entrypoint_prod.sh
##########
@@ -116,6 +116,11 @@ if ! whoami &> /dev/null; then
   export HOME="${AIRFLOW_USER_HOME_DIR}"
 fi
 
+# Install extra python packages if requirements.txt is present.

Review comment:
   Using AIRFLOW_HOME for extra requirements is not a good idea. It's better to mount a dedicated folder for all the files you want to mount; overriding AIRFLOW_HOME for this purpose is wrong. Besides, AIRFLOW_HOME is read-write, whereas the requirements and related files should be read-only.
   
   It would be better to follow the convention that we already have in our "CI" image.
   
   * "/files" directory should be there for any files mounted into the image
   * "/files/requirements" should be the folder where requirements are placed
   * "/files/requirements/requirements-3.7.txt" should be the file that you load (depending on the python version you might have different requirements)
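   
   A minimal sketch of how the entrypoint could pick the version-specific requirements file under this convention (the version derivation and variable names are illustrative, not the final implementation):
   
   ```shell
   # Sketch only: derive the python version and pick the matching
   # requirements file from the /files convention.
   PYTHON_VERSION="$(python3 -c 'import sys; print(f"{sys.version_info.major}.{sys.version_info.minor}")')"
   REQUIREMENTS_FILE="/files/requirements/requirements-${PYTHON_VERSION}.txt"
   
   if [[ -f "${REQUIREMENTS_FILE}" ]]; then
       echo
       echo "Installing additional requirements from ${REQUIREMENTS_FILE}"
       echo
       pip install --user --upgrade --upgrade-strategy only-if-needed -r "${REQUIREMENTS_FILE}"
   else
       echo "No additional requirements found at ${REQUIREMENTS_FILE} - skipping."
   fi
   ```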
   
   Additionally, you must print some diagnostic information so that the log files show exactly what is happening.
   
   This way you will not only be able to use it in production but also test it seamlessly for different python versions in our `breeze` development environment.
   
   Relevant snippet from the `run_init_script.sh`:
   
   ```
   if [ -z "${FILES_DIR+x}" ]; then
       export FILES_DIR="/files"
   fi
   if [ -z "${AIRFLOW_BREEZE_CONFIG_DIR+x}" ]; then
       export AIRFLOW_BREEZE_CONFIG_DIR="${FILES_DIR}/airflow-breeze-config"
   fi
   
   if [ -z "${INIT_SCRIPT_FILE=}" ]; then
       export INIT_SCRIPT_FILE="init.sh"
   fi
   
   if [[ -d "${AIRFLOW_BREEZE_CONFIG_DIR}" && \
       -f "${AIRFLOW_BREEZE_CONFIG_DIR}/${INIT_SCRIPT_FILE}" ]]; then
   
           pushd "${AIRFLOW_BREEZE_CONFIG_DIR}" >/dev/null 2>&1 || exit 1
           echo
           echo "Sourcing the initialization script from ${INIT_SCRIPT_FILE} in ${AIRFLOW_BREEZE_CONFIG_DIR}"
           echo
           # shellcheck disable=1090
           source "${INIT_SCRIPT_FILE}"
           popd >/dev/null 2>&1 || exit 1
   else
       echo
       echo "You can add ${AIRFLOW_BREEZE_CONFIG_DIR} directory and place ${INIT_SCRIPT_FILE}"
       echo "In it to make breeze source an initialization script automatically for you"
       echo
   fi
   ```

##########
File path: chart/templates/_helpers.yaml
##########
@@ -357,6 +357,10 @@ server_tls_key_file = /etc/pgbouncer/server.key
 {{ (printf "%s/config/airflow_local_settings.py" .Values.airflowHome) | quote 
}}
 {{- end }}
 
+{{ define "airflow_install_requirements_path" -}}

Review comment:
   The name should be changed to match the naming we already have.
   
   In our Dockerfile we already have the ADDITIONAL_PYTHON_DEPS variable, which we use to install additional python dependencies. @mik-laj rightfully noticed that not all deps can be installed this way (we do not have build-essential in the image, for one), but some can be, and I agree it's nice to have them. It should be clear, however, that those are additional dependencies, not de-facto "install requirements". "install_requires" has a very precise meaning in the python world - it is what you find in `setup.py` or `setup.cfg`, and it means "these need to be installed BEFORE you install airflow". What we are talking about here are "additional python dependencies", which are installed AFTER airflow is installed. So let's name it this way.
   
   Relevant snippet from Dockerfile:
   
   ```
   # Add extra python dependencies
   ARG ADDITIONAL_PYTHON_DEPS=""
   ENV ADDITIONAL_PYTHON_DEPS=${ADDITIONAL_PYTHON_DEPS}
   ```

##########
File path: chart/templates/_helpers.yaml
##########
@@ -357,6 +357,10 @@ server_tls_key_file = /etc/pgbouncer/server.key
 {{ (printf "%s/config/airflow_local_settings.py" .Values.airflowHome) | quote 
}}
 {{- end }}
 
+{{ define "airflow_install_requirements_path" -}}

Review comment:
       This is of course slightly different from the Dockerfile one (that one is embedded in the customized image), but the meaning is the same, so the name should also be the same.

##########
File path: scripts/in_container/prod/entrypoint_prod.sh
##########
@@ -116,6 +116,11 @@ if ! whoami &> /dev/null; then
   export HOME="${AIRFLOW_USER_HOME_DIR}"
 fi
 
+# Install extra python packages if requirements.txt is present.
+if [[ -f "${AIRFLOW_HOME}/requirements.txt" ]]; then
+    pip install --user -r "${AIRFLOW_HOME}/requirements.txt"

Review comment:
       This is not the correct way of installing the dependencies. Look here:
   
   
https://github.com/apache/airflow/blob/master/scripts/docker/install_additional_dependencies.sh
   
   This script was extracted yesterday from the Dockerfile, and it shows the right way of installing additional dependencies. It handles the eager-upgrade case, which is not relevant here, but the `else` branch of the `if` is how the install should be done:
   
   `pip install --user --upgrade --upgrade-strategy only-if-needed`
   
   This way existing dependencies are automatically upgraded whenever any of the new dependencies require it. This should likely also be configurable (one might want to disable it), but it should be the default IMHO (unless there are good reasons not to do it).
   
   Additionally, it would be great to add a "pip check" step afterwards and fail if there are consistency problems. This ensures that anyone who adds requirements this way cannot introduce conflicting requirements.
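   
   A quick illustration of the `pip check` gate - it exits non-zero when installed distributions have missing or conflicting requirements, so the entrypoint can fail on that exit code (a sketch, not the final entrypoint code):
   
   ```shell
   # Sketch: turn an inconsistent dependency set into a hard failure.
   if python3 -m pip check; then
       echo "Additional requirements are consistent with the installed packages."
   else
       echo "ERROR: conflicting requirements detected - refusing to start." >&2
       # In the real entrypoint this would exit 1 (or be skippable via a flag).
   fi
   ```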
   
   

##########
File path: scripts/in_container/prod/entrypoint_prod.sh
##########
@@ -116,6 +116,11 @@ if ! whoami &> /dev/null; then
   export HOME="${AIRFLOW_USER_HOME_DIR}"
 fi
 
+# Install extra python packages if requirements.txt is present.
+if [[ -f "${AIRFLOW_HOME}/requirements.txt" ]]; then
+    pip install --user -r "${AIRFLOW_HOME}/requirements.txt"

Review comment:
       Relevant code snippet:
   
   ```
           echo
           echo Installing additional dependencies upgrading only if needed
           echo
           pip install ${AIRFLOW_INSTALL_USER_FLAG} \
               --upgrade --upgrade-strategy only-if-needed \
               ${ADDITIONAL_PYTHON_DEPS}
           # make sure correct PIP version is used
        pip install ${AIRFLOW_INSTALL_USER_FLAG} --upgrade "pip==${AIRFLOW_PIP_VERSION}"
           pip check || ${CONTINUE_ON_PIP_CHECK_FAILURE}
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
