potiuk commented on PR #35026:
URL: https://github.com/apache/airflow/pull/35026#issuecomment-1772488509

   Just to complete it - explaining the `build` segment.
   
   The `airflow-build-image` segment is slightly more complicated - it allows 
for more sophisticated caching (used during our CI builds) - but essentially it 
does this:
   
   * Installing OS dependencies (dev) = `build-essential` + dev libraries
   
   ```
   RUN bash /scripts/docker/install_os_dependencies.sh dev
   ```
   
   * Installing database clients (parameterized with args)
   
   ```
   RUN bash /scripts/docker/install_mysql.sh dev && \
       bash /scripts/docker/install_mssql.sh dev && \
       bash /scripts/docker/install_postgres.sh dev
   ```
   
   * Branch tip caching (skipped by default in regular builds) - 
pre-installing `pip` dependencies from the tip of the branch. This is used in 
CI only and is a "poor man's" version of incremental caching for CI (what 
@hussein-awala mentioned might be implemented via 
https://github.com/moby/buildkit/issues/1512). It allows our CI to use the 
Docker layer cache instead of a local cache for incremental builds when new 
requirements are added.
   
   This layer is not invalidated when setup.py/requirements change, so we use 
it to pre-install the dependencies from main; that way, if someone adds a new 
dependency, it will not reinstall everything from scratch.
   
   ```
   RUN bash /scripts/docker/install_pip_version.sh; \
       if [[ ${AIRFLOW_PRE_CACHED_PIP_PACKAGES} == "true" && \
           ${INSTALL_PACKAGES_FROM_CONTEXT} == "false" && \
           ${UPGRADE_TO_NEWER_DEPENDENCIES} == "false" ]]; then \
           bash /scripts/docker/install_airflow_dependencies_from_branch_tip.sh; \
       fi
   ```
   
   * Installing Airflow and packages (depends on args - it will install them 
either from `pypi` or from packages in docker-context-files, and also allows 
specifying `ADDITIONAL_PYTHON_DEPS`)
   
   
   ```
   RUN if [[ ${INSTALL_PACKAGES_FROM_CONTEXT} == "true" ]]; then \
           bash /scripts/docker/install_from_docker_context_files.sh; \
       fi; \
       if ! airflow version 2>/dev/null >/dev/null; then \
           bash /scripts/docker/install_airflow.sh; \
       fi; \
       if [[ -n "${ADDITIONAL_PYTHON_DEPS}" ]]; then \
           bash /scripts/docker/install_additional_dependencies.sh; \
       fi; \
       find "${AIRFLOW_USER_HOME_DIR}/.local/" -name '*.pyc' -print0 | xargs -0 rm -f || true ; \
       find "${AIRFLOW_USER_HOME_DIR}/.local/" -type d -name '__pycache__' -print0 | xargs -0 rm -rf || true ; \
       # make sure that all directories and files in .local are also group accessible
       find "${AIRFLOW_USER_HOME_DIR}/.local" -executable -print0 | xargs --null chmod g+x; \
       find "${AIRFLOW_USER_HOME_DIR}/.local" -print0 | xargs --null chmod g+rw
   ```
   
   * Installing from requirements.txt placed in docker-context-files
   
   ```
   RUN if [[ -f /docker-context-files/requirements.txt ]]; then \
           pip install --no-cache-dir --user -r /docker-context-files/requirements.txt; \
       fi
   ```
   
   So any "local" caching of the Python installation should be done here 
@V0lantis - if you want to pursue it, attempt it in a way that avoids the 
potential cache-invalidation issues I mentioned. (We could also add apt 
caching as suggested by @hterik, though adding/iterating with new `apt` 
dependencies is less common.)
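   
   For reference, a sketch of how these build args could be toggled at build 
time (the image tag and the extra dependency below are illustrative 
placeholders, not values from this PR):
   
   ```shell
   # Hypothetical invocation - CI-style: branch-tip pre-cache on,
   # install Airflow from PyPI (not from docker-context-files),
   # keep pinned dependencies, and add one extra Python package.
   docker build . \
       --build-arg AIRFLOW_PRE_CACHED_PIP_PACKAGES="true" \
       --build-arg INSTALL_PACKAGES_FROM_CONTEXT="false" \
       --build-arg UPGRADE_TO_NEWER_DEPENDENCIES="false" \
       --build-arg ADDITIONAL_PYTHON_DEPS="lxml" \
       --tag my-airflow-image
   ```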
   