potiuk commented on PR #35026:
URL: https://github.com/apache/airflow/pull/35026#issuecomment-1772488509
Just to complete the picture - explaining the `build` segment.
The `airflow-build-image` segment is slightly more complicated - it allows
for more sophisticated caching (used during our CI builds) - but essentially
it does this:
* Installing OS dependencies (dev) = build-essentials + dev libraries
```
RUN bash /scripts/docker/install_os_dependencies.sh dev
```
* Installing database clients (parameterized with args)
```
RUN bash /scripts/docker/install_mysql.sh dev && \
    bash /scripts/docker/install_mssql.sh dev && \
    bash /scripts/docker/install_postgres.sh dev
```
* Branch tip caching (skipped by default in regular builds) - pre-installing
`pip` dependencies from the tip of the branch. This is used in CI only and is
a "poor man's" version of incremental caching (what @hussein-awala mentioned
might be implemented via https://github.com/moby/buildkit/issues/1512).
It allows our CI to use the Docker layer cache instead of a local cache for
incremental builds when new requirements are added.
This layer is not invalidated when setup.py/requirements change, so we use
it to pre-install the dependencies from `main`; if someone adds a new
dependency, it will not reinstall everything from scratch.
```
RUN bash /scripts/docker/install_pip_version.sh; \
    if [[ ${AIRFLOW_PRE_CACHED_PIP_PACKAGES} == "true" && \
        ${INSTALL_PACKAGES_FROM_CONTEXT} == "false" && \
        ${UPGRADE_TO_NEWER_DEPENDENCIES} == "false" ]]; then \
        bash /scripts/docker/install_airflow_dependencies_from_branch_tip.sh; \
    fi
```
* Installing airflow and packages (depending on args, it will install them
either from `pypi` or from packages in docker-context-files) and also allows
specifying ADDITIONAL_PYTHON_DEPS
```
RUN if [[ ${INSTALL_PACKAGES_FROM_CONTEXT} == "true" ]]; then \
        bash /scripts/docker/install_from_docker_context_files.sh; \
    fi; \
    if ! airflow version 2>/dev/null >/dev/null; then \
        bash /scripts/docker/install_airflow.sh; \
    fi; \
    if [[ -n "${ADDITIONAL_PYTHON_DEPS}" ]]; then \
        bash /scripts/docker/install_additional_dependencies.sh; \
    fi; \
    find "${AIRFLOW_USER_HOME_DIR}/.local/" -name '*.pyc' -print0 | xargs -0 rm -f || true; \
    find "${AIRFLOW_USER_HOME_DIR}/.local/" -type d -name '__pycache__' -print0 | xargs -0 rm -rf || true; \
    # make sure that all directories and files in .local are also group accessible
    find "${AIRFLOW_USER_HOME_DIR}/.local" -executable -print0 | xargs --null chmod g+x; \
    find "${AIRFLOW_USER_HOME_DIR}/.local" -print0 | xargs --null chmod g+rw
```
* Installing from requirements.txt placed in docker-context-files
```
RUN if [[ -f /docker-context-files/requirements.txt ]]; then \
        pip install --no-cache-dir --user -r /docker-context-files/requirements.txt; \
    fi
```
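For illustration, here is one way to stage such a file before the build - a
minimal sketch, where the pinned package name is a hypothetical placeholder
(not a real Airflow dependency) and the build command itself is only sketched
in a comment:

```
# Hypothetical example: stage extra pinned requirements for the image build.
# "some-extra-package==1.0.0" is a placeholder package pin.
mkdir -p docker-context-files
printf '%s\n' 'some-extra-package==1.0.0' > docker-context-files/requirements.txt

# The file is then picked up by the RUN step above at build time, e.g.:
#   docker build . --tag my-airflow:custom

cat docker-context-files/requirements.txt
```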
So any "local" caching of the Python installation should be done here,
@V0lantis - if you want to pursue it, try to do it in a way that avoids the
potential cache-invalidation issues I mentioned. We could also add apt caching
as suggested by @hterik, though adding/iterating on new `apt` dependencies
is less common.
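For reference, a minimal sketch of what BuildKit cache mounts (the mechanism
behind the linked buildkit issue) could look like - these lines are
illustrative only and are not taken from the Airflow Dockerfile:

```
# Sketch only: BuildKit cache mounts keep the pip and apt caches across
# builds without baking them into image layers.
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt

RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
    --mount=type=cache,target=/var/lib/apt,sharing=locked \
    apt-get update && apt-get install -y --no-install-recommends build-essential
```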