This is an automated email from the ASF dual-hosted git repository.
potiuk pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/airflow.git
The following commit(s) were added to refs/heads/main by this push:
new ac9b224e62 Speed up Breeze experience on Mac OS (#23866)
ac9b224e62 is described below
commit ac9b224e629918c66e867cde8f94804b6336912c
Author: Jarek Potiuk <[email protected]>
AuthorDate: Tue May 24 10:11:42 2022 +0100
Speed up Breeze experience on Mac OS (#23866)
This change should significantly speed up the Breeze experience (and
especially iterating over a change in Breeze) for MacOS users,
regardless of whether you are using the x86 or ARM architecture.
The problem with Docker on MacOS is the particularly slow filesystem
used to map sources from the host to the Docker VM. It is especially
bad when many small files are involved.
The improvements come from two areas:
* removing duplicate pycache cleaning
* moving MyPy cache to docker volume
When entering Breeze we - just in case - clean up .pyc and
__pycache__ files potentially generated outside of the Docker
container. This is particularly useful if you use a local IDE and do
not have bytecode generation disabled (we have it disabled in Breeze).
Generating Python bytecode might lead to various problems when you are
switching branches and Python versions, so for Breeze development,
where the files change often anyway, disabling bytecode generation and
removing the files when they are found is important. This cleanup
happens when entering Breeze and might take a second or two, depending
on whether you have locally generated bytecode files.
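For illustration only, the cleanup boils down to a find call that
prunes generated directories before deleting stray bytecode. A minimal
sketch (the pruned paths are examples - the real script excludes more
directories):

    # Skip generated node_modules trees, then remove bytecode artifacts.
    find . \
        -path "./airflow/www/node_modules" -prune -o \
        -path "./airflow/ui/node_modules" -prune -o \
        \( -name "*.pyc" -o -name "__pycache__" \) -print0 |
        xargs -0 rm -rf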
It could happen that the __init script was called twice (depending on
which script was called), so the time spent could be double what was
actually needed. Also, if you ever generated provider packages, the
time could be much longer, because the node_modules directories
generated in the provider sources were not excluded from the search
(and on MacOS that takes a LOT of time).
This also doubled the exit time, as the initialization code installed
traps that were then run twice. The traps, however, were rather fast,
so they had no noticeable impact on performance.
The change adds a guard so that the initialization is only ever
executed once.
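The guard is the usual shell pattern of checking an exported marker
variable before doing the work. A minimal sketch (the echo stands in
for the real sanity checks and trap registration):

    # Run the one-time initialization only on the first sourcing of the script.
    if [[ ${IN_CONTAINER_INITIALIZED=} != "true" ]]; then
        echo "running one-time in-container initialization"
        export IN_CONTAINER_INITIALIZED="true"
    fi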
The second part of the change moves the MyPy cache to a Docker volume
rather than using it from the local source folder (the default when
complete sources are mounted). We were already using a selective mount
to make sure the MacOS filesystem slowness affects us as little as
possible - but with this change, the cache is stored in a Docker
volume that does not suffer from the same problems as volumes mounted
from the host. The Docker volume is preserved until the `breeze stop`
command is run, which means that iterating over a change should be WAY
faster now - the observed speed-up was around 5x for the MyPy
pre-commit.
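Conceptually, the cache moves from a host bind mount to a named Docker
volume. A minimal sketch of what happens under the hood (the volume
name and mount path match this change; the image and command are
placeholders):

    # Create the named volume once (if missing), then mount it in place of the
    # host .mypy_cache directory so cache I/O stays inside the Docker VM.
    docker volume inspect docker-compose_mypy-cache-volume >/dev/null 2>&1 ||
        docker volume create docker-compose_mypy-cache-volume
    docker run --rm \
        -v docker-compose_mypy-cache-volume:/opt/airflow/.mypy_cache/ \
        python:3.7-slim ls /opt/airflow/.mypy_cache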
---
BREEZE.rst | 8 +++++++
.../airflow_breeze/commands/developer_commands.py | 2 +-
.../src/airflow_breeze/params/shell_params.py | 28 ++++++++++++----------
.../airflow_breeze/utils/docker_command_utils.py | 12 ++++++++++
dev/breeze/src/airflow_breeze/utils/path_utils.py | 10 +++++---
.../docker-compose/backend-mssql-docker-volume.yml | 2 ++
scripts/ci/docker-compose/backend-mysql.yml | 2 ++
scripts/ci/docker-compose/backend-postgres.yml | 2 ++
scripts/ci/docker-compose/backend-sqlite.yml | 2 ++
scripts/ci/docker-compose/base.yml | 5 ----
scripts/ci/docker-compose/local.yml | 3 +++
scripts/ci/libraries/_initialization.sh | 1 -
scripts/ci/pre_commit/pre_commit_flake8.py | 7 ++++--
scripts/ci/pre_commit/pre_commit_mypy.py | 7 ++++--
scripts/in_container/_in_container_script_init.sh | 18 +++++++-------
scripts/in_container/_in_container_utils.sh | 12 ++++++++++
16 files changed, 86 insertions(+), 35 deletions(-)
diff --git a/BREEZE.rst b/BREEZE.rst
index aca0378f71..2facbd3662 100644
--- a/BREEZE.rst
+++ b/BREEZE.rst
@@ -258,6 +258,8 @@ If you have several checked out Airflow sources, Breeze will warn you if you are
source tree and will offer you to re-install from those sources - to make sure that you are using the right
version.
+You can skip Breeze's upgrade check by setting ``SKIP_BREEZE_UPGRADE_CHECK`` variable to non empty value.
+
By default Breeze works on the version of Airflow that you run it in - in case you are outside of the
sources of Airflow and you installed Breeze from a directory - Breeze will be run on Airflow sources from
where it was installed.
@@ -1052,6 +1054,12 @@ command but it is very similar to current ``breeze`` command):
</a>
</div>
+.. note::
+
+ When you run static checks, some of the artifacts (mypy_cache) is stored in docker-compose volume
+ so that it can speed up static checks execution significantly. However, sometimes, the cache might
+ get broken, in which case you should run ``breeze stop`` to clean up the cache.
+
Building the Documentation
--------------------------
diff --git a/dev/breeze/src/airflow_breeze/commands/developer_commands.py b/dev/breeze/src/airflow_breeze/commands/developer_commands.py
index 1cc9fba59c..30c02688e0 100644
--- a/dev/breeze/src/airflow_breeze/commands/developer_commands.py
+++ b/dev/breeze/src/airflow_breeze/commands/developer_commands.py
@@ -523,7 +523,7 @@ def stop(verbose: bool, dry_run: bool, preserve_volumes: bool):
command_to_execute = ['docker-compose', 'down', "--remove-orphans"]
if not preserve_volumes:
command_to_execute.append("--volumes")
- shell_params = ShellParams(verbose=verbose)
+ shell_params = ShellParams(verbose=verbose, backend="all")
env_variables = get_env_variables_for_docker_commands(shell_params)
run_command(command_to_execute, verbose=verbose, dry_run=dry_run, env=env_variables)
diff --git a/dev/breeze/src/airflow_breeze/params/shell_params.py b/dev/breeze/src/airflow_breeze/params/shell_params.py
index 560bbb97c6..58107b9e27 100644
--- a/dev/breeze/src/airflow_breeze/params/shell_params.py
+++ b/dev/breeze/src/airflow_breeze/params/shell_params.py
@@ -165,19 +165,26 @@ class ShellParams:
get_console().print(f'[info]Backend: {self.backend} {self.backend_version}[/]')
get_console().print(f'[info]Airflow used at runtime: {self.use_airflow_version}[/]')
+ def get_backend_compose_files(self, backend: str):
+ if backend == "mssql":
+ backend_docker_compose_file = (
+ f"{str(SCRIPTS_CI_DIR)}/docker-compose/backend-{backend}-{self.debian_version}.yml"
+ )
+ else:
+ backend_docker_compose_file = f"{str(SCRIPTS_CI_DIR)}/docker-compose/backend-{backend}.yml"
+ backend_port_docker_compose_file = f"{str(SCRIPTS_CI_DIR)}/docker-compose/backend-{backend}-port.yml"
+ return backend_docker_compose_file, backend_port_docker_compose_file
+
@property
def compose_files(self):
compose_ci_file = []
main_ci_docker_compose_file = f"{str(SCRIPTS_CI_DIR)}/docker-compose/base.yml"
- if self.backend == "mssql":
- backend_docker_compose_file = (
- f"{str(SCRIPTS_CI_DIR)}/docker-compose/backend-{self.backend}-{self.debian_version}.yml"
- )
+ if self.backend != "all":
+ backend_files = self.get_backend_compose_files(self.backend)
else:
- backend_docker_compose_file = f"{str(SCRIPTS_CI_DIR)}/docker-compose/backend-{self.backend}.yml"
- backend_port_docker_compose_file = (
- f"{str(SCRIPTS_CI_DIR)}/docker-compose/backend-{self.backend}-port.yml"
- )
+ backend_files = []
+ for backend in ALLOWED_BACKENDS:
+ backend_files.extend(self.get_backend_compose_files(backend))
local_docker_compose_file = f"{str(SCRIPTS_CI_DIR)}/docker-compose/local.yml"
local_all_sources_docker_compose_file = f"{str(SCRIPTS_CI_DIR)}/docker-compose/local-all-sources.yml"
files_docker_compose_file = f"{str(SCRIPTS_CI_DIR)}/docker-compose/files.yml"
@@ -194,9 +201,7 @@ class ShellParams:
compose_ci_file.append(
f"{str(SCRIPTS_CI_DIR)}/docker-compose/backend-mssql-docker-volume.yml"
)
- compose_ci_file.extend(
- [main_ci_docker_compose_file, backend_docker_compose_file, files_docker_compose_file]
- )
+ compose_ci_file.extend([main_ci_docker_compose_file, *backend_files, files_docker_compose_file])
if self.mount_sources == MOUNT_SELECTED:
compose_ci_file.extend([local_docker_compose_file])
@@ -204,7 +209,6 @@ class ShellParams:
compose_ci_file.extend([local_all_sources_docker_compose_file])
else: # none
compose_ci_file.extend([remove_sources_docker_compose_file])
- compose_ci_file.extend([backend_port_docker_compose_file])
if self.forward_credentials:
compose_ci_file.append(forward_credentials_docker_compose_file)
if self.use_airflow_version is not None:
diff --git a/dev/breeze/src/airflow_breeze/utils/docker_command_utils.py b/dev/breeze/src/airflow_breeze/utils/docker_command_utils.py
index d3d40bc607..784290294b 100644
--- a/dev/breeze/src/airflow_breeze/utils/docker_command_utils.py
+++ b/dev/breeze/src/airflow_breeze/utils/docker_command_utils.py
@@ -17,6 +17,7 @@
"""Various utils to prepare docker and docker compose commands."""
import os
import re
+import subprocess
import sys
from copy import deepcopy
from random import randint
@@ -97,6 +98,16 @@ NECESSARY_HOST_VOLUMES = [
]
+def create_volume_if_missing(volume_name: str):
+ res_inspect = run_command(cmd=["docker", "inspect", volume_name], stdout=subprocess.DEVNULL, check=False)
+ if res_inspect.returncode != 0:
+ run_command(cmd=["docker", "volume", "create", volume_name], check=True)
+
+
+def create_static_check_volumes():
+ create_volume_if_missing("docker-compose_mypy-cache-volume")
+
+
def get_extra_docker_flags(mount_sources: str) -> List[str]:
"""
Returns extra docker flags based on the type of mounting we want to do for sources.
@@ -110,6 +121,7 @@ def get_extra_docker_flags(mount_sources: str) -> List[str]:
elif mount_sources == MOUNT_SELECTED:
for flag in NECESSARY_HOST_VOLUMES:
extra_docker_flags.extend(["-v", str(AIRFLOW_SOURCES_ROOT) + flag])
+ extra_docker_flags.extend(['-v', "docker-compose_mypy-cache-volume:/opt/airflow/.mypy_cache/"])
else: # none
extra_docker_flags.extend(["-v", f"{AIRFLOW_SOURCES_ROOT / 'empty'}:/opt/airflow/airflow"])
extra_docker_flags.extend(["-v", f"{AIRFLOW_SOURCES_ROOT}/files:/files"])
diff --git a/dev/breeze/src/airflow_breeze/utils/path_utils.py b/dev/breeze/src/airflow_breeze/utils/path_utils.py
index beb2926302..474891e98a 100644
--- a/dev/breeze/src/airflow_breeze/utils/path_utils.py
+++ b/dev/breeze/src/airflow_breeze/utils/path_utils.py
@@ -62,7 +62,13 @@ def in_help() -> bool:
def skip_upgrade_check():
- return in_self_upgrade() or in_autocomplete() or in_help() or hasattr(sys, '_called_from_test')
+ return (
+ in_self_upgrade()
+ or in_autocomplete()
+ or in_help()
+ or hasattr(sys, '_called_from_test')
+ or os.environ.get('SKIP_BREEZE_UPGRADE_CHECK')
+ )
def get_package_setup_metadata_hash() -> str:
@@ -235,7 +241,6 @@ AIRFLOW_SOURCES_ROOT = find_airflow_sources_root_to_operate_on()
BUILD_CACHE_DIR = AIRFLOW_SOURCES_ROOT / '.build'
FILES_DIR = AIRFLOW_SOURCES_ROOT / 'files'
MSSQL_DATA_VOLUME = AIRFLOW_SOURCES_ROOT / 'tmp_mssql_volume'
-MYPY_CACHE_DIR = AIRFLOW_SOURCES_ROOT / '.mypy_cache'
LOGS_DIR = AIRFLOW_SOURCES_ROOT / 'logs'
DIST_DIR = AIRFLOW_SOURCES_ROOT / 'dist'
SCRIPTS_CI_DIR = AIRFLOW_SOURCES_ROOT / 'scripts' / 'ci'
@@ -253,7 +258,6 @@ def create_directories() -> None:
BUILD_CACHE_DIR.mkdir(parents=True, exist_ok=True)
FILES_DIR.mkdir(parents=True, exist_ok=True)
MSSQL_DATA_VOLUME.mkdir(parents=True, exist_ok=True)
- MYPY_CACHE_DIR.mkdir(parents=True, exist_ok=True)
LOGS_DIR.mkdir(parents=True, exist_ok=True)
DIST_DIR.mkdir(parents=True, exist_ok=True)
OUTPUT_LOG.mkdir(parents=True, exist_ok=True)
diff --git a/scripts/ci/docker-compose/backend-mssql-docker-volume.yml b/scripts/ci/docker-compose/backend-mssql-docker-volume.yml
index ca5d53fead..f18d6086c3 100644
--- a/scripts/ci/docker-compose/backend-mssql-docker-volume.yml
+++ b/scripts/ci/docker-compose/backend-mssql-docker-volume.yml
@@ -20,3 +20,5 @@ services:
mssql:
volumes:
- mssql-db-volume:/var/opt/mssql
+volumes:
+ mssql-db-volume:
diff --git a/scripts/ci/docker-compose/backend-mysql.yml b/scripts/ci/docker-compose/backend-mysql.yml
index aaef6b8c57..990633a557 100644
--- a/scripts/ci/docker-compose/backend-mysql.yml
+++ b/scripts/ci/docker-compose/backend-mysql.yml
@@ -44,3 +44,5 @@ services:
restart: always
command: ['mysqld', '--character-set-server=utf8mb4', '--collation-server=utf8mb4_unicode_ci']
+volumes:
+ mysql-db-volume:
diff --git a/scripts/ci/docker-compose/backend-postgres.yml b/scripts/ci/docker-compose/backend-postgres.yml
index 4f7374768f..6b1c92ac6d 100644
--- a/scripts/ci/docker-compose/backend-postgres.yml
+++ b/scripts/ci/docker-compose/backend-postgres.yml
@@ -42,3 +42,5 @@ services:
timeout: 10s
retries: 5
restart: always
+volumes:
+ postgres-db-volume:
diff --git a/scripts/ci/docker-compose/backend-sqlite.yml b/scripts/ci/docker-compose/backend-sqlite.yml
index 947023be46..2a9e895ec7 100644
--- a/scripts/ci/docker-compose/backend-sqlite.yml
+++ b/scripts/ci/docker-compose/backend-sqlite.yml
@@ -25,3 +25,5 @@ services:
volumes:
- /dev/urandom:/dev/random # Required to get non-blocking entropy source
- sqlite-db-volume:/root/airflow
+volumes:
+ sqlite-db-volume:
diff --git a/scripts/ci/docker-compose/base.yml b/scripts/ci/docker-compose/base.yml
index 616c7ee495..48e4d3df96 100644
--- a/scripts/ci/docker-compose/base.yml
+++ b/scripts/ci/docker-compose/base.yml
@@ -90,8 +90,3 @@ services:
- "${FLOWER_HOST_PORT}:5555"
cap_add:
- SYS_PTRACE
-volumes:
- sqlite-db-volume:
- postgres-db-volume:
- mysql-db-volume:
- mssql-db-volume:
diff --git a/scripts/ci/docker-compose/local.yml b/scripts/ci/docker-compose/local.yml
index 61b8eac06a..0e37d0c34a 100644
--- a/scripts/ci/docker-compose/local.yml
+++ b/scripts/ci/docker-compose/local.yml
@@ -26,6 +26,7 @@ services:
# or those that might be useful to see in the host as output of the
# tests (such as logs)
volumes:
+ - mypy-cache-volume:/opt/airflow/.mypy_cache/
# START automatically generated volumes from LOCAL_MOUNTS in _local_mounts.sh
- ../../../.bash_aliases:/root/.bash_aliases:cached
- ../../../.bash_history:/root/.bash_history:cached
@@ -58,3 +59,5 @@ services:
- ../../../chart:/opt/airflow/chart:cached
- ../../../metastore_browser:/opt/airflow/metastore_browser:cached
# END automatically generated volumes from LOCAL_MOUNTS in _local_mounts.sh
+volumes:
+ mypy-cache-volume:
diff --git a/scripts/ci/libraries/_initialization.sh b/scripts/ci/libraries/_initialization.sh
index b1b5541077..3ffdf44e92 100644
--- a/scripts/ci/libraries/_initialization.sh
+++ b/scripts/ci/libraries/_initialization.sh
@@ -64,7 +64,6 @@ function initialization::create_directories() {
export CI="${CI="false"}"
# Create useful directories if not yet created
- mkdir -p "${AIRFLOW_SOURCES}/.mypy_cache"
mkdir -p "${AIRFLOW_SOURCES}/logs"
mkdir -p "${AIRFLOW_SOURCES}/dist"
diff --git a/scripts/ci/pre_commit/pre_commit_flake8.py b/scripts/ci/pre_commit/pre_commit_flake8.py
index 6b46333403..3d4a56060f 100755
--- a/scripts/ci/pre_commit/pre_commit_flake8.py
+++ b/scripts/ci/pre_commit/pre_commit_flake8.py
@@ -32,8 +32,11 @@ AIRFLOW_SOURCES = Path(__file__).parents[3].resolve()
GITHUB_REPOSITORY = os.environ.get('GITHUB_REPOSITORY', "apache/airflow")
if __name__ == '__main__':
+ os.environ['SKIP_BREEZE_UPGRADE_CHECK'] = "true"
sys.path.insert(0, str(Path(__file__).parents[3].resolve() / "dev" / "breeze" / "src"))
from airflow_breeze.branch_defaults import AIRFLOW_BRANCH
+ from airflow_breeze.global_constants import MOUNT_SELECTED
+ from airflow_breeze.utils.docker_command_utils import create_static_check_volumes, get_extra_docker_flags
AIRFLOW_CI_IMAGE = f"ghcr.io/{GITHUB_REPOSITORY}/{AIRFLOW_BRANCH}/ci/python3.7"
@@ -41,13 +44,13 @@ if __name__ == '__main__':
print(f'[red]The image {AIRFLOW_CI_IMAGE} is not available.[/]\n')
print("\n[yellow]Please run at the earliest convenience:[/]\n\nbreeze build-image --python 3.7\n\n")
sys.exit(1)
+ create_static_check_volumes()
return_code = subprocess.call(
args=[
"docker",
"run",
"-t",
- "-v",
- f"{AIRFLOW_SOURCES}:/opt/airflow/",
+ *get_extra_docker_flags(MOUNT_SELECTED),
"-e",
"SKIP_ENVIRONMENT_INITIALIZATION=true",
"-e",
diff --git a/scripts/ci/pre_commit/pre_commit_mypy.py b/scripts/ci/pre_commit/pre_commit_mypy.py
index 1b78be3c9e..0075856325 100755
--- a/scripts/ci/pre_commit/pre_commit_mypy.py
+++ b/scripts/ci/pre_commit/pre_commit_mypy.py
@@ -33,8 +33,11 @@ AIRFLOW_SOURCES = Path(__file__).parents[3].resolve()
GITHUB_REPOSITORY = os.environ.get('GITHUB_REPOSITORY', "apache/airflow")
if __name__ == '__main__':
+ os.environ['SKIP_BREEZE_UPGRADE_CHECK'] = "true"
sys.path.insert(0, str(Path(__file__).parents[3].resolve() / "dev" / "breeze" / "src"))
from airflow_breeze.branch_defaults import AIRFLOW_BRANCH
+ from airflow_breeze.global_constants import MOUNT_SELECTED
+ from airflow_breeze.utils.docker_command_utils import create_static_check_volumes, get_extra_docker_flags
AIRFLOW_CI_IMAGE = f"ghcr.io/{GITHUB_REPOSITORY}/{AIRFLOW_BRANCH}/ci/python3.7"
@@ -42,13 +45,13 @@ if __name__ == '__main__':
print(f'[red]The image {AIRFLOW_CI_IMAGE} is not available.[/]\n')
print("\n[yellow]Please run at the earliest convenience:[/]\n\nbreeze build-image --python 3.7\n\n")
sys.exit(1)
+ create_static_check_volumes()
return_code = subprocess.call(
args=[
"docker",
"run",
"-t",
- "-v",
- f"{AIRFLOW_SOURCES}:/opt/airflow/",
+ *get_extra_docker_flags(MOUNT_SELECTED),
"-e",
"SKIP_ENVIRONMENT_INITIALIZATION=true",
"-e",
diff --git a/scripts/in_container/_in_container_script_init.sh b/scripts/in_container/_in_container_script_init.sh
index 562de97da2..bb090a5900 100755
--- a/scripts/in_container/_in_container_script_init.sh
+++ b/scripts/in_container/_in_container_script_init.sh
@@ -23,13 +23,13 @@ IN_CONTAINER_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
# shellcheck source=scripts/in_container/_in_container_utils.sh
. "${IN_CONTAINER_DIR}/_in_container_utils.sh"
+if [[ ${IN_CONTAINER_INITIALIZED=} != "true" ]]; then
+ in_container_set_colors
+ in_container_basic_sanity_check
+ in_container_script_start
-in_container_set_colors
-
-in_container_basic_sanity_check
-
-in_container_script_start
-
-add_trap "in_container_fix_ownership" EXIT HUP INT TERM
-add_trap "in_container_clear_tmp" EXIT HUP INT TERM
-add_trap "in_container_script_end" EXIT HUP INT TERM
+ add_trap "in_container_fix_ownership" EXIT HUP INT TERM
+ add_trap "in_container_clear_tmp" EXIT HUP INT TERM
+ add_trap "in_container_script_end" EXIT HUP INT TERM
+ export IN_CONTAINER_INITIALIZED="true"
+fi
diff --git a/scripts/in_container/_in_container_utils.sh b/scripts/in_container/_in_container_utils.sh
index 10115a7009..c41e7af548 100644
--- a/scripts/in_container/_in_container_utils.sh
+++ b/scripts/in_container/_in_container_utils.sh
@@ -80,14 +80,20 @@ function in_container_script_end() {
#
function in_container_cleanup_pyc() {
set +o pipefail
+ if [[ ${CLEANED_PYC=} == "true" ]]; then
+ return
+ fi
sudo find . \
-path "./airflow/www/node_modules" -prune -o \
-path "./airflow/ui/node_modules" -prune -o \
+ -path "./provider_packages/airflow/www/node_modules" -prune -o \
+ -path "./provider_packages/airflow/ui/node_modules" -prune -o \
-path "./.eggs" -prune -o \
-path "./docs/_build" -prune -o \
-path "./build" -prune -o \
-name "*.pyc" | grep ".pyc$" | sudo xargs rm -f
set -o pipefail
+ export CLEANED_PYC="true"
}
#
@@ -95,14 +101,20 @@ function in_container_cleanup_pyc() {
#
function in_container_cleanup_pycache() {
set +o pipefail
+ if [[ ${CLEANED_PYCACHE=} == "true" ]]; then
+ return
+ fi
find . \
-path "./airflow/www/node_modules" -prune -o \
-path "./airflow/ui/node_modules" -prune -o \
+ -path "./provider_packages/airflow/www/node_modules" -prune -o \
+ -path "./provider_packages/airflow/ui/node_modules" -prune -o \
-path "./.eggs" -prune -o \
-path "./docs/_build" -prune -o \
-path "./build" -prune -o \
-name "__pycache__" | grep "__pycache__" | sudo xargs rm -rf
set -o pipefail
+ export CLEANED_PYCACHE="true"
}
#