potiuk opened a new pull request, #23866: URL: https://github.com/apache/airflow/pull/23866
This change should significantly speed up the Breeze experience for MacOS users, especially when iterating over a change in Breeze, regardless of whether you are using x86 or ARM architecture. The problem with Docker on MacOS is the particularly slow filesystem used to map sources from the host to the Docker VM. It is especially bad when many small files are involved.

The improvement comes from two areas:

* removing duplicate pycache cleaning
* moving the MyPy cache to a docker volume

When entering Breeze we clean, just in case, the .pyc and __pycache__ files potentially generated outside of the docker container. This is particularly useful if you use a local IDE and do not have bytecode generation disabled (we have it disabled in Breeze). Generated Python bytecode might lead to various problems when you switch branches and Python versions, so for Breeze development, where the files change often anyway, disabling bytecode generation and removing the files when they are found is important. This happens when entering Breeze and might take a second or two, depending on whether you have locally generated bytecode files.

It could happen that the __init script was called twice (depending on which script was called), so the time could be double what was actually needed. Also, if you ever generated provider packages, the time could be much longer, because node_modules generated in provider sources were not excluded from the search (and on MacOS it takes a LOT of time). This also doubled the time of exit, as the initialization code installed traps that were also run twice. The traps, however, were rather fast, so they had no negative influence on performance. The change adds a guard so that initialization is only ever executed once.

The second part of the change moves the mypy cache to a docker volume rather than using it from the local source folder (the default when complete sources are mounted).
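The initialization guard and the node_modules exclusion described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the names `BREEZE_INIT_DONE`, `INIT_RUNS`, `initialize_once`, and `clean_bytecode` are hypothetical, and the real scripts also install traps, which are omitted here.

```shell
# Hypothetical sketch: all names below are illustrative and are not taken
# from the Airflow sources.

INIT_RUNS=0

clean_bytecode() {
    # $1: root directory to clean (defaults to the current directory).
    # node_modules is pruned, so find never descends into it.
    # __pycache__ directories are pruned too, then removed as a whole.
    find "${1:-.}" \
        -name node_modules -prune -o \
        \( -type d -name '__pycache__' -prune -exec rm -rf {} + \) -o \
        \( -type f -name '*.pyc' -exec rm -f {} + \)
}

initialize_once() {
    # Guard: return early if a previous call already ran the initialization.
    if [ "${BREEZE_INIT_DONE:-false}" = "true" ]; then
        return 0
    fi
    BREEZE_INIT_DONE="true"
    INIT_RUNS=$((INIT_RUNS + 1))
    clean_bytecode "${1:-.}"
    # The real initialization would also install its exit traps here, so
    # the guard keeps them from being registered twice.
}

# Example usage (commented out so that running the sketch deletes nothing):
# initialize_once /path/to/sources
# initialize_once /path/to/sources   # second call is a no-op
```

Pruning node_modules means the search skips that subtree entirely instead of walking it and filtering the results, which is what matters on a slow mounted filesystem.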
We were already using selective mounts to make sure MacOS filesystem slowness affects us in a minimal way, but with this change the cache is stored in a docker volume, which does not suffer from the same problems as volumes mounted from the host. The Docker volume is preserved until the `docker stop` command is run, which means that iterating over a change should be WAY faster now: the observed speed-up was around 5x for the MyPy pre-commit.
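To illustrate the idea of keeping the mypy cache in a named Docker volume instead of the host-mounted source tree, here is a small sketch that only prints the `docker run` command it would use. The volume name, the in-container path, and the function name `mypy_docker_command` are assumptions made for this example and may differ from what Breeze actually uses; `-v` for named volumes and mypy's `--cache-dir` option are standard.

```shell
# Hypothetical sketch: the volume name, paths, and function name are
# illustrative assumptions, not values confirmed by the PR.

MYPY_CACHE_VOLUME="mypy-cache-volume"
SOURCES_IN_CONTAINER="/opt/airflow"

mypy_docker_command() {
    # $1: sources directory on the host, $2: image that has mypy installed.
    # Sources are bind-mounted from the host, while the cache directory is
    # backed by a named volume that lives inside the Docker VM.
    cache_dir="${SOURCES_IN_CONTAINER}/.mypy_cache"
    printf 'docker run --rm -v %s:%s -v %s:%s -w %s %s mypy --cache-dir %s .\n' \
        "$1" "${SOURCES_IN_CONTAINER}" \
        "${MYPY_CACHE_VOLUME}" "${cache_dir}" \
        "${SOURCES_IN_CONTAINER}" "$2" "${cache_dir}"
}

# Example usage: print the command for the current directory.
# mypy_docker_command "$(pwd)" my-ci-image
```

Because the named volume is mounted over the cache directory, the many small cache files are read and written inside the Docker VM rather than through the host file mapping.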
